harden(org-external): token via http.extraHeader, .complete cache marker, ref '..' deny, naming cleanup #107

Merged
claude-ceo-assistant merged 1 commits from fix/external-resolver-hardening into staging 2026-05-08 13:04:00 +00:00

Summary

Self-review of molecule-core#105 + #106 (the !external resolver chain) surfaced 3 real correctness/security gaps and 2 readability nits. Fixes all four in one PR since they're the same file's hardening.

(1) Token leakage — fixed

Before: clone URLs built with auth in userinfo (https://oauth2:TOKEN@host/repo.git). Two leak paths:

  • Token persisted in cloned repo's .git/config.
  • Token could appear in clone error output captured via cmd.CombinedOutput().

After: clone URL has no userinfo. Auth layered via -c http.extraHeader=Authorization: token ... (sent per-request, not persisted). redactToken() pass over any error string as belt-and-braces.

Tradeoff: token now visible in ps aux for the duration of the git child process (same as before via the env var), but not in persistent state.

(2) Cache-validity footgun — fixed

Before: cache-hit was cacheDir/.git exists. A clone interrupted after .git creation but before content finished would leave a partially-written cache that subsequent imports treated as hit → stale content forever, no self-heal.

After: cache-hit requires a .complete marker file written only AFTER successful clone+rename. Partially-written cache is treated as cache-miss and re-fetched cleanly (after RemoveAll to avoid blocking the new clone's mkdir).

(3) Ref .. deny — fixed

Before: safeRefPattern ^[a-zA-Z0-9_./-]+$ allowed .. as a substring. Git itself rejects most such refs, but defense-in-depth: don't depend on the downstream tool's validation at the input boundary.

After: explicit strings.Contains(ref.Ref, "..") check.

(4) Naming cleanup

  • rewriteFilesDirAndIncludes()rewriteFilesDir(). Function only rewrites files_dir scalars; !include rewriting was removed during PR-A development (double-prefix bug). Old name was misleading. Docstring now explicitly explains why !include paths are NOT rewritten.
  • Removed unused buildAuthedURL() (replaced by buildExternalCloneURL + authConfigArgs split).
  • Removed unused shortHash() (replaced by os.MkdirTemp).
  • Removed unused imports crypto/sha1, encoding/hex, fmt.
  • Removed stray _ = fmt.Sprint in integration test.

New tests

  • TestGitFetcher_RejectsRefWithDoubleDot — defense-in-depth on ref input.
  • TestGitFetcher_CacheValidatedByCompleteMarker — partial cache → re-fetch.

Verified locally (2026-05-08)

Full ./internal/handlers/ suite: ok (7.8s)
14 external-resolver tests + all existing tests pass
New tests cover the two new behaviors

Refs

  • internal#77 — extraction RFC
  • molecule-core#105 (resolver), #106 (tests)
  • Hongming /code-review-and-quality 2026-05-08 + 'fix all'
## Summary Self-review of [molecule-core#105](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/105) + [#106](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/106) (the `!external` resolver chain) surfaced 3 real correctness/security gaps and 2 readability nits. Fixes all four in one PR since they're the same file's hardening. ## (1) Token leakage — fixed **Before:** clone URLs built with auth in userinfo (`https://oauth2:TOKEN@host/repo.git`). Two leak paths: - Token persisted in cloned repo's `.git/config`. - Token could appear in clone error output captured via `cmd.CombinedOutput()`. **After:** clone URL has no userinfo. Auth layered via `-c http.extraHeader=Authorization: token ...` (sent per-request, not persisted). `redactToken()` pass over any error string as belt-and-braces. **Tradeoff:** token now visible in `ps aux` for the duration of the git child process (same as before via the env var), but not in persistent state. ## (2) Cache-validity footgun — fixed **Before:** cache-hit was `cacheDir/.git exists`. A clone interrupted after `.git` creation but before content finished would leave a partially-written cache that subsequent imports treated as hit → stale content forever, no self-heal. **After:** cache-hit requires a `.complete` marker file written only AFTER successful clone+rename. Partially-written cache is treated as cache-miss and re-fetched cleanly (after `RemoveAll` to avoid blocking the new clone's `mkdir`). ## (3) Ref `..` deny — fixed **Before:** `safeRefPattern ^[a-zA-Z0-9_./-]+$` allowed `..` as a substring. Git itself rejects most such refs, but defense-in-depth: don't depend on the downstream tool's validation at the input boundary. **After:** explicit `strings.Contains(ref.Ref, "..")` check. ## (4) Naming cleanup - `rewriteFilesDirAndIncludes()` → `rewriteFilesDir()`. Function only rewrites `files_dir` scalars; `!include` rewriting was removed during PR-A development (double-prefix bug). Old name was misleading. Docstring now explicitly explains why `!include` paths are NOT rewritten. - Removed unused `buildAuthedURL()` (replaced by `buildExternalCloneURL` + `authConfigArgs` split). - Removed unused `shortHash()` (replaced by `os.MkdirTemp`). - Removed unused imports `crypto/sha1`, `encoding/hex`, `fmt`. - Removed stray `_ = fmt.Sprint` in integration test. ## New tests - `TestGitFetcher_RejectsRefWithDoubleDot` — defense-in-depth on ref input. - `TestGitFetcher_CacheValidatedByCompleteMarker` — partial cache → re-fetch. ## Verified locally (2026-05-08) ``` Full ./internal/handlers/ suite: ok (7.8s) 14 external-resolver tests + all existing tests pass New tests cover the two new behaviors ``` ## Refs - internal#77 — extraction RFC - molecule-core#105 (resolver), #106 (tests) - Hongming `/code-review-and-quality` 2026-05-08 + 'fix all'
claude-ceo-assistant added 1 commit 2026-05-08 12:55:23 +00:00
harden(org-external): token via http.extraHeader, .complete cache marker, ref '..' deny, naming cleanup
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 10s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 9s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 25s
CI / Detect changes (pull_request) Successful in 32s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 27s
E2E API Smoke Test / detect-changes (pull_request) Successful in 28s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 28s
Harness Replays / detect-changes (pull_request) Successful in 20s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 23s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
CI / Canvas (Next.js) (pull_request) Successful in 16s
CI / Python Lint & Test (pull_request) Successful in 12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 15s
CI / Platform (Go) (pull_request) Failing after 1m19s
Harness Replays / Harness Replays (pull_request) Failing after 1m6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 1m9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m4s
c72d0a5383
Self-review of molecule-core PR #105 + #106 (the !external resolver
chain) surfaced 3 real correctness/security gaps and 2 readability
nits. Fixes all four in one PR since they're the same file's hardening.

(1) TOKEN LEAKAGE — fixed
  Before: gitFetcher built clone URLs with auth in userinfo
    (https://oauth2:TOKEN@host/repo.git). Two leak paths:
      a. Token persisted in cloned repo's .git/config
      b. Token could appear in clone error output captured via
         cmd.CombinedOutput()
  After: clone URL has no userinfo (https://host/repo.git). Auth is
    layered on via -c http.extraHeader=Authorization: token ...
    which sends the header per-request without persisting. Plus a
    redactToken() pass over any error string before it surfaces in
    fmt.Errorf, as belt-and-braces.
  Tradeoff: token now visible in 'ps aux' for the duration of the
    git child process (same as before via env var), but no longer
    in any persistent state.

(2) CACHE-VALIDITY FOOTGUN — fixed
  Before: cache-hit was 'cacheDir/.git exists'. A clone interrupted
    after .git was created but before content finished writing would
    leave a partially-written cache that subsequent imports treated
    as hit, returning stale/incomplete content forever (no self-heal).
  After: cache-hit also requires a .complete marker file written
    only AFTER successful clone+rename. Partially-written cache is
    treated as cache-miss and re-fetched cleanly (after RemoveAll
    on the partial dir to avoid blocking the new clone's mkdir).

(3) REF '..' DENY — fixed
  Before: safeRefPattern '^[a-zA-Z0-9_./-]+$' allowed '..' as a
    substring. Git itself rejects most refs containing '..', but
    defense-in-depth says don't depend on the downstream tool's
    validation when sanitizing input at the boundary.
  After: explicit strings.Contains(ref.Ref, '..') check.

(4) NAMING CLEANUP — fixed
  Before: rewriteFilesDirAndIncludes() — name claims to rewrite
    !include scalars but doesn't (we removed that during PR-A
    development; double-prefix bug). Misleading for readers.
  After: rewriteFilesDir(). Docstring updated to explicitly explain
    why !include paths are NOT rewritten (relative to subDir, naturally
    inside cache).
  Also: removed unused buildAuthedURL() (replaced by
    buildExternalCloneURL + authConfigArgs split), removed unused
    shortHash() helper (replaced by os.MkdirTemp), removed unused
    crypto/sha1 + encoding/hex + fmt imports, removed stray
    '_ = fmt.Sprint' line in integration test.

NEW TESTS
  - TestGitFetcher_RejectsRefWithDoubleDot (defense-in-depth on ref input)
  - TestGitFetcher_CacheValidatedByCompleteMarker (partial cache → re-fetch)

VERIFIED LOCALLY 2026-05-08
  Full ./internal/handlers/ suite: ok (7.8s, 14 external-resolver tests
  + all existing tests). Two new tests cover the two new behaviors.

Refs:
  internal#77 — extraction RFC
  molecule-core#105 (resolver), #106 (tests) — original implementation
  Hongming code-review-and-quality skill invocation 2026-05-08 + 'fix all'
Ghost approved these changes 2026-05-08 12:55:25 +00:00
Ghost left a comment
First-time contributor

LGTM. Three real fixes (token persistence, partial-cache permanence, defense-in-depth ref validation) caught by structured five-axis self-review. The http.extraHeader approach is the right shape — tradeoff documented honestly. Test additions cover both new behaviors. Naming cleanup is overdue.

LGTM. Three real fixes (token persistence, partial-cache permanence, defense-in-depth ref validation) caught by structured five-axis self-review. The http.extraHeader approach is the right shape — tradeoff documented honestly. Test additions cover both new behaviors. Naming cleanup is overdue.
claude-ceo-assistant merged commit 5abc4f74ca into staging 2026-05-08 13:04:00 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#107
No description provided.