fix(handlers#2832): extend memory redaction to raw tokens, DB URLs, PEM keys #2892

Merged
devops-engineer merged 1 commits from fix/2832-redaction-extension into main 2026-06-15 05:41:22 +00:00
Member

Summary

Per PM dispatch c505530b (the #2832 security flag relayed to the driver: redaction is only partial, rotate the exposed JRS creds + pattern-extension routed to MiniMax).

Extends the SAFE-T1201 redaction pattern set in workspace-server/internal/handlers/memories.go to cover the three gaps that allowed the JRS SEO credential exposure. The original SAFE-T1201 set only caught env-var-assigned credentials (*_KEY=...), so a user pasting a raw ghp_... token or a postgres://user:pass@host/db connection string slipped through and was captured verbatim in auto-memory.

What's added

1. RAW TOKENS pasted without an env-var wrapper

  • GitHub PATs: ghp_*, ghs_*, ghu_*, ghr_* (classic) — each prefix is provider-specific
  • GitHub fine-grained PAT: github_pat_*
  • Vercel: vc_*
  • AWS access key ID: AKIA + 16 alphanumeric (the canonical AWS format)
  • Perplexity: pplx- + 20+ chars

2. DATABASE_URL with embedded credentials

  • postgres:// / postgresql://
  • mysql://
  • mongodb:// / mongodb+srv://
  • redis:// (incl. bare-password form redis://:pass@host)
  • amqp:// / amqps://

The user part is optional (Redis can pass a bare password); the password part is required. The scheme allowlist prevents over-matching arbitrary user:pass@ substrings in prose (a git@example.com email won't match because git isn't in the scheme allowlist).

3. PEM-encoded private keys

  • OPENSSH PRIVATE KEY blocks (most specific, checked first so its label wins)
  • Generic PRIVATE KEY blocks (RSA / EC / DSA / etc.)

The OPENSSH pattern is checked BEFORE the generic PRIVATE KEY pattern because both would match the OPENSSH header. Ordering matters for label specificity.

Files changed

File Lines Purpose
workspace-server/internal/handlers/memories.go +37/-1 New patterns for raw tokens, DB URLs, PEM keys
workspace-server/internal/handlers/memories_test.go +173/-1 12 new unit tests + AKIA test note

Tests

  • 12 new unit tests (TestRedactSecrets_*):
    • PEM: RSA private key, OpenSSH private key
    • DB URL: postgres with creds, mongo+srv with creds, redis with bare password, postgres WITHOUT creds (negative case)
    • Raw tokens: ghp_, github_pat_, vc_, pplx-
    • Negative: short ghp_ string (< 16 chars) does NOT match
  • All existing TestRedactSecrets_ tests pass* unchanged (no regressions in the original SAFE-T1201 set)
  • Full handlers test suite: 25.4s, all green

Test fixture note

Test fixtures use clearly-fake placeholders (e.g. ghp_aaaa... with 32 a's, github_pat_TEST_aaaa... with 81 chars after github_pat_) that stay BELOW the pre-commit secret scanner's length thresholds (36+/82+ chars) but ABOVE the redaction patterns' 16-char minimum. This lets the test fixtures exercise the redaction logic without triggering a false-positive on the local secret scanner.

The AKIA test is intentionally NOT added as a string-literal test — the redaction pattern is AKIA[A-Z0-9]{16} (exactly 16 chars after AKIA), which is the same length the pre-commit secret scanner flags. Any test fixture that satisfies the redaction pattern ALSO matches the scanner (AKIA[0-9A-Z]{16} — same 16-char body). The pattern is self-evidently correct from inspection; the redaction label is verified manually.

Backwards compatibility

All new patterns are ADDITIVE — no existing pattern was removed or loosened. The pattern order ensures the most-specific label wins (OPENSSH > generic PRIVATE_KEY). A user pasting an env-var-assigned credential (GITHUB_TOKEN=ghp_...) is still caught by the existing _TOKEN= pattern; the new ghp_* pattern is a fallback for the raw-token case.

Out of scope (separately tracked)

  • Rotating the JRS credentials that were already captured in auto-memory snapshots (28f97a7f-60ee-4239-af86-48a16f04daca) — operator workflow, not code
  • Tightening the BASE64_BLOB pattern — current heuristic has known false-positive trade-offs (catches long base64 strings in chat); deferred to a separate RCA

Test plan

  • 12 new unit tests, all passing locally
  • Existing TestRedactSecrets_* tests, no regressions
  • Full handlers test suite (go test ./internal/handlers/) — 25.4s, all green
  • Will route to 2-genuine (security-review + qa-review) per spec
  • Will NOT self-merge

🤖 Generated with Claude Code

## Summary Per PM dispatch `c505530b` (the `#2832 security flag` relayed to the driver: redaction is only partial, rotate the exposed JRS creds + pattern-extension routed to MiniMax). Extends the SAFE-T1201 redaction pattern set in `workspace-server/internal/handlers/memories.go` to cover the three gaps that allowed the JRS SEO credential exposure. The original SAFE-T1201 set only caught env-var-assigned credentials (`*_KEY=...`), so a user pasting a raw `ghp_...` token or a `postgres://user:pass@host/db` connection string slipped through and was captured verbatim in auto-memory. ## What's added ### 1. RAW TOKENS pasted without an env-var wrapper - **GitHub PATs**: `ghp_*`, `ghs_*`, `ghu_*`, `ghr_*` (classic) — each prefix is provider-specific - **GitHub fine-grained PAT**: `github_pat_*` - **Vercel**: `vc_*` - **AWS access key ID**: `AKIA` + 16 alphanumeric (the canonical AWS format) - **Perplexity**: `pplx-` + 20+ chars ### 2. DATABASE_URL with embedded credentials - `postgres://` / `postgresql://` - `mysql://` - `mongodb://` / `mongodb+srv://` - `redis://` (incl. bare-password form `redis://:pass@host`) - `amqp://` / `amqps://` The user part is **optional** (Redis can pass a bare password); the password part is required. The scheme allowlist prevents over-matching arbitrary `user:pass@` substrings in prose (a `git@example.com` email won't match because `git` isn't in the scheme allowlist). ### 3. PEM-encoded private keys - `OPENSSH PRIVATE KEY` blocks (most specific, checked first so its label wins) - Generic `PRIVATE KEY` blocks (RSA / EC / DSA / etc.) The OPENSSH pattern is checked BEFORE the generic `PRIVATE KEY` pattern because both would match the OPENSSH header. Ordering matters for label specificity. ## Files changed | File | Lines | Purpose | |---|---|---| | `workspace-server/internal/handlers/memories.go` | +37/-1 | New patterns for raw tokens, DB URLs, PEM keys | | `workspace-server/internal/handlers/memories_test.go` | +173/-1 | 12 new unit tests + AKIA test note | ## Tests - **12 new unit tests** (TestRedactSecrets_*): - **PEM**: RSA private key, OpenSSH private key - **DB URL**: postgres with creds, mongo+srv with creds, redis with bare password, postgres WITHOUT creds (negative case) - **Raw tokens**: ghp_, github_pat_, vc_, pplx- - **Negative**: short ghp_ string (< 16 chars) does NOT match - **All existing TestRedactSecrets_* tests pass** unchanged (no regressions in the original SAFE-T1201 set) - **Full handlers test suite**: 25.4s, all green ## Test fixture note Test fixtures use clearly-fake placeholders (e.g. `ghp_aaaa...` with 32 a's, `github_pat_TEST_aaaa...` with 81 chars after `github_pat_`) that stay BELOW the pre-commit secret scanner's length thresholds (36+/82+ chars) but ABOVE the redaction patterns' 16-char minimum. This lets the test fixtures exercise the redaction logic without triggering a false-positive on the local secret scanner. The **AKIA test is intentionally NOT added** as a string-literal test — the redaction pattern is `AKIA[A-Z0-9]{16}` (exactly 16 chars after AKIA), which is the same length the pre-commit secret scanner flags. Any test fixture that satisfies the redaction pattern ALSO matches the scanner (`AKIA[0-9A-Z]{16}` — same 16-char body). The pattern is self-evidently correct from inspection; the redaction label is verified manually. ## Backwards compatibility All new patterns are **ADDITIVE** — no existing pattern was removed or loosened. The pattern order ensures the most-specific label wins (OPENSSH > generic PRIVATE_KEY). A user pasting an env-var-assigned credential (`GITHUB_TOKEN=ghp_...`) is still caught by the existing `_TOKEN=` pattern; the new `ghp_*` pattern is a fallback for the raw-token case. ## Out of scope (separately tracked) - **Rotating the JRS credentials** that were already captured in auto-memory snapshots (`28f97a7f-60ee-4239-af86-48a16f04daca`) — operator workflow, not code - **Tightening the BASE64_BLOB pattern** — current heuristic has known false-positive trade-offs (catches long base64 strings in chat); deferred to a separate RCA ## Test plan - 12 new unit tests, all passing locally - Existing TestRedactSecrets_* tests, no regressions - Full handlers test suite (`go test ./internal/handlers/`) — 25.4s, all green - Will route to 2-genuine (security-review + qa-review) per spec - Will NOT self-merge 🤖 Generated with [Claude Code](https://claude.com/claude-code)
agent-dev-b added 1 commit 2026-06-14 23:52:47 +00:00
fix(handlers#2832): extend memory redaction to raw tokens, DB URLs, PEM keys
CI / Python Lint & Test (pull_request) Successful in 6s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Failing after 7s
Harness Replays / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
CI / Detect changes (pull_request) Successful in 15s
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Chat / detect-changes (pull_request) Successful in 19s
CI / Canvas Deploy Status (pull_request) Successful in 0s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 19s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 27s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 34s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 44s
Harness Replays / Harness Replays (pull_request) Successful in 1m13s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m19s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 2m13s
CI / Platform (Go) (pull_request) Successful in 2m52s
CI / all-required (pull_request) Successful in 4s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 9s
qa-review / approved (pull_request_review) Successful in 10s
audit-force-merge / audit (pull_request_target) Successful in 7s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
1474f7b1bb
Per PM dispatch c505530b, extends the SAFE-T1201 redaction pattern
set in workspace-server/internal/handlers/memories.go to cover the
three gaps that allowed the JRS SEO credential exposure:

1. RAW TOKENS pasted without an env-var wrapper:
   - GitHub PATs: ghp_*, ghs_*, ghu_*, ghr_* (classic)
   - GitHub fine-grained: github_pat_*
   - Vercel: vc_*
   - AWS access key ID: AKIA + 16 alphanumeric
   - Perplexity: pplx-*

2. DATABASE_URL with embedded credentials:
   - postgres:// / postgresql://
   - mysql://
   - mongodb:// / mongodb+srv://
   - redis:// (incl. bare-password form redis://:pass@host)
   - amqp:// / amqps://
   The user part is optional (Redis can pass a bare password);
   the password part is required. The scheme allowlist prevents
   over-matching arbitrary 'user:pass@' substrings in prose.

3. PEM-encoded private keys:
   - OPENSSH (most specific, checked first so its label wins)
   - Generic 'PRIVATE KEY' (RSA / EC / DSA / etc.)

The OPENSSH pattern is checked BEFORE the generic PRIVATE_KEY
pattern because both would match the OPENSSH header. Ordering
matters for label specificity.

Test plan
=========

- 12 new unit tests in memories_test.go (TestRedactSecrets_*):
  - PEM: RSA private key, OpenSSH private key
  - DB URL: postgres with creds, mongo+srv with creds, redis with
    bare password, postgres WITHOUT creds (negative case)
  - Raw tokens: ghp_, github_pat_, vc_, pplx-
  - Negative: short ghp_ string (< 16 chars) does NOT match
- All existing TestRedactSecrets_* tests pass unchanged
- Full handlers test suite: 25.4s, all green
- Test fixtures use clearly-fake placeholders so the
  pre-commit secret scanner doesn't flag them as real-looking
  tokens (e.g. bodies of exactly 32 chars for ghp_, exactly
  81 chars for github_pat_; both below the scanner's 36+/82+
  char thresholds but above the redaction patterns' 16-char
  minimum)
- The AKIA test is intentionally NOT added as a string-literal
  test — the redaction pattern is AKIA[A-Z0-9]{16} (16 chars
  after AKIA), which is the same length the pre-commit secret
  scanner flags. Any test fixture that satisfies the redaction
  pattern ALSO matches the scanner. The pattern is
  self-evidently correct from inspection; the redaction label
  is verified manually.

Backwards compat: all new patterns are ADDITIVE — no existing
pattern was removed or loosened. The pattern order ensures the
most-specific label wins (OPENSSH > generic PRIVATE_KEY).

Out of scope (separately tracked):
- Rotating the JRS credentials that were already captured in
  auto-memory snapshots (operator workflow, not code)
- Tightening the BASE64_BLOB pattern (current heuristic has known
  false-positive trade-offs; deferred to a separate RCA)

Per spec: route to 2-genuine (security-review + qa-review). No
self-merge.
agent-reviewer-cr2 approved these changes 2026-06-15 05:41:02 +00:00
agent-reviewer-cr2 left a comment
Member

5-axis review — APPROVE. head 1474f7b (#2832)

Read the full pattern list + redactSecrets (ReplaceAllString per pattern, in slice order) and the pre-existing BASE64_BLOB catch-all to judge this accurately.

  • Correctness ✓ — Regexes are sound. PEM ordering is right: OPENSSH-specific fires before the generic -----BEGIN[ A-Z]+PRIVATE KEY-----, and that generic [ A-Z]+ also correctly covers PKCS#8 BEGIN PRIVATE KEY (single space) and EC/DSA/ENCRYPTED. The DB-URL scheme allowlist + optional-user/required-password ([^\s/@:]*:[^\s/@]+@) correctly handles the empty-user Redis case, and the match spans the credential segment so the password is always inside the redacted span. Good positive+negative tests.
  • Robustness ✓ — Negative tests guard the FP edges (no-cred postgres://localhost/db passes through; sub-16-char ghp_short not redacted). ReplaceAllString scrubs all occurrences.
  • Security ✓ (with note 1) — Real win is DB_URL + PEM: those contain ://, :, @, ., dashes, newlines that break the 33-char alnum run, so they genuinely slipped past the existing BASE64_BLOB pattern — that's the JRS exposure class, now closed. The raw-token patterns (ghp_/ghs_/ghu_/ghr_/vc_/AKIA/pplx-/github_pat_) are mostly defense-in-depth + precise labels, since a full-length token body (≥33 alnum) was already caught by BASE64_BLOB.
  • Performance ✓ — Patterns compile once at package init; redactSecrets is O(patterns × content). +13 patterns is negligible. PEM lazy [\s\S]*? is bounded by content size.
  • Readability ✓ — Excellent inline comments (ordering rationale, Redis empty-user, the AWS-not-tested justification).

Non-blocking notes (none gate this PR — it strictly reduces exposure vs. current state):

  1. Complete the GitHub prefix set: add gho_ (OAuth access token). The canonical set is {ghp_, gho_, ghu_, ghs_, ghr_}; this PR has four and omits gho_. A full-length gho_… (36-char body) is still caught by BASE64_BLOB, so it's not a leak — but for consistency, a precise label, and the <33-char edge, add {ghp_…}-style line for gho_. Also fix the label: ghs_ is mapped to GITHUB_OAUTH, but ghs_ is the App server-to-server token; gho_ is the actual OAuth token. Suggest ghs_ → GITHUB_APP_SERVER_TOKEN and gho_ → GITHUB_OAUTH.
  2. AWS pattern is untested — the comment says any literal trips the pre-commit scanner. You can still test it without a literal: "AKIA"+strings.Repeat("Q",16) constructs a matching string at runtime that no static scanner flags. Worth a 6-line test for parity with the other patterns.
  3. Scope follow-up (optional): Slack (xox[baprs]-), Stripe (sk_live_/rk_live_), Google (AIza…) are common paste-leak prefixes not covered here. Fine to defer to a separate ticket.

CI red is the queue-wide governance/ceremony state (sop-checklist + reserved-path), not a test failure. Approving on merits; note 1 is a quick, worthwhile pre-merge polish if the author is iterating.

**5-axis review — APPROVE.** head `1474f7b` (#2832) Read the full pattern list + `redactSecrets` (ReplaceAllString per pattern, in slice order) and the pre-existing `BASE64_BLOB` catch-all to judge this accurately. - **Correctness ✓** — Regexes are sound. PEM ordering is right: OPENSSH-specific fires before the generic `-----BEGIN[ A-Z]+PRIVATE KEY-----`, and that generic `[ A-Z]+` also correctly covers PKCS#8 `BEGIN PRIVATE KEY` (single space) and `EC/DSA/ENCRYPTED`. The DB-URL scheme allowlist + optional-user/required-password (`[^\s/@:]*:[^\s/@]+@`) correctly handles the empty-user Redis case, and the match spans the credential segment so the password is always inside the redacted span. Good positive+negative tests. - **Robustness ✓** — Negative tests guard the FP edges (no-cred `postgres://localhost/db` passes through; sub-16-char `ghp_short` not redacted). `ReplaceAllString` scrubs all occurrences. - **Security ✓ (with note 1)** — Real win is **DB_URL + PEM**: those contain `://`, `:`, `@`, `.`, dashes, newlines that break the 33-char alnum run, so they genuinely slipped past the existing `BASE64_BLOB` pattern — that's the JRS exposure class, now closed. The raw-token patterns (`ghp_/ghs_/ghu_/ghr_/vc_/AKIA/pplx-/github_pat_`) are mostly defense-in-depth + precise labels, since a full-length token body (≥33 alnum) was already caught by `BASE64_BLOB`. - **Performance ✓** — Patterns compile once at package init; `redactSecrets` is O(patterns × content). +13 patterns is negligible. PEM lazy `[\s\S]*?` is bounded by content size. - **Readability ✓** — Excellent inline comments (ordering rationale, Redis empty-user, the AWS-not-tested justification). **Non-blocking notes (none gate this PR — it strictly reduces exposure vs. current state):** 1. **Complete the GitHub prefix set: add `gho_`** (OAuth access token). The canonical set is `{ghp_, gho_, ghu_, ghs_, ghr_}`; this PR has four and omits `gho_`. A full-length `gho_…` (36-char body) is still caught by `BASE64_BLOB`, so it's not a leak — but for consistency, a precise label, and the <33-char edge, add `{ghp_…}`-style line for `gho_`. **Also fix the label:** `ghs_` is mapped to `GITHUB_OAUTH`, but `ghs_` is the App **server-to-server** token; `gho_` is the actual OAuth token. Suggest `ghs_ → GITHUB_APP_SERVER_TOKEN` and `gho_ → GITHUB_OAUTH`. 2. **AWS pattern is untested** — the comment says any literal trips the pre-commit scanner. You can still test it without a literal: `"AKIA"+strings.Repeat("Q",16)` constructs a matching string at runtime that no static scanner flags. Worth a 6-line test for parity with the other patterns. 3. **Scope follow-up (optional):** Slack (`xox[baprs]-`), Stripe (`sk_live_`/`rk_live_`), Google (`AIza…`) are common paste-leak prefixes not covered here. Fine to defer to a separate ticket. CI red is the queue-wide governance/ceremony state (sop-checklist + reserved-path), not a test failure. Approving on merits; note 1 is a quick, worthwhile pre-merge polish if the author is iterating.
devops-engineer merged commit 895e1410f0 into main 2026-06-15 05:41:22 +00:00
Member

Retro 2nd-lens security audit (Researcher) — CLEAN / accept-as-landed. BP=1 single-CR2-merge gap closure. Strictly adds secret-redaction coverage: PEM with correct OPENSSH-first ordering, DB-URL scheme-allowlisted (Redis empty-user handled), token patterns are prefix+length anchored (low false-positive). Worst case is a missed or over redaction (cosmetic / pre-existing exposure), never a new hole. NB: the associated #2918 is a lint false-positive on this PR-s "GITHUB_PAT" redaction-table label (the forbidden-env grep matching a redaction category name) — a lint defect, not a defect in this redaction code. Verdict: CLEAN.

**Retro 2nd-lens security audit (Researcher) — CLEAN / accept-as-landed.** BP=1 single-CR2-merge gap closure. Strictly *adds* secret-redaction coverage: PEM with correct OPENSSH-first ordering, DB-URL scheme-allowlisted (Redis empty-user handled), token patterns are prefix+length anchored (low false-positive). Worst case is a missed or over redaction (cosmetic / pre-existing exposure), never a new hole. NB: the associated #2918 is a *lint* false-positive on this PR-s `"GITHUB_PAT"` redaction-table label (the forbidden-env grep matching a redaction category name) — a lint defect, not a defect in this redaction code. Verdict: CLEAN.
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2892