2 Commits

Author SHA1 Message Date
Hongming Wang
43c234df35 secret-scan: align local pre-commit + extend drift lint (closes #1569 root)
#1569 Phase 1 discovery (2026-05-02) found six historical credential
exposures in molecule-core git history. All confirmed dead — but the
reason they got committed in the first place was that the local
pre-commit hook had two gaps that the canonical CI gate (and the
runtime's hook) didn't:

  1. **Pattern set was incomplete.** Local hook checked
     `sk-ant-|sk-proj-|ghp_|gho_|AKIA|mol_pk_|cfut_` — missing
     `ghs_*`, `ghu_*`, `ghr_*`, `github_pat_*`, `sk-svcacct-`,
     `sk-cp-`, `xox[baprs]-`, `ASIA*`. The historical leaks were 5×
     `ghs_*` (App installation tokens) + 1× `github_pat_*`, none of
     which the local pattern set could have matched, even in a scanned file.
  2. **`*.md` and `docs/` were skip-listed.** The leaked tokens lived
     in `tick-reflections-temp.md`, `qa-audit-2026-04-21.md`, and
     `docs/incidents/INCIDENT_LOG.md` — exactly the file types the
     skip-list excluded. The hook ran and silently passed.
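
The two gaps can be reproduced with a minimal Python sketch. The prefix alternation and the skip expression are taken from the description in this message; the helper name `old_hook_flags`, the sample paths, and the placeholder tokens are hypothetical, and the real hook is a shell script whose exact logic is not shown here.

```python
import re

# Prefix alternation the old local hook checked (as quoted in this message).
OLD_PATTERN = re.compile(r"sk-ant-|sk-proj-|ghp_|gho_|AKIA|mol_pk_|cfut_")
# Paths the old hook skipped entirely (as quoted in this message).
OLD_SKIP = re.compile(r"\.md$|docs/")


def old_hook_flags(path: str, line: str) -> bool:
    """Return True if the old hook would have flagged this line (hypothetical helper)."""
    if OLD_SKIP.search(path):
        return False  # gap 2: skip-listed files are never scanned
    return OLD_PATTERN.search(line) is not None


# Placeholder values only; these are not real credentials.
fake_app_token = "ghs_" + "x" * 36
fake_classic_pat = "ghp_" + "x" * 36

# Gap 1: a ghs_ token is not in the old pattern set, even in a scanned file.
print(old_hook_flags("src/config.py", f"token = {fake_app_token}"))
# Gap 2: a covered prefix still passes when the file is skip-listed.
print(old_hook_flags("docs/incidents/INCIDENT_LOG.md", fake_classic_pat))
# Control: a covered prefix in a scanned file is flagged.
print(old_hook_flags("src/config.py", fake_classic_pat))
```

The three calls print `False`, `False`, and `True`: either gap alone is enough to let a token through.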

This commit:

- Replaces the local hook's hard-coded inline regex with the canonical
  13-pattern array (byte-aligned with `.github/workflows/secret-scan.yml`
  and the workspace runtime's `pre-commit-checks.sh`).
- Removes the `\.md$|docs/` skip — keeps only binary, lockfile, and
  hook-self exclusions.
- Adds the local hook to `lint_secret_pattern_drift.py` as an in-repo
  consumer (read-from-disk, no network — the hook lives in the same
  checkout the lint runs against). The lint now fails when the
  canonical list changes without the local hook being updated in lockstep.
- Adds `.githooks/pre-commit` to the drift-lint workflow's path
  filter so consumer-side edits also trigger the lint.
- Adopts the canonical's "don't echo the matched value" defense (the
  prior version would have round-tripped a leaked credential into
  scrollback / CI logs).
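
The "don't echo the matched value" defense can be sketched as follows. This is an illustration in Python, not the hook's actual shell code: the two patterns are a simplified subset, and the labels, the `scan_text` helper, and the sample file name are assumptions made for the example.

```python
import re

# Illustrative subset only; the canonical array is not reproduced here.
# The labels are hypothetical names chosen for this sketch.
PATTERNS = {
    "github-app-token": re.compile(r"ghs_[A-Za-z0-9]+"),
    "github-fine-grained-pat": re.compile(r"github_pat_[A-Za-z0-9_]+"),
}


def scan_text(path: str, text: str) -> list[str]:
    """Report where a credential-shaped string appears without echoing it."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                # Location and pattern label only; never the match or the line.
                findings.append(f"{path}:{lineno}: matches {label}")
    return findings


fake_token = "ghs_" + "x" * 36  # placeholder, not a real credential
report = scan_text("notes.md", f"first line\ntoken: {fake_token}\n")
print(report)
```

The report names the file, line number, and pattern label, which is enough to locate the leak without copying the credential into terminal scrollback or CI logs.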

Verified: `python3 .github/scripts/lint_secret_pattern_drift.py`
reports both consumers aligned at 13 patterns. The hook's six other
existing gates (canvas 'use client', dark theme, SQL injection,
go-build, etc.) are untouched.

Companion change (already applied via API, no diff here):
`Scan diff for credential-shaped strings` is now in the required-checks
list on both `staging` and `main` branch protection. It was previously a
soft gate: the workflow ran and exited 1, but did not block the merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:47:56 -07:00
Hongming Wang
6638d6e1d7 feat(ci): SECRET_PATTERNS drift lint across known consumers
Adds a lint that diffs the canonical SECRET_PATTERNS array in
.github/workflows/secret-scan.yml against every known public
consumer mirror, failing on any divergence.

Why: every side that scans for credentials carries its own copy of
the pattern list. They drift — most recently the workspace-runtime
pre-commit hook lagged the canonical by one pattern (sk-cp- /
MiniMax F1088 vector), so a developer's local pre-commit would let
a sk-cp- token through while the org-wide CI scan would refuse it.
Useless friction; automated detection closes the gap.

Implementation:
  .github/scripts/lint_secret_pattern_drift.py — pure stdlib, fetches
    each consumer's RAW file via urllib, extracts the
    SECRET_PATTERNS=( ... ) array via anchored regex (the closing
    `)` is anchored to the start of a line because pattern comments
    like `# GitHub PAT (classic)` contain their own paren mid-line),
    diffs against canonical, fails on missing or extra patterns.
    Fetch failures are warnings, not errors — a consumer whose
    branch was renamed shouldn't fail the lint until someone updates
    the URL list.
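
A minimal sketch of the extraction and diff steps, assuming each array entry is a quoted string on its own line. The regexes, helper names, and sample arrays below are illustrative assumptions, not the contents of `lint_secret_pattern_drift.py`; the urllib fetch and the warning path for fetch failures are omitted to keep the example offline.

```python
import re

# The closing ")" must start a line (optionally indented), so parens inside
# trailing comments such as "# GitHub PAT (classic)" do not end the match.
ARRAY_RE = re.compile(
    r"SECRET_PATTERNS=\([ \t]*\n(.*?)^[ \t]*\)", re.DOTALL | re.MULTILINE
)
# One quoted entry per line; anything after the closing quote is ignored.
ENTRY_RE = re.compile(r"""^[ \t]*(['"])(.+?)\1""", re.MULTILINE)


def extract_patterns(text: str) -> set[str]:
    """Pull the quoted entries out of a SECRET_PATTERNS=( ... ) array."""
    match = ARRAY_RE.search(text)
    if match is None:
        raise ValueError("SECRET_PATTERNS array not found")
    return {entry for _quote, entry in ENTRY_RE.findall(match.group(1))}


def diff_patterns(canonical: set[str], consumer: set[str]) -> dict[str, list[str]]:
    """List patterns the consumer lacks and patterns it has beyond canonical."""
    return {
        "missing": sorted(canonical - consumer),
        "extra": sorted(consumer - canonical),
    }


# Simplified sample arrays (bare prefixes, not the real patterns).
CANONICAL = """SECRET_PATTERNS=(
  'ghp_'     # GitHub PAT (classic)
  'sk-cp-'
)
"""
CONSUMER = """SECRET_PATTERNS=(
  'ghp_'     # GitHub PAT (classic)
)
"""

drift = diff_patterns(extract_patterns(CANONICAL), extract_patterns(CONSUMER))
print(drift)
```

With these samples the consumer is reported as missing `sk-cp-`. An unanchored `\)` would instead stop at the paren in `(classic)` and truncate the array.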

  .github/workflows/secret-pattern-drift.yml — daily 05:00 UTC cron
    + on-push gate (when canonical, the workflow, or the script
    changes) + workflow_dispatch. Read-only token, 5-minute timeout.

Initial consumer set: workspace-runtime's bundled pre-commit hook
(the one that drifted on sk-cp-). molecule-controlplane's inlined
copy is private so this workflow can't read it; that's tracked
separately and the controlplane's own self-monitor is the gap.

Verified locally: lint detects drift correctly when the runtime
hook is missing sk-cp-, returns clean when aligned.

Refs: task #139.
2026-04-28 15:29:09 -07:00