secret-scan: align local pre-commit + extend drift lint (closes #1569 root)

#1569 Phase 1 discovery (2026-05-02) found six historical credential exposures in molecule-core git history. All are confirmed dead, but they were committed in the first place because the local pre-commit hook had two gaps that the canonical CI gate (and the runtime's hook) didn't:

1. **Pattern set was incomplete.** The local hook checked `sk-ant-|sk-proj-|ghp_|gho_|AKIA|mol_pk_|cfut_`, which is missing `ghs_*`, `ghu_*`, `ghr_*`, `github_pat_*`, `sk-svcacct-`, `sk-cp-`, `xox[baprs]-`, and `ASIA*`. The historical leaks were 5× `ghs_*` (App installation tokens) + 1× `github_pat_*`, none of which the local hook would have caught even if it had scanned the affected files.
2. **`*.md` and `docs/` were skip-listed.** The leaked tokens lived in `tick-reflections-temp.md`, `qa-audit-2026-04-21.md`, and `docs/incidents/INCIDENT_LOG.md`, exactly the file types the skip-list excluded. The hook ran and silently passed.

This commit:

- Replaces the local hook's hard-coded inline regex with the canonical 13-pattern array (byte-aligned with `.github/workflows/secret-scan.yml` and the workspace runtime's `pre-commit-checks.sh`).
- Removes the `\.md$|docs/` skip and keeps only the binary, lockfile, and hook-self exclusions.
- Adds the local hook to `lint_secret_pattern_drift.py` as an in-repo consumer (read from disk, no network, since the hook lives in the same checkout the lint runs against). Drift now fails the lint when the canonical set changes without the local hook updating in lockstep.
- Adds `.githooks/pre-commit` to the drift-lint workflow's path filter so consumer-side edits also trigger the lint.
- Adopts the canonical's "don't echo the matched value" defense (the prior version would have round-tripped a leaked credential into scrollback / CI logs).

Verified: `python3 .github/scripts/lint_secret_pattern_drift.py` reports both consumers aligned at 13 patterns. The hook's six other existing gates (canvas 'use client', dark theme, SQL injection, go-build, etc.) are untouched.

Companion change (already applied via API, no diff here): `Scan diff for credential-shaped strings` is now in the required-checks list on both `staging` and `main` branch protection. It was previously a soft gate (the workflow ran and exited 1, but didn't block merge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

parent d64570a665
commit 43c234df35

@@ -129,19 +129,57 @@ fi
 # ──────────────────────────────────────────────────────────
 # 6. Secrets: No tokens/keys in staged files
 # ──────────────────────────────────────────────────────────
+#
+# Pattern set MUST match .github/workflows/secret-scan.yml SECRET_PATTERNS
+# and molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh —
+# .github/workflows/secret-pattern-drift.yml lints this invariant. Rebuilt
+# against canonical 2026-05-02 after #1569 Phase 1 discovery surfaced
+# real ghs_*/github_pat_* leaks that the prior pattern set
+# ('sk-ant-|sk-proj-|ghp_|gho_|AKIA|mol_pk_|cfut_') would have missed:
+# (a) it lacked ghs_ / ghu_ / ghr_ / github_pat_ / sk-svcacct- / sk-cp- /
+# xox[baprs]- / ASIA prefixes, (b) it skipped *.md and docs/* — but the
+# actual leaks lived in tick-reflections-temp.md, qa-audit-2026-04-21.md,
+# docs/incidents/INCIDENT_LOG.md.
+SECRET_PATTERNS=(
+  'ghp_[A-Za-z0-9]{36,}'  # GitHub PAT (classic)
+  'ghs_[A-Za-z0-9]{36,}'  # GitHub App installation token
+  'gho_[A-Za-z0-9]{36,}'  # GitHub OAuth user-to-server
+  'ghu_[A-Za-z0-9]{36,}'  # GitHub OAuth user
+  'ghr_[A-Za-z0-9]{36,}'  # GitHub OAuth refresh
+  'github_pat_[A-Za-z0-9_]{82,}'  # GitHub fine-grained PAT
+  'sk-ant-[A-Za-z0-9_-]{40,}'  # Anthropic API key
+  'sk-proj-[A-Za-z0-9_-]{40,}'  # OpenAI project key
+  'sk-svcacct-[A-Za-z0-9_-]{40,}'  # OpenAI service-account key
+  'sk-cp-[A-Za-z0-9_-]{60,}'  # MiniMax API key (F1088 vector — caught only after the fact)
+  'xox[baprs]-[A-Za-z0-9-]{20,}'  # Slack tokens (bot/app/user/refresh)
+  'AKIA[0-9A-Z]{16}'  # AWS access key ID
+  'ASIA[0-9A-Z]{16}'  # AWS STS temp access key ID
+)

 ALL_STAGED=$(git diff --cached --name-only --diff-filter=ACM || true)
 if [ -n "$ALL_STAGED" ]; then
   for f in $ALL_STAGED; do
-    # Skip binary, known safe files, hooks, docs, and markdown
-    if echo "$f" | grep -qE '\.png$|\.jpg$|\.ico$|\.woff|node_modules|\.lock$|\.githooks/|\.md$|docs/'; then
+    # Skip ONLY binary + lockfiles + the hook itself. Markdown +
+    # docs/* are NOT skipped — that was the bug (#1569 leaks were
+    # all in *.md). If a doc legitimately needs a token-shaped
+    # placeholder, use ghs_EXAMPLE_TOKEN_DO_NOT_USE — short enough
+    # to dodge the {36,} length suffix.
+    if echo "$f" | grep -qE '\.png$|\.jpg$|\.ico$|\.woff|node_modules|\.lock$|\.githooks/'; then
       continue
     fi
-    DIFF=$(git diff --cached "$f" 2>/dev/null | grep '^+' | grep -v '^+++' || true)
-    if echo "$DIFF" | grep -qE 'sk-ant-|sk-proj-|ghp_|gho_|AKIA[A-Z0-9]|mol_pk_|cfut_' 2>/dev/null; then
-      echo "❌ POSSIBLE SECRET in $f — do not commit API keys or tokens"
-      ERRORS=$((ERRORS + 1))
-    fi
+    DIFF=$(git diff --cached --no-color --unified=0 -- "$f" 2>/dev/null | grep -E '^\+[^+]' || true)
+    [ -z "$DIFF" ] && continue
+    for pattern in "${SECRET_PATTERNS[@]}"; do
+      if echo "$DIFF" | grep -qE "$pattern"; then
+        echo "❌ POSSIBLE SECRET in $f (matched: ${pattern})"
+        echo " The actual matched value is NOT echoed here — round-tripping a"
+        echo " leaked credential into scrollback widens the blast radius."
+        echo " If false positive (test/docs example), use a short placeholder"
+        echo " like ghs_EXAMPLE_TOKEN_DO_NOT_USE that doesn't satisfy the length."
+        ERRORS=$((ERRORS + 1))
+        break
+      fi
+    done
   done
 fi

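The behavioral difference between the removed check and the new loop can be reproduced outside the hook. The following is a minimal Python sketch, not the hook itself: it assumes that Python's `re` treats these particular patterns the same way `grep -E` does, and the helper names `added_lines` and `first_matching_pattern`, the sample diff, and the synthetic token are all hypothetical illustrations.

```python
import re
from typing import List, Optional

# Prior inline regex, copied from the removed line in the hunk above.
OLD_REGEX = re.compile(r"sk-ant-|sk-proj-|ghp_|gho_|AKIA[A-Z0-9]|mol_pk_|cfut_")

# The 13 patterns from the new SECRET_PATTERNS array.
SECRET_PATTERNS = [
    r"ghp_[A-Za-z0-9]{36,}",
    r"ghs_[A-Za-z0-9]{36,}",
    r"gho_[A-Za-z0-9]{36,}",
    r"ghu_[A-Za-z0-9]{36,}",
    r"ghr_[A-Za-z0-9]{36,}",
    r"github_pat_[A-Za-z0-9_]{82,}",
    r"sk-ant-[A-Za-z0-9_-]{40,}",
    r"sk-proj-[A-Za-z0-9_-]{40,}",
    r"sk-svcacct-[A-Za-z0-9_-]{40,}",
    r"sk-cp-[A-Za-z0-9_-]{60,}",
    r"xox[baprs]-[A-Za-z0-9-]{20,}",
    r"AKIA[0-9A-Z]{16}",
    r"ASIA[0-9A-Z]{16}",
]


def added_lines(diff_text: str) -> List[str]:
    # Same filter as the hook's grep: a '+' followed by any character
    # other than '+'. This drops the '+++ b/<file>' header line.
    return [line for line in diff_text.splitlines() if re.match(r"\+[^+]", line)]


def first_matching_pattern(diff_text: str) -> Optional[str]:
    # Report which pattern fired, never the text that matched it.
    body = "\n".join(added_lines(diff_text))
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, body):
            return pattern
    return None


# Synthetic, token-shaped string built at runtime. NOT a real credential.
fake_token = "ghs_" + "A" * 36

# Hypothetical staged diff for a markdown file.
sample_diff = "\n".join([
    "+++ b/docs/incidents/INCIDENT_LOG.md",
    "@@ -0,0 +1,2 @@",
    "+rotated token " + fake_token,
    "+placeholder ghs_EXAMPLE_TOKEN_DO_NOT_USE",
])

print(OLD_REGEX.search(sample_diff))        # the old regex has no ghs_ prefix
print(first_matching_pattern(sample_diff))  # the new array names the ghs_ pattern
```

In this sketch the placeholder `ghs_EXAMPLE_TOKEN_DO_NOT_USE` does not match because only seven alphanumeric characters follow `ghs_` before the first underscore. One property worth keeping in mind is that the `\+[^+]` filter also drops any added line whose own content begins with `+`, since such a line starts with `++` in the raw diff.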
.github/scripts/lint_secret_pattern_drift.py (32 lines changed)
@@ -41,6 +41,17 @@ CONSUMERS: list[tuple[str, str]] = [
     ),
 ]

+# In-repo consumers — paths read locally from the workflow checkout.
+# Read-from-disk avoids the staging→main lag that the URL fetcher
+# would hit (a freshly-edited canonical wouldn't yet be on the
+# consumer's default branch). Same drift semantics, no network.
+LOCAL_CONSUMERS: list[tuple[str, Path]] = [
+    (
+        ".githooks/pre-commit (molecule-core local hook)",
+        Path(".githooks/pre-commit"),
+    ),
+]
+
 # Matches the SECRET_PATTERNS=( ... ) array in either yaml-indented
 # (the canonical workflow's `run:` block) or shell-flat (runtime
 # hook) format. Patterns inside are single-quoted Bash strings; we
@@ -89,6 +100,27 @@ def main() -> int:
     print(f"canonical ({CANONICAL_FILE}): {len(canonical)} patterns")

     drift = False
+
+    # In-repo consumers first — these are read from the workflow's own
+    # checkout, so they never lag behind the canonical and a missing
+    # file IS a real error (not a fetch warning).
+    for label, path in LOCAL_CONSUMERS:
+        if not path.exists():
+            print(f"::error::{label}: file not found at {path}")
+            drift = True
+            continue
+        consumer = extract_patterns(path.read_text(), label)
+        missing, extra = diff_patterns(canonical, consumer)
+        if not missing and not extra:
+            print(f" ✓ {label}: aligned ({len(consumer)} patterns)")
+            continue
+        drift = True
+        print(f"::error::DRIFT in {label}:")
+        for p in missing:
+            print(f" - missing from consumer: {p!r}")
+        for p in extra:
+            print(f" - extra in consumer (not in canonical): {p!r}")
+
     for label, url in CONSUMERS:
         try:
             content = fetch(url)

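The bodies of `extract_patterns` and `diff_patterns` are outside this diff, so how they work is not shown. As a rough illustration of the drift check the loop above relies on, here is a self-contained sketch under stated assumptions: the `_sketch` functions are hypothetical stand-ins rather than the script's real implementation, the two input snippets are invented, and the `mol_pk_` pattern with its length suffix is made up to represent a stale consumer-only entry.

```python
import re
from typing import List, Tuple


def extract_patterns_sketch(text: str) -> List[str]:
    # Collect the first single-quoted string on each line between
    # 'SECRET_PATTERNS=(' and the closing ')'. Stripping each line lets
    # the same parser handle yaml-indented and shell-flat layouts.
    patterns: List[str] = []
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not inside:
            inside = line.startswith("SECRET_PATTERNS=(")
            continue
        if line == ")":
            break
        match = re.match(r"'([^']+)'", line)
        if match:
            patterns.append(match.group(1))
    return patterns


def diff_patterns_sketch(
    canonical: List[str], consumer: List[str]
) -> Tuple[List[str], List[str]]:
    # missing: in canonical but not in consumer; extra: the reverse.
    missing = [p for p in canonical if p not in consumer]
    extra = [p for p in consumer if p not in canonical]
    return missing, extra


# Invented inputs: a yaml-indented canonical and a stale shell-flat hook.
CANONICAL_YAML = """
      run: |
        SECRET_PATTERNS=(
          'ghp_[A-Za-z0-9]{36,}'   # GitHub PAT (classic)
          'ghs_[A-Za-z0-9]{36,}'   # GitHub App installation token
          'AKIA[0-9A-Z]{16}'       # AWS access key ID
        )
"""

STALE_HOOK = """
SECRET_PATTERNS=(
  'ghp_[A-Za-z0-9]{36,}'
  'AKIA[0-9A-Z]{16}'
  'mol_pk_[A-Za-z0-9]{8,}'
)
"""

canonical = extract_patterns_sketch(CANONICAL_YAML)
consumer = extract_patterns_sketch(STALE_HOOK)
missing, extra = diff_patterns_sketch(canonical, consumer)
print("missing from consumer:", missing)
print("extra in consumer:", extra)
```

Comparing the extracted strings exactly, rather than normalizing them, fits the commit message's "byte-aligned" wording: under that reading, even a changed length suffix would count as drift.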
.github/workflows/secret-pattern-drift.yml (1 line changed)
@@ -34,6 +34,7 @@ on:
       - ".github/workflows/secret-scan.yml"
       - ".github/workflows/secret-pattern-drift.yml"
       - ".github/scripts/lint_secret_pattern_drift.py"
+      - ".githooks/pre-commit"
   workflow_dispatch:

 # GITHUB_TOKEN scoped to read-only. The lint only does git checkout