molecule-core/.github/scripts/lint_secret_pattern_drift.py
dev-lead f5f96df5e3
All checks were successful
audit-force-merge / audit (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Check migration collisions / Migration version collision check (pull_request) Successful in 37s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 32s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
sop-tier-check / tier-check (pull_request) Successful in 9s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 39s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 2m0s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m3s
ci: port 9 gates/lints/audits to .gitea/workflows/ (RFC internal#219 §1, Category C-1)
Sweep companion to PR#372 (ci.yml port), PR#378 (Cat A), PR#379 (Cat B).

Ports 9 workflow files from .github/workflows/ to .gitea/workflows/.
Each port applies the four-surface audit pattern per
feedback_gitea_actions_migration_audit_pattern:

  1. YAML — dropped workflow_dispatch.inputs (Gitea 1.22.6 parser
     rejects them per feedback_gitea_workflow_dispatch_inputs_unsupported),
     dropped merge_group (no Gitea merge queue), workflow-level
     env.GITHUB_SERVER_URL pinned per feedback_act_runner_github_server_url.
  2. Cache — actions/setup-python cache:pip retained (works with Gitea
     1.22.x cache server). No actions/cache@v4 usage in this batch.
  3. Token — auto-injected GITHUB_TOKEN (Gitea-aliased) used; no
     custom dispatch tokens.
  4. Docs — top-of-file "Ported from .github/workflows/X.yml on
     2026-05-11 per RFC internal#219 §1 sweep" comment on every file.

Per RFC §1: each job has `continue-on-error: true` so surfaced
defects do not block PRs. Follow-up PR (not in this sweep's scope)
flips to `continue-on-error: false` after triage.

Files ported:

- block-internal-paths.yml — forbidden-path PR gate. Standard port;
  dropped merge_group + the merge_group-specific fetch step.
- cascade-list-drift-gate.yml — TEMPLATES vs manifest.json drift.
  Passes WORKFLOW=.gitea/workflows/publish-runtime.yml to the script
  (script's default is .github/... which Cat A removes).
- check-migration-collisions.yml — Postgres migration prefix
  collision gate. The collision script already supports Gitea via
  _gitea_api_url() / _gitea_token() — no script edit needed.
- lint-curl-status-capture.yml — workflow-bash anti-pattern lint.
  Scanner glob and SELF self-skip path retargeted to .gitea/workflows/**.yml.
- runtime-pin-compat.yml — PyPI-latest install + import smoke.
  Dropped workflow_dispatch + merge_group.
- runtime-prbuild-compat.yml — PR-built wheel import smoke.
  dorny/paths-filter@v4 replaced with inline `git diff` per PR#372
  pattern. detect-changes job + per-step if-gates retained.
- secret-pattern-drift.yml — canonical/consumer pattern set drift
  lint. on.paths references the .gitea/ canonical path. Also edits
  .github/scripts/lint_secret_pattern_drift.py CANONICAL_FILE
  constant from `.github/workflows/secret-scan.yml` to
  `.gitea/workflows/secret-scan.yml` (Cat A removes the .github/
  one).
- test-ops-scripts.yml — scripts/ unittest runner. Dropped merge_group.
- railway-pin-audit.yml — daily Railway env var drift detection.
  `actions/github-script@v9` blocks (which call github.rest.* — a
  GitHub-specific JS API) replaced with curl calls against the
  Gitea REST API (/api/v1/repos/.../issues|comments). Issue
  open/comment-on-repeat/close-on-clean semantics preserved.

This Cat C-1 PR groups the "safer" gates/lints/audits. Categories
C-2 (E2E) and C-3 (deploy/publish/janitors) ship in separate PRs.

The original .github/ files are left in place per RFC §1 (deletion
is a Phase 4 follow-up). They are silently dead — Gitea Actions in
molecule-core only registers workflows under .gitea/workflows/ —
but keeping them documented in-repo eases the diff-review.

DO NOT MERGE without orchestrator-dispatched Five-Axis review +
@hongmingwang chat-go.

Cross-links:
- RFC: molecule-ai/internal#219
- Companion: PR#372 (ci.yml port), PR#378 (Cat A), PR#379 (Cat B)
- Runbook: runbooks/gitea-actions-migration-checklist.md (Cat B PR)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:18:11 -07:00

167 lines
6.5 KiB
Python

#!/usr/bin/env python3
"""Lint SECRET_PATTERNS drift across known consumers of molecule-core's canonical.
The canonical SECRET_PATTERNS array in
.github/workflows/secret-scan.yml is mirrored by every other side
that scans for credentials: the workspace-runtime's bundled
pre-commit hook, the molecule-controlplane inlined copy, etc. The
mirror is enforced socially today — when someone adds a new pattern
to canonical (e.g. the sk-cp- MiniMax token after F1088), the other
sides are supposed to be updated in lockstep.
This script automates the check. Diffs the canonical's pattern set
against each known public consumer and exits non-zero on any
mismatch. Wired into a daily cron + on-push gate via
.github/workflows/secret-pattern-drift.yml.
Private-repo consumers (currently molecule-controlplane's inlined
copy) are out of scope here because the molecule-core workflow's
GITHUB_TOKEN can't read other private repos in the org. They're
expected to self-monitor via their own copy of this script — not a
hard barrier, just a future expansion.
"""
from __future__ import annotations
import re
import sys
import urllib.request
from pathlib import Path
CANONICAL_FILE = Path(".gitea/workflows/secret-scan.yml")
# Public consumer mirrors. Each entry is (label, raw_url) — raw_url
# points at the file's RAW content on the consumer's default branch
# (or staging where applicable). Add an entry here when a new public
# repo starts shipping its own SECRET_PATTERNS array.
CONSUMERS: list[tuple[str, str]] = [
(
"molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh",
"https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/raw/branch/main/molecule_runtime/scripts/pre-commit-checks.sh",
),
]
# In-repo consumers — paths read locally from the workflow checkout.
# Read-from-disk avoids the staging→main lag that the URL fetcher
# would hit (a freshly-edited canonical wouldn't yet be on the
# consumer's default branch). Same drift semantics, no network.
LOCAL_CONSUMERS: list[tuple[str, Path]] = [
(
".githooks/pre-commit (molecule-core local hook)",
Path(".githooks/pre-commit"),
),
]
# Matches the SECRET_PATTERNS=( ... ) array in either yaml-indented
# (the canonical workflow's `run:` block) or shell-flat (runtime
# hook) format. Patterns inside are single-quoted Bash strings; we
# pull each via _PATTERN_RE.
#
# Closing `)` is anchored to the start of a line (possibly indented)
# because pattern comments like `# GitHub PAT (classic)` contain
# their own `)` mid-line — a non-anchored regex would match through
# the comment's paren and capture only the first pattern.
_ARRAY_RE = re.compile(r"SECRET_PATTERNS=\((.*?)^\s*\)", re.DOTALL | re.MULTILINE)
_PATTERN_RE = re.compile(r"'([^']+)'")
def extract_patterns(content: str, source_label: str) -> list[str]:
"""Pull the SECRET_PATTERNS list out of either format. Raises if missing."""
m = _ARRAY_RE.search(content)
if not m:
raise SystemExit(f"::error::{source_label}: SECRET_PATTERNS=(...) array not found")
return _PATTERN_RE.findall(m.group(1))
def fetch(url: str) -> str:
req = urllib.request.Request(
url, headers={"User-Agent": "secret-pattern-drift-lint/1"}
)
with urllib.request.urlopen(req, timeout=30) as resp:
return resp.read().decode("utf-8")
def diff_patterns(canonical: list[str], consumer: list[str]) -> tuple[list[str], list[str]]:
"""Return (missing_from_consumer, extra_in_consumer) — both sorted."""
canonical_set = set(canonical)
consumer_set = set(consumer)
return (
sorted(canonical_set - consumer_set),
sorted(consumer_set - canonical_set),
)
def main() -> int:
if not CANONICAL_FILE.exists():
print(f"::error::canonical not found at {CANONICAL_FILE}")
return 1
canonical = extract_patterns(CANONICAL_FILE.read_text(), str(CANONICAL_FILE))
print(f"canonical ({CANONICAL_FILE}): {len(canonical)} patterns")
drift = False
# In-repo consumers first — these are read from the workflow's own
# checkout, so they never lag behind the canonical and a missing
# file IS a real error (not a fetch warning).
for label, path in LOCAL_CONSUMERS:
if not path.exists():
print(f"::error::{label}: file not found at {path}")
drift = True
continue
consumer = extract_patterns(path.read_text(), label)
missing, extra = diff_patterns(canonical, consumer)
if not missing and not extra:
print(f"{label}: aligned ({len(consumer)} patterns)")
continue
drift = True
print(f"::error::DRIFT in {label}:")
for p in missing:
print(f" - missing from consumer: {p!r}")
for p in extra:
print(f" - extra in consumer (not in canonical): {p!r}")
for label, url in CONSUMERS:
try:
content = fetch(url)
except Exception as e:
# Fetch failures are warnings, not errors. A consumer
# whose default branch was just renamed (or whose file
# moved) shouldn't fail the lint until someone updates
# the URL above. Real drift is the failure mode this
# gate exists to catch — fetch reliability isn't.
print(f"::warning::{label}: fetch failed ({e}) — skipping")
continue
consumer = extract_patterns(content, label)
missing, extra = diff_patterns(canonical, consumer)
if not missing and not extra:
print(f"{label}: aligned ({len(consumer)} patterns)")
continue
drift = True
print(f"::error::DRIFT in {label}:")
for p in missing:
print(f" - missing from consumer: {p!r}")
for p in extra:
print(f" - extra in consumer (not in canonical): {p!r}")
if drift:
print()
print("::error::SECRET_PATTERNS drift detected. Bring consumer(s) into")
print("alignment with the canonical SECRET_PATTERNS array in")
print(f"{CANONICAL_FILE} by adding the missing patterns and removing")
print("any extras. The two sides must stay byte-aligned on the pattern")
print("list — the runtime hook is the developer's local pre-commit,")
print("the canonical is the org-wide CI gate, divergence means a token")
print("can pass one but get rejected by the other.")
return 1
print()
print("✓ All known consumers aligned with canonical SECRET_PATTERNS.")
return 0
if __name__ == "__main__":
sys.exit(main())