
6 Commits

Author SHA1 Message Date
Hongming Wang
c77a88c247 chore(security): pin Actions to SHAs + enable Dependabot auto-bumps
Supply-chain hardening for the CI pipeline. 23 workflow files
modified, 59 mutable-tag refs replaced with commit SHAs.

The risk

Every `uses:` reference in .github/workflows/*.yml was pinned to a
mutable tag (e.g., `actions/checkout@v4`). A maintainer of an
action — or a compromised maintainer account — can repoint that
tag to malicious code, and our pipelines silently pull it on the
next run. The tj-actions/changed-files compromise of March 2025 is
the canonical example: after a maintainer credential leak, the
attacker repointed the action's `@v<N>` tags to a payload that
dumped CI runner secrets into workflow logs. Repos that pinned to
SHAs were unaffected.

The fix

Replace each `@v<N>` with `@<commit-sha> # v<N>`. The trailing
comment preserves human readability ("ah, this is v4"); the SHA
makes the reference immutable.
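A quick way to confirm the sweep missed nothing is to list every `uses:` reference whose ref is not a full 40-character SHA. This is a minimal sketch and not part of the commit. `find_unpinned_uses` is a hypothetical helper. It assumes the workflows are `*.yml` files in one directory, and it treats local (`./`) and `docker://` references as out of scope.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): list `uses:` refs in workflow
# files that are NOT pinned to a full 40-hex-char commit SHA.
# Assumptions: workflows are *.yml under the given dir; local actions (./...)
# and docker:// images are out of scope and skipped.
find_unpinned_uses() {
  local dir="${1:-.github/workflows}"
  # 1. Keep lines of the form `uses: something@ref` (with or without "- ").
  # 2. Drop local-path and docker:// references.
  # 3. Drop refs that are a 40-char hex SHA (optionally followed by "# vN").
  grep -HnE '^[[:space:]]*-?[[:space:]]*uses:[[:space:]]*[^[:space:]]+@[^[:space:]#]+' \
      "$dir"/*.yml 2>/dev/null \
    | grep -vE 'uses:[[:space:]]*(\./|docker://)' \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)' \
    || true   # grep exits 1 when nothing is left, which is the success case here
}

# Usage: an empty result means every matched ref is SHA-pinned.
# find_unpinned_uses .github/workflows
```

The org-internal `@main` reusable workflow would still be reported. Suppressing it needs an explicit allowlist, and this sketch does not implement one.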

Actions covered (10 distinct):
  actions/{checkout,setup-go,setup-python,setup-node,upload-artifact,github-script}
  docker/{login-action,setup-buildx-action,build-push-action}
  github/codeql-action/{init,autobuild,analyze}
  dorny/paths-filter
  imjasonh/setup-crane
  pnpm/action-setup (already pinned in molecule-app, listed here for completeness)

Excluded:
  Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
    Internal org reusable workflow: we control its repo, so the threat
    model differs from that of third-party actions. Pinning internal
    reusables to @main rather than a SHA is conventional.

The maintenance cost

SHA pinning means upstream fixes require manual SHA bumps. Without
automation, pinned SHAs go stale. So this PR also enables Dependabot
across four ecosystems:

  - github-actions (workflows)
  - gomod (workspace-server)
  - npm (canvas)
  - pip (workspace runtime requirements)

Weekly cadence: the supply-chain attack window is "minutes between
repoint and pull", so no auto-bump cadence would help against
zero-days anyway. The point is to pull in non-zero-day fixes
without operator effort.
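For reference, the sketch below generates a Dependabot v2 config with those four ecosystems on a weekly schedule. It rests on several assumptions. `write_dependabot_config` is hypothetical, and `workspace-server` and `canvas` are assumed to be top-level directories. The commit does not name the pip requirements directory, so the caller must supply it.

```bash
#!/usr/bin/env bash
# Hypothetical generator for .github/dependabot.yml (Dependabot config v2).
# Assumptions: workspace-server/ and canvas/ sit at the repo root; the pip
# directory is not named in the commit, so it is a required argument.
write_dependabot_config() {
  local out="$1" pip_dir="$2"
  mkdir -p "$(dirname "$out")"
  # Unquoted EOF so ${pip_dir} expands; nothing else in the body uses $.
  cat > "$out" <<EOF
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"   # "/" makes Dependabot scan .github/workflows
    schedule:
      interval: "weekly"
  - package-ecosystem: "gomod"
    directory: "/workspace-server"
    schedule:
      interval: "weekly"
  - package-ecosystem: "npm"
    directory: "/canvas"
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "${pip_dir}"
    schedule:
      interval: "weekly"
EOF
}

# Example (the pip path is a placeholder, not the real one):
# write_dependabot_config .github/dependabot.yml /path/to/runtime
```

For `github-actions`, `directory: "/"` is how Dependabot is pointed at `.github/workflows`. Dependabot can bump SHA pins along with their trailing version comments, which keeps the `@<sha> # vN` form readable.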

Aligns with the user-stated principle: "long-term, robust,
fully-automated, eliminate human error."

Companion PR: Molecule-AI/molecule-controlplane#308 (same pattern,
smaller surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:37:06 -07:00
Hongming Wang
2c8792d3e0 fix(ci): printf format-string sink + filename word-split in secret-scan
Two latent bash bugs in the canonical secret-scan workflow caught
during the post-merge review of molecule-controlplane #301 (a
private consumer that inlined this workflow's logic and got both
fixes there). Same bugs apply here; fixing in canonical means every
public consumer (gh-identity, github-app-auth, the 8 workspace
template repos) inherits the fix on their next workflow_call.

Bug 1: `printf "$OFFENDING"` is a format-string sink.

  OFFENDING is built from filenames: `${f} (matched: ${pattern})\n`.
  When passed to printf as the first argument, `%` characters in a
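  filename are interpreted as conversion specifiers. A `%d` with no
  matching argument prints `0`, and an unrecognized specifier makes
  printf fail with `invalid format character`. Either way the error
  message is corrupted or cut short. No filename in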
  the current tree triggers it, but a future test fixture, build
  artifact, or contributor-supplied path could.

  Fix: `printf '%b' "$OFFENDING"` interprets the literal `\n` we
  appended without treating OFFENDING as a format string.
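The difference is easy to reproduce. In this minimal sketch, the helper names and the filename `reports/100%done.txt` are hypothetical. `OFFENDING` is assembled as described above, with a literal `\n` appended.

```bash
#!/usr/bin/env bash
# Hypothetical helpers contrasting the buggy and fixed printf calls.
# OFFENDING is built as in the workflow: "${f} (matched: ${pattern})\n",
# where \n is a literal backslash-n (double quotes do not expand it).

render_unsafe() {
  local f="$1" pattern="$2" OFFENDING
  OFFENDING="${f} (matched: ${pattern})\n"
  # BUG: OFFENDING is the format string. "%d" in the filename becomes a
  # conversion with no argument, which bash printf fills with 0.
  # shellcheck disable=SC2059
  printf "$OFFENDING"
}

render_safe() {
  local f="$1" pattern="$2" OFFENDING
  OFFENDING="${f} (matched: ${pattern})\n"
  # FIX: '%b' is the format and OFFENDING is only an argument. %b still
  # expands the trailing \n escape, which is all the original code needed.
  printf '%b' "$OFFENDING"
}

render_unsafe 'reports/100%done.txt' 'ghp_[A-Za-z0-9]{36,}'
# prints: reports/1000one.txt (matched: ghp_[A-Za-z0-9]{36,})
render_safe 'reports/100%done.txt' 'ghp_[A-Za-z0-9]{36,}'
# prints: reports/100%done.txt (matched: ghp_[A-Za-z0-9]{36,})
```

One caveat: `%b` also expands any backslash escapes that appear inside a filename. You can avoid that edge case too by building OFFENDING with real newlines and printing it with `printf '%s'`.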

Bug 2: `for f in $CHANGED` word-splits on whitespace.

  Filenames containing spaces would split into multiple tokens. The
  self-exclude check (`[ "$f" = "$SELF" ] && continue`) and the diff
  lookup would both operate on partial-path tokens. No filename in
  the current tree has whitespace, but the failure would be silent
  if one ever did.

  Fix: `while IFS= read -r f; do ... done <<< "$CHANGED"` reads
  whole lines as filenames. Added `[ -z "$f" ] && continue` to
  match the original `for` loop's implicit empty-input skip.
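Here is a minimal reproduction. It assumes `CHANGED` holds newline-separated paths in the form `git diff --name-only` prints them. The file list and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical file list in `git diff --name-only` form; one path has a space.
CHANGED=$'docs/release notes.md\nsrc/main.go'

# Old loop: unquoted $CHANGED is word-split on IFS (space, tab, newline) and
# glob-expanded, so "docs/release notes.md" arrives as two tokens.
split_with_for() {
  local f
  # shellcheck disable=SC2086
  for f in $CHANGED; do
    printf '[%s]\n' "$f"
  done
}

# Fixed loop: one whole line per iteration. IFS= keeps leading/trailing
# blanks and -r keeps backslashes literal.
split_with_read() {
  local f
  while IFS= read -r f; do
    # A here-string of an empty CHANGED still yields one empty line; skip it,
    # as the old for loop did implicitly.
    [ -z "$f" ] && continue
    printf '[%s]\n' "$f"
  done <<< "$CHANGED"
}

split_with_for    # [docs/release] [notes.md] [src/main.go], one per line
split_with_read   # [docs/release notes.md] [src/main.go], one per line
```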

Both fixes are mechanically straightforward (~16 lines net diff,
mostly comments documenting the why). No behavior change for
filenames in the current tree; strictly better for the edge cases.

The same fixes already shipped in molecule-controlplane via #301,
which inlined a copy of this workflow. The runtime's bundled
pre-commit hook (molecule-ai-workspace-runtime:
molecule_runtime/scripts/pre-commit-checks.sh) likely has the same
bugs — flagged as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:02:50 -07:00
rabbitblood
b81d8e9fc5 chore(secret-scan): add sk-cp- MiniMax pattern (F1088 retroactive fix) 2026-04-26 21:43:22 -07:00
rabbitblood
6e0a8e8e1c docs(ci): fix secret-scan reusable workflow self-doc — repo is molecule-core, ref is @staging 2026-04-26 15:44:31 -07:00
Hongming Wang
0ce537750c fix(ci): handle merge_group + shallow-clone BASE in secret-scan
[Molecule-Platform-Evolvement-Manager]

## What was breaking

Two distinct failure modes in `.github/workflows/secret-scan.yml`,
both visible after PR #2115 / #2117 hit the merge queue:

1. **`merge_group` events**: the script reads `github.event.before /
   after` to determine BASE/HEAD. Those properties only exist on
   `push` events. On `merge_group` events both came back empty, the
   script fell through to "no BASE → scan entire tree" mode, and
   false-positived on `canvas/src/lib/validation/__tests__/secret-formats.test.ts`
   which contains a `ghp_xxxx…` literal as a masking-function fixture.
   (Run 24966890424 — exit 1, "matched: ghp_[A-Za-z0-9]{36,}".)

2. **`push` events with shallow clone**: `fetch-depth: 2` doesn't
   always cover BASE across true merge commits. When BASE is in the
   payload but absent from the local object DB, `git diff` errors
   out with `fatal: bad object <sha>` and the job exits 128.
   (Run 24966796278 — push at 20:53Z merging #2115.)

## Fixes

- Add a dedicated fetch step for `merge_group.base_sha` (mirrors
  the existing pull_request base fetch) so the diff base is in the
  object DB before `git diff` runs.
- Move event-specific SHAs into a step `env:` block so the script
  uses a clean `case` over `${{ github.event_name }}` instead of
  a single `if pull_request / else push` that left merge_group on
  the empty branch.
- Add an on-demand fetch for the push-event BASE when it isn't in
  the shallow clone, plus a `git cat-file -e` guard before the
  diff so we fall through cleanly to the "scan entire tree" path
  if the fetch fails (correct, just slower) instead of exiting 128.
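The control flow those fixes describe can be sketched in plain shell. The env var names (`EVENT_NAME`, `PR_BASE_SHA`, `MERGE_GROUP_BASE_SHA`, `PUSH_BEFORE`) and the function names are hypothetical. They stand in for values the step's `env:` block would populate. The commit does not specify `--depth=1` on the on-demand fetch; that is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only. Hypothetical env vars, populated in the step's env: block from
#   github.event_name, github.event.pull_request.base.sha,
#   github.event.merge_group.base_sha and github.event.before.

# Pick BASE per event type; anything unrecognized yields an empty BASE.
resolve_base() {
  case "${EVENT_NAME:-}" in
    pull_request) printf '%s\n' "${PR_BASE_SHA:-}" ;;
    merge_group)  printf '%s\n' "${MERGE_GROUP_BASE_SHA:-}" ;;
    push)         printf '%s\n' "${PUSH_BEFORE:-}" ;;
    *)            printf '\n' ;;
  esac
}

# Succeeds only if BASE is non-empty and the commit is in the local object DB.
base_available() {
  [ -n "$1" ] && git cat-file -e "$1^{commit}" 2>/dev/null
}

# Decide between a diff scan and the slower full-tree scan. A missing BASE is
# fetched on demand; if that also fails, fall back to the tree scan instead of
# letting `git diff` die with "fatal: bad object" (exit 128).
scan_mode() {
  local base
  base=$(resolve_base)
  if [ -n "$base" ] && ! base_available "$base"; then
    git fetch --no-tags --depth=1 origin "$base" >/dev/null 2>&1 || true
  fi
  if base_available "$base"; then
    printf 'diff %s\n' "$base"
  else
    printf 'tree\n'
  fi
}
```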

## Defense-in-depth

`secret-formats.test.ts` had two literal continuous-string fixtures
(`'ghp_xxxx…'`, `'github_pat_xxxx…'`). The ghp_ one matched the
secret-scan regex. Switched both to the `'prefix_' + 'x'.repeat(N)`
pattern already used elsewhere in the same file — runtime value is
the same, but the literal source text no longer matches the regex
even if the BASE detection ever falls back to tree-scan mode again.
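You can check the effect of the fixture rewrite directly against the regex quoted in the failing run. The helper name is hypothetical. The 36-character run of `x` is built at runtime, so this snippet itself contains no matching literal.

```bash
#!/usr/bin/env bash
# Regex quoted in the failing run (the GitHub classic-PAT pattern).
SECRET_RE='ghp_[A-Za-z0-9]{36,}'

# Hypothetical helper: succeeds if the source text on stdin would be flagged.
flags_source() {
  grep -qE -- "$SECRET_RE"
}

# 36 x's, built at runtime so this file contains no matching literal.
xs=$(printf 'x%.0s' {1..36})

# Old fixture shape: one continuous literal, which the scan flags.
printf "const t = 'ghp_%s';\n" "$xs" | flags_source && echo 'literal fixture: flagged'

# New fixture shape: identical runtime value, but in the source text "ghp_"
# is followed by a quote, so [A-Za-z0-9]{36,} cannot match.
printf '%s\n' "const t = 'ghp_' + 'x'.repeat(36);" | flags_source || echo 'concatenated fixture: clean'
```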

## Test plan

- [x] No remaining regex matches in the secret-formats.test.ts source
- [x] YAML structure preserved
- [ ] CI passes on this PR's pull_request scan (was already passing)
- [ ] CI passes on this PR's merge_group scan (the new path)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:08:19 -07:00
rabbitblood
8edbd12980 feat(ci): add secret-scan workflow + reusable entry point for org-wide enrollment
Defense-in-depth for the #2090-class incident (2026-04-24): GitHub's
hosted Copilot Coding Agent leaked a ghs_* installation token into
tenant-proxy/package.json via npm init slurping the URL from a
token-embedded origin remote. We can't fix upstream's clone hygiene,
so we gate at the PR layer.

Single workflow, dual purpose:

1. PR / push / merge_group gate on this repo (molecule-monorepo).
   Refuses any change whose diff additions contain a credential-shaped
   string. Same shape as the Block forbidden paths check: the error
   message tells the agent how to recover without echoing the secret value.

2. Reusable workflow entry point (workflow_call) for the rest of the
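   org. Other Molecule-AI repos enroll with a short caller workflow
   whose job is three lines (the trigger list is the consumer's choice;
   the one shown is only an example):

     on: [pull_request, push]
     jobs:
       secret-scan:
         uses: Molecule-AI/molecule-monorepo/.github/workflows/secret-scan.yml@main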

   This makes molecule-monorepo the single source of truth for the
   regex set; consumer repos pick up new patterns without per-repo PRs.
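Here is a minimal sketch of the gate described in item 1. It assumes the gate reads a unified diff on stdin (for example from `git diff "$BASE" "$HEAD"`). It inspects only added (`+`) lines and skips the workflow's own path. On a hit it names the file and the pattern but never prints the matched text. `SECRET_PATTERNS`, `SELF` and `scan_added_lines` are hypothetical names. The single pattern shown is the one quoted elsewhere in this log.

```bash
#!/usr/bin/env bash
# Sketch of a diff-additions gate. Hypothetical names throughout; the real
# workflow's pattern set is larger than this one entry.
SECRET_PATTERNS=('ghp_[A-Za-z0-9]{36,}')
SELF='.github/workflows/secret-scan.yml'

# Reads a unified diff on stdin. Prints "file (matched: pattern)" for each
# hit and returns 1 if anything matched, 0 if the additions are clean.
scan_added_lines() {
  local line content file='' pattern status=0
  while IFS= read -r line; do
    case "$line" in
      '+++ b/'*) file=${line#'+++ b/'}; continue ;;  # header: new file path
      '+++ '*)   file=''; continue ;;                 # e.g. "+++ /dev/null"
      '+'*)      content=${line#+} ;;                 # an added line
      *)         continue ;;                          # context, removals, hunks
    esac
    [ "$file" = "$SELF" ] && continue  # the workflow's own regex literals
    for pattern in "${SECRET_PATTERNS[@]}"; do
      if [[ $content =~ $pattern ]]; then
        # Name the file and pattern only; never echo the matched value.
        printf '%s (matched: %s)\n' "$file" "$pattern"
        status=1
      fi
    done
  done
  return "$status"
}

# Usage: git diff "$BASE" "$HEAD" | scan_added_lines || exit 1
```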

Pattern set covers GitHub family (ghp_, ghs_, gho_, ghu_, ghr_,
github_pat_), Anthropic / OpenAI / Slack / AWS. Mirror of the
runtime's bundled pre-commit hook (molecule-ai-workspace-runtime:
molecule_runtime/scripts/pre-commit-checks.sh) — keep aligned when
either side adds a pattern.
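One way to organize the pattern set is by family, so an error message can name the family without echoing the value. Only `ghp_[A-Za-z0-9]{36,}` is quoted verbatim in this log. Every other regex below is an illustrative approximation, not the workflow's actual list. The `family=regex` layout and the `families_matching` helper are also hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative "family=regex" entries. Only ghp_[A-Za-z0-9]{36,} is quoted
# verbatim in this log (covered here by gh[pousr]_); the rest are assumptions,
# not the workflow's real regexes.
SECRET_FAMILIES=(
  'github=gh[pousr]_[A-Za-z0-9]{36,}'
  'github=github_pat_[A-Za-z0-9_]{22,}'
  'anthropic=sk-ant-[A-Za-z0-9_-]{20,}'
  'openai=sk-(proj-)?[A-Za-z0-9]{20,}'
  'slack=xox[baprs]-[A-Za-z0-9-]{10,}'
  'aws=AKIA[0-9A-Z]{16}'
)

# Hypothetical helper: print the family name for each entry the text matches.
# Only names are printed, so callers can report a hit without echoing it.
families_matching() {
  local text="$1" entry name re
  for entry in "${SECRET_FAMILIES[@]}"; do
    name=${entry%%=*}   # text before the first "="
    re=${entry#*=}      # regex after the first "="
    if [[ $text =~ $re ]]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Keeping the list as plain `family=regex` strings makes it easy to diff against the pre-commit hook's copy when either side adds a pattern.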

Self-exclude on .github/workflows/secret-scan.yml so the file's own
regex literals don't block its merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:05:18 -07:00