Supply-chain hardening for the CI pipeline. 23 workflow files
modified, 59 mutable-tag refs replaced with commit SHAs.
The risk
Every `uses:` reference in .github/workflows/*.yml was pinned to a
mutable tag (e.g., `actions/checkout@v4`). A maintainer of an
action — or a compromised maintainer account — can repoint that
tag to malicious code, and our pipelines silently pull it on the
next run. The tj-actions/changed-files compromise of March 2025 is
the canonical example: maintainer credential leak, attacker
repointed several `@v<N>` tags to a payload that exfiltrated
repository secrets. Repos that pinned to SHAs were unaffected.
The fix
Replace each `@v<N>` with `@<commit-sha> # v<N>`. The trailing
comment preserves human readability ("ah, this is v4"); the SHA
makes the reference immutable.
Actions covered (10 distinct):
actions/{checkout,setup-go,setup-python,setup-node,upload-artifact,github-script}
docker/{login-action,setup-buildx-action,build-push-action}
github/codeql-action/{init,autobuild,analyze}
dorny/paths-filter
imjasonh/setup-crane
pnpm/action-setup (already pinned in molecule-app, listed here for completeness)
Excluded:
Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
— internal org reusable workflow; we control its repo, threat model
is different from third-party actions. Conventional to pin to @main
rather than SHA for internal reusables.
The maintenance cost
SHA pinning means upstream fixes require manual SHA bumps. Without
automation, pinned SHAs go stale. So this PR also enables Dependabot
across four ecosystems:
- github-actions (workflows)
- gomod (workspace-server)
- npm (canvas)
- pip (workspace runtime requirements)
Weekly cadence — the supply-chain attack window is "minutes between
repoint and pull"; weekly auto-bumps don't help with zero-days
regardless. The point is to pull in non-zero-day fixes without
operator effort.
Aligns with user-stated principle: "long-term, robust, fully-
automated, eliminate human error."
Companion PR: Molecule-AI/molecule-controlplane#308 (same pattern,
smaller surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sweep-cf-orphans workflow shipped in #2088 was noisier than
intended in two ways. This PR fixes both — was filed under the
Optional finding I left on the original review and now matters because
the noise is observably hitting the merge queue.
1) `merge_group: types: [checks_requested]` was firing the entire
sweep job on every PR through the merge queue. The original intent
("future required-check support without a workflow edit") never
materialized, and meanwhile every recent merge-queue eval (#2091,
#2092, #2093, #2094, #2095, #2097) generated a red `Sweep CF
orphans (merge_group)` run.
Drop the trigger. Comment in the workflow explains the re-add path
if/when the workflow IS wired as a required check (re-add the
trigger AND gate the actual sweep step with
`if: github.event_name != 'merge_group'` so merge-queue evals are
no-op success).
2) The `Verify required secrets present` step exits 2 when the 6
secrets aren't configured yet (the PR body's post-merge step,
still pending). That turns the hourly schedule into an hourly red
CI run for as long as the secrets stay unset.
Convert to a soft skip: emit a `:⚠️:` listing the missing
secrets and set a `skip=true` step output, then gate the sweep
step with `if: steps.verify.outputs.skip != 'true'`. Workflow
reports green and ops still sees the warning when they review
recent runs.
Net effect:
- merge-queue evals stop generating spurious red runs
- the schedule reports green-with-warning until secrets land
- once secrets land, behavior is identical to today's (real sweep
runs, hard-fails if a secret is later removed)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop redundant 'aws --version' step. Script's own 'aws ec2
describe-instances' fails just as loud with a more actionable
error; the pre-check added ~1s with no signal value.
- timeout-minutes 10 → 3. Realistic worst case is ~2min (4 curls +
1 aws + N×CF-DELETE each individually capped at 10s by the
script's curl -m flag). 3 surfaces hangs within one cron tick
instead of burning the full interval.
- Document the schedule-vs-dispatch dry-run asymmetry inline so
the next reader doesn't need to trace input defaults.
- Add merge_group: types: [checks_requested] for queue parity with
runtime-pin-compat.yml — cheap insurance if this ever becomes a
required check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Molecule-AI/molecule-controlplane#239.
CF zone hit the 200-record quota 2026-04-23+ — every E2E and canary
left a record on moleculesai.app, and no scheduled job pruned them.
Provisions started failing with code 81045 ('Record quota exceeded').
The sweep-cf-orphans.sh script (PR #1978, with decision-function
unit tests added in #2079) already exists but no workflow fires it.
Adding it here as a parallel janitor to sweep-stale-e2e-orgs.yml:
- hourly schedule at :15 (offset from the e2e-orgs sweep at :00 so
the two converge cleanly without racing the same CP admin endpoint)
- workflow_dispatch with dry_run input default true (ad-hoc verify
without committing to deletes)
- workflow_dispatch with max_delete_pct input for major cleanups
(the script's own MAX_DELETE_PCT defaults to 50% as a safety gate)
- concurrency group prevents schedule + manual-dispatch from racing
the same zone
Why a separate workflow vs sweep-stale-e2e-orgs.yml:
- That workflow drives DELETE /cp/admin/tenants/:slug, assumes CP
has the org row. Doesn't catch records left when CP itself never
knew about the tenant (canary scratch, manual ops experiments)
or when the CP-side cascade's CF-delete branch failed.
- sweep-cf-orphans.sh enumerates the CF zone directly + matches
against live CP slugs + AWS EC2 names. Catches what the CP-driven
sweep can't.
Required secrets (will need to be set on the repo): CF_API_TOKEN,
CF_ZONE_ID, CP_PROD_ADMIN_TOKEN, CP_STAGING_ADMIN_TOKEN,
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY. Pre-flight verify-secrets
step fails loud if any are missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>