Supply-chain hardening for the CI pipeline. 23 workflow files
modified, 59 mutable-tag refs replaced with commit SHAs.
The risk
Every `uses:` reference in .github/workflows/*.yml was pinned to a
mutable tag (e.g., `actions/checkout@v4`). A maintainer of an
action — or a compromised maintainer account — can repoint that
tag to malicious code, and our pipelines silently pull it on the
next run. The tj-actions/changed-files compromise of March 2025 is
the canonical example: maintainer credential leak, attacker
repointed several `@v<N>` tags to a payload that exfiltrated
repository secrets. Repos that pinned to SHAs were unaffected.
The fix
Replace each `@v<N>` with `@<commit-sha> # v<N>`. The trailing
comment preserves human readability ("ah, this is v4"); the SHA
makes the reference immutable.
Actions covered (10 distinct):
actions/{checkout,setup-go,setup-python,setup-node,upload-artifact,github-script}
docker/{login-action,setup-buildx-action,build-push-action}
github/codeql-action/{init,autobuild,analyze}
dorny/paths-filter
imjasonh/setup-crane
pnpm/action-setup (already pinned in molecule-app, listed here for completeness)
Excluded:
Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
— internal org reusable workflow; we control its repo, threat model
is different from third-party actions. Conventional to pin to @main
rather than SHA for internal reusables.
The maintenance cost
SHA pinning means upstream fixes require manual SHA bumps. Without
automation, pinned SHAs go stale. So this PR also enables Dependabot
across four ecosystems:
- github-actions (workflows)
- gomod (workspace-server)
- npm (canvas)
- pip (workspace runtime requirements)
Weekly cadence — the supply-chain attack window is "minutes between
repoint and pull"; weekly auto-bumps don't help with zero-days
regardless. The point is to pull in non-zero-day fixes without
operator effort.
Aligns with user-stated principle: "long-term, robust, fully-
automated, eliminate human error."
Companion PR: Molecule-AI/molecule-controlplane#308 (same pattern,
smaller surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#134. The post-merge review of #2196 flagged that the combined
workflow's `paths:` filter (the union of both jobs' needs:
`workspace/**` + `scripts/build_runtime_package.py` + the workflow
itself) caused the `pypi-latest-install` job to fire on every
doc-only / adapter-only / unrelated workspace/ edit. The PyPI artifact
that job tests against can't change based on our workspace/ source —
only on actual PyPI publishes — so those runs add noise without
information.
Splits the previously-merged combined workflow:
runtime-pin-compat.yml (kept):
- PyPI-latest install + import smoke (was: pypi-latest-install)
- Narrow `paths:` filter — only fires when workspace/requirements.txt
or this workflow file changes
- Cron-driven daily for upstream-yank detection (unchanged)
runtime-prbuild-compat.yml (new):
- PR-built wheel + import smoke (was: local-build-install)
- Broad `paths:` filter — fires on any workspace/ source change,
scripts/build_runtime_package.py, or this workflow file
- No cron (workspace/ doesn't change between firings)
Behavior identical to before for content; only the trigger surface is
narrower per-job. Each workflow's name is its own status check, so
branch protection (which currently lists neither as required) can
gate them independently in future.
The prior comment in the combined file explicitly acknowledged the
asymmetry and proposed this split as a follow-up; this is that
follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#128's chicken-and-egg. The original gate installed the
CURRENTLY-PUBLISHED molecule-ai-workspace-runtime from PyPI, then
overlaid workspace/requirements.txt, then smoke-imported. That
catches problems with the already-shipped artifact (the daily-cron
upstream-yank case), but it cannot catch problems introduced by the
PR itself: the imports it exercises are from the OLD wheel, not the
PR's source. A PR that adds `from a2a.utils.foo import bar` (where
`bar` is added in a2a-sdk 1.5 and the runtime currently pins 1.3)
slips through:
1. Pip resolves the existing PyPI wheel + a2a-sdk 1.3.
2. Smoke imports the OLD main.py — no reference to `bar` → green.
3. Merge → publish-runtime.yml ships a wheel WITH the new import.
4. Tenant images redeploy → all crash on first boot with
ImportError: cannot import name 'bar' from 'a2a.utils.foo'.
Splits the workflow into two jobs:
- pypi-latest-install (renamed from default-install): unchanged
behavior. Runs on the daily cron and on requirements.txt /
workflow edits. Catches upstream PyPI yanks + the
already-shipped artifact going stale.
- local-build-install (new): runs scripts/build_runtime_package.py
on the PR's workspace/, builds the wheel with python -m build
(mirroring publish-runtime.yml byte-for-byte), installs that
wheel, then runs the same smoke import. Tests the artifact
that WOULD be published if this PR merges.
Path filter widened to workspace/** so any runtime-source change
triggers the local-build job. The pypi-latest job's filter is the
same union; its internal logic is unchanged so the daily-cron and
upstream-detection use cases continue to work.
Verified locally: built the wheel from current workspace/ source via
the same script + python -m build invocation, installed into a fresh
venv, imported from molecule_runtime.main import main_sync
successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
platform_auth.py validates WORKSPACE_ID at module load — EC2 user-data
sets it from cloud-init, but the CI smoke-test was missing it and
failed with 'WORKSPACE_ID is empty'. Set a placeholder UUID so the
import gate exercises only the dep-resolution path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review of the runtime-pin-compat workflow:
- Add merge_group trigger so when this becomes a required check the
queue green-checks it (mirrors ci.yml convention).
- Cache pip on workspace/requirements.txt — actions/setup-python@v5
with cache: pip + cache-dependency-path. Saves ~30s per fire.
- Document the load-bearing install order: runtime FIRST so pip
honors the runtime's declared a2a-sdk constraint (the surface that
broke 2026-04-24); workspace/requirements.txt SECOND so a2a-sdk
is upgraded to the runtime image's pinned version. Import smoke
validates the upgraded combination.
Skipped: branch-protection wiring (separate ops decision, not in
scope here); ci.yml integration (the standalone schedule trigger
is the load-bearing reason to keep this workflow separate).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Molecule-AI/molecule-controlplane#253.
Prevents recurrence of the 5-hour staging outage from 2026-04-24:
molecule-ai-workspace-runtime 0.1.13 declared `a2a-sdk<1.0` in its
metadata but actually imported `a2a.server.routes` (1.0+ only). pip
resolved successfully; every tenant workspace crashed at import. The
canary tenant ultimately caught it but only after 5 hours of degraded
staging. PR #249 fixed the version pin manually; nothing automated
catches the same class of bug for the next release.
This workflow:
- Installs molecule-ai-workspace-runtime fresh from PyPI in a Python
3.11 venv (mirrors EC2 user-data install pattern)
- Layers in workspace/requirements.txt (the runtime image's actual
dep set, including the a2a-sdk[http-server]>=1.0,<2.0 pin)
- Runs `from molecule_runtime.main import main_sync` — same import
the runtime entrypoint does
- Fails CI if pip resolution silently produced a combo that the
runtime can't actually import
Triggers:
- PR + push to main/staging touching workspace/requirements.txt or
this workflow (catches local pin changes)
- Daily 13:00 UTC schedule (catches upstream PyPI publishes that
break the pin combo without any change in our repo)
- workflow_dispatch (manual)
Concurrency cancels in-progress runs on the same ref.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>