From 09010212a0a887071d1c1f855b586641d8cca939 Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Sun, 3 May 2026 03:52:39 -0700
Subject: [PATCH] feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the recurrence path of PR #2556. The data fix realigned 8→4
templates in publish-runtime.yml's TEMPLATES variable, but the
underlying drift hazard was unguarded: the next manifest change could
silently leave cascade out of sync again.

This gate fails any PR that changes manifest.json or
publish-runtime.yml in a way that makes the cascade list diverge from
manifest workspace_templates (suffix-stripped). Either direction is
caught:

  missing-from-cascade  templates that won't auto-rebuild on a new
                        wheel publish (the codex-stuck-on-stale-runtime
                        bug class: PR #2512 added codex to manifest,
                        cascade wasn't updated, codex stayed pinned to
                        its last-built runtime version for weeks).

  extra-in-cascade      cascade dispatches to deprecated templates (the
                        wasted-API-calls + dead-CI-noise class: PR #2536
                        pruned 5 templates from manifest; cascade kept
                        dispatching to all 8 until PR #2556).

Triggers narrowly: only on PRs that touch manifest.json,
publish-runtime.yml, or the script itself. Fast (single grep+sed+comm
pipeline, no Go build).

Surfaced during the RFC #388 prior-art audit; folded in as the
structural follow-up to the data fix #2556 promised.

Self-tested both failure modes locally before commit:
- Drop codex from cascade → script fails with "MISSING: codex"
- Add langgraph to cascade → script fails with "EXTRA: langgraph"

Refs: https://github.com/Molecule-AI/molecule-controlplane/issues/388
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/cascade-list-drift-gate.yml | 39 ++++++++
 scripts/check-cascade-list-vs-manifest.sh     | 95 +++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 .github/workflows/cascade-list-drift-gate.yml
 create mode 100755 scripts/check-cascade-list-vs-manifest.sh

diff --git a/.github/workflows/cascade-list-drift-gate.yml b/.github/workflows/cascade-list-drift-gate.yml
new file mode 100644
index 00000000..284a68d8
--- /dev/null
+++ b/.github/workflows/cascade-list-drift-gate.yml
@@ -0,0 +1,39 @@
+name: cascade-list-drift-gate
+
+# Structural gate: TEMPLATES list in publish-runtime.yml must match
+# manifest.json's workspace_templates exactly. Closes the recurrence
+# path of PR #2556 (the data fix) and is the first concrete deliverable
+# of RFC #388 PR-3.
+#
+# Why a gate, not just discipline: PR #2536 pruned the manifest, but the
+# cascade list wasn't updated for weeks before someone (PR #2556)
+# noticed during an unrelated audit. During that window, codex never
+# rebuilt on a runtime publish. A structural gate catches the drift
+# the same day either file changes.
+#
+# Triggers narrowly to keep CI quiet: only on PRs that actually change
+# one of the two files. The path-filtered split + always-emit-result
+# pattern (memory: "Required check names need a job that always runs")
+# is unnecessary here because the workflow IS the check name and PR
+# branch protection should require it directly. Future-proof: if this
+# becomes a required check, add a no-op aggregator with always() so the
+# name still emits when paths don't match.
+
+on:
+  pull_request:
+    branches: [staging, main]
+    paths:
+      - manifest.json
+      - .github/workflows/publish-runtime.yml
+      - scripts/check-cascade-list-vs-manifest.sh
+
+permissions:
+  contents: read
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+      - name: Check cascade list matches manifest
+        run: bash scripts/check-cascade-list-vs-manifest.sh
diff --git a/scripts/check-cascade-list-vs-manifest.sh b/scripts/check-cascade-list-vs-manifest.sh
new file mode 100755
index 00000000..434069a5
--- /dev/null
+++ b/scripts/check-cascade-list-vs-manifest.sh
@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# check-cascade-list-vs-manifest.sh: structural drift gate for the
+# publish-runtime cascade list vs manifest.json workspace_templates.
+#
+# WHY: PR #2536 pruned the manifest to 4 supported runtimes; PR #2556
+# realigned the cascade list to match. The underlying drift hazard
+# (cascade-list ≠ manifest) was unguarded; the data fix didn't prevent
+# recurrence. This script is the structural gate that does.
+#
+# Behavior-based per project pattern: derives the expected set from
+# manifest.json and the actual set from the workflow YAML, fails on
+# any divergence in either direction.
+#
+#   missing-from-cascade → templates in manifest that publish-runtime.yml
+#                          won't auto-rebuild on a new wheel publish
+#                          (the codex-stuck-on-stale-runtime bug class)
+#   extra-in-cascade     → cascade dispatches to deprecated templates
+#                          (the wasted-API-calls + dead-CI-noise class)
+#
+# Suffix mapping: manifest names map to GHCR repos via
+#   {name without -default suffix} → molecule-ai-workspace-template-
+# That's the same map publish-runtime.yml's TEMPLATES variable iterates.
+#
+# Exit:
+#   0  cascade matches manifest exactly
+#   1  drift detected (script prints the diff)
+#   2  bad usage / missing inputs
+
+set -eu
+
+MANIFEST="${1:-manifest.json}"
+WORKFLOW="${2:-.github/workflows/publish-runtime.yml}"
+
+if [ ! -f "$MANIFEST" ]; then
+  echo "::error::manifest not found: $MANIFEST" >&2
+  exit 2
+fi
+if [ ! -f "$WORKFLOW" ]; then
+  echo "::error::workflow not found: $WORKFLOW" >&2
+  exit 2
+fi
+
+# Expected cascade entries: manifest workspace_templates → suffix-only
+# (strip -default tail, e.g. claude-code-default → claude-code, since
+# publish-runtime.yml's TEMPLATES uses suffixes that match the
+# molecule-ai-workspace-template- repo naming).
+EXPECTED=$(jq -r '.workspace_templates[].name' "$MANIFEST" \
+  | sed 's/-default$//' \
+  | sort -u)
+
+# Actual cascade entries: extract from the TEMPLATES="…" line. We anchor
+# on the assignment (so a comment or FOO_TEMPLATES= line can't match),
+# pull the contents between the quotes, and split into one-per-line.
+# Single source of truth in the workflow itself, no parallel registry.
+#
+# Why not \s in the regex: BSD sed (macOS) doesn't recognize \s as
+# whitespace and treats it as literal `s`. POSIX [[:space:]] works on
+# both BSD and GNU sed. Same hazard nuked the original draft of this
+# script: \s* matched empty-prefix-of-literal-s, then the leading
+# whitespace stayed in the captured group.
+ACTUAL=$(grep -E '^[[:space:]]*TEMPLATES="' "$WORKFLOW" \
+  | head -1 \
+  | sed -E 's/^[[:space:]]*TEMPLATES="([^"]*)".*$/\1/' \
+  | tr ' ' '\n' \
+  | grep -v '^$' \
+  | sort -u)
+
+if [ -z "$ACTUAL" ]; then
+  echo "::error::could not extract TEMPLATES=\"…\" from $WORKFLOW; has the variable name or quoting changed?" >&2
+  exit 2
+fi
+
+MISSING=$(comm -23 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
+EXTRA=$(comm -13 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
+
+if [ -z "$MISSING" ] && [ -z "$EXTRA" ]; then
+  echo "✓ cascade list matches manifest workspace_templates ($(echo "$EXPECTED" | wc -l | tr -d ' ') entries)"
+  exit 0
+fi
+
+echo "::error::cascade list drift detected between $MANIFEST and $WORKFLOW" >&2
+echo "" >&2
+if [ -n "$MISSING" ]; then
+  echo " Templates in manifest but MISSING from cascade (won't auto-rebuild on wheel publish):" >&2
+  echo "$MISSING" | sed 's/^/ - /' >&2
+  echo "" >&2
+fi
+if [ -n "$EXTRA" ]; then
+  echo " Templates in cascade but NOT in manifest (deprecated, wasting dispatch calls):" >&2
+  echo "$EXTRA" | sed 's/^/ - /' >&2
+  echo "" >&2
+fi
+echo " Fix: edit the TEMPLATES=\"…\" line in $WORKFLOW so the set matches" >&2
+echo " manifest.json's workspace_templates (suffix-stripped). See PR #2556 for context." >&2
+exit 1
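
Reviewer note (illustrative, not part of the patch): the two-direction comparison the script performs with comm can be tried in isolation. The sketch below is a minimal stand-in under stated assumptions. The helper names normalize and set_diff are hypothetical, the fixture lists are inline rather than read from manifest.json or publish-runtime.yml, and temporary files replace the script's process substitution so the snippet also runs under plain sh. The fixture names mirror the two self-tests described in the commit message.

```shell
#!/bin/sh
# Illustrative sketch only: reproduces the gate's two-direction set
# comparison on inline fixture lists. normalize and set_diff are
# hypothetical helper names, not functions from the patch.
set -eu

# Turn a space-separated list into sorted, unique, one-per-line form,
# stripping a trailing "-default" as the gate does for manifest names.
normalize() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed 's/-default$//' | grep -v '^$' | sort -u
}

# set_diff FLAGS EXPECTED ACTUAL
#   -23 prints entries only in EXPECTED (missing-from-cascade)
#   -13 prints entries only in ACTUAL   (extra-in-cascade)
set_diff() {
  _exp=$(mktemp)
  _act=$(mktemp)
  normalize "$2" >"$_exp"
  normalize "$3" >"$_act"
  comm "$1" "$_exp" "$_act"
  rm -f "$_exp" "$_act"
}

# Assumed fixture: a manifest listing two templates with the -default suffix.
manifest_names="claude-code-default codex-default"

# Drop codex from the cascade list: it shows up as missing.
echo "missing: $(set_diff -23 "$manifest_names" "claude-code")"

# Add langgraph to the cascade list: it shows up as extra.
echo "extra: $(set_diff -13 "$manifest_names" "claude-code codex langgraph")"
```

Both inputs are passed through the same sort before comm sees them, which is what comm requires. A list in a different order but with the same members produces no output in either direction.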