core-devops 60da675ea3 ci(cascade): structural hardening — .gitea-aware probe + convergence assertion + PEP 440 enforcement (RFC internal#613)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
cascade-list-drift-gate / check (pull_request) Failing after 6s
CI / Detect changes (pull_request) Successful in 6s
CI / Platform (Go) (pull_request) Successful in 4m12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 5m19s
CI / Python Lint & Test (pull_request) Successful in 6m34s
CI / all-required (pull_request) Successful in 4m21s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m12s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m13s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 58s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m4s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 4s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
Fixes the three structural defects surfaced by incident a66eb848:

1. The `.github/`-only probe at lines 282-289 caused codex (which only
   carries `.gitea/workflows/publish-image.yml`) to be silently soft-skipped
   → no `.runtime-version` written → silent drift to PyPI floor pin.
   Now probes BOTH directories; soft-skip only when neither exists.

2. No post-flight read-back asserted mirror convergence. The openclaw
   `0.1.1000\n# fire-publish-image-…` literal (b40c39ba1) and the
   claude-code↔openclaw 0.1.129↔0.1.1000 drift both went undetected for
   days because the `head -n1` consumer in publish-image.yml masked the
   malformed second line and there was no canonical-value check.
   The new `cascade-converged` job fetches each non-skipped mirror's
   `.runtime-version`, normalizes it with `head -n1`, compares it to the
   canonical `RUNTIME_VERSION`, and emits
   `::error msg=cascade-divergence …` for the Loki scrape and the
   main-red-watchdog page. It fails the publish run on any
   divergence or missing pin.

3. The lack of per-mirror write-side PEP 440 enforcement allowed
   b40c39ba1's `# fire-publish-image-<epoch>` literal to land. Added a
   strict regex gate before the `echo "$VERSION" > .runtime-version`
   write (symmetric with the publisher-side check at
   publish-runtime.yml:101), with a top-of-loop pre-check that aborts
   the whole fan-out on a contract violation.
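
The write-side gate from fix 3 can be exercised in isolation. What follows is a minimal sketch, assuming the same pattern the diff assigns to `PEP440_RE`; the helper name `is_valid_version` and the epoch in the sample fire-flag literal are hypothetical and chosen only for illustration. The newline guard is an addition in this sketch, not something the diff contains: `grep` matches line by line, so a multi-line value whose first line is valid would otherwise pass.

```shell
# Same pattern the diff assigns to PEP440_RE.
PEP440_RE='^[0-9]+\.[0-9]+\.[0-9]+(rc[0-9]+|a[0-9]+|b[0-9]+|\.post[0-9]+|\.dev[0-9]+)?$'

# Hypothetical helper: succeeds only if the WHOLE value is one valid version.
is_valid_version() {
  # Reject any value containing a newline before matching. grep tests each
  # line separately, so "0.1.1000<newline># comment" would match on line 1.
  case "$1" in
    *'
'*) return 1 ;;
  esac
  printf '%s\n' "$1" | grep -qE "$PEP440_RE"
}

# Sample values; the fire-flag epoch below is made up for the demo.
for v in "0.1.129" "1.2.3rc1" "1.2.3.post4" "0.1" "# fire-publish-image-1700000000"; do
  if is_valid_version "$v"; then
    echo "accept: $v"
  else
    echo "reject: $v"
  fi
done
```

The pattern is a deliberately narrow subset of PEP 440 (three numeric components plus one optional suffix), so two-component versions such as `0.1` are rejected as well.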

Risk: low. Happy path (every cascade-active mirror at a valid PEP 440
version with publish-image.yml present) behaves identically — same
clone, same write, same push. New behavior only on the three known
failure modes.

Verification:
- Bug 1: codex now writes `.runtime-version` on next cascade fire
  (currently missing per direct contents-API probe 2026-05-20).
- Bug 2: artificial divergence (edit one mirror out-of-band, re-fire
  cascade) → cascade-converged job fails with the diverged template +
  observed value.
- Bug 3: a malformed VERSION at the top of the cascade step exits
  non-zero before any clone; a per-template malformed write attempt
  hits the inner regex and adds the template to FAILED.
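
The Bug 2 check can be reproduced without network access. This is a minimal sketch, assuming the pin file content has already been fetched and decoded to text; `normalize_pin` and `check_converged` are hypothetical names, and the comment line in the sample content is illustrative rather than the literal from the incident.

```shell
# Hypothetical helper: reduce pin-file content to what a head -n1 consumer sees.
normalize_pin() {
  printf '%s\n' "$1" | head -n 1 | tr -d '[:space:]'
}

# Hypothetical helper: compare a mirror's normalized pin to the canonical version.
# Prints "ok", "missing", or "diverged"; returns non-zero unless converged.
check_converged() {
  want="$1"
  content="$2"
  got="$(normalize_pin "$content")"
  if [ -z "$got" ]; then
    echo "missing"
    return 1
  fi
  if [ "$got" = "$want" ]; then
    echo "ok"
    return 0
  fi
  echo "diverged"
  return 1
}

# A stale first line is caught; an empty pin is reported as missing.
echo "stale pin:     $(check_converged "0.1.1000" "0.1.129")"
echo "empty pin:     $(check_converged "0.1.1000" "")"
# Only the first line is compared, so a junk second line does not register.
echo "two-line pin:  $(check_converged "0.1.1000" "0.1.1000
# illustrative trailing comment")"
```

Because normalization keeps only the first line, a pin with a valid first line and a junk second line still reports `ok` here; catching that shape is left to write-side validation.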

RFC: internal#613
Incident: a66eb848
Memory cross-links:
- feedback_per_repo_gitea_vs_github_actions_dir
- reference_publish_runtime_pipeline
- feedback_molecule_core_qa_review_team_required

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:12:44 -07:00
+106 -3
@@ -274,16 +274,39 @@ jobs:
           git config --global user.name "publish-runtime cascade"
           git config --global user.email "publish-runtime@moleculesai.app"
+          # PEP 440 strict regex (Fix #3 — RFC internal#613). Symmetric with
+          # the publisher-side check at publish-runtime.yml:101 — reject
+          # malformed values at the writer too, so a future caller that
+          # bypasses the publisher can't leak a non-PEP-440 string (the
+          # `# fire-publish-image-<epoch>` literal that b40c39ba1 injected
+          # into openclaw's .runtime-version slipped past head -n1 at
+          # provision time but is still a real fire-flag).
+          PEP440_RE='^[0-9]+\.[0-9]+\.[0-9]+(rc[0-9]+|a[0-9]+|b[0-9]+|\.post[0-9]+|\.dev[0-9]+)?$'
+          if ! echo "$VERSION" | grep -qE "$PEP440_RE"; then
+            echo "::error::cascade refusing to fan out non-PEP-440 value '$VERSION' — publisher contract violation"
+            exit 1
+          fi
           WORKDIR="$(mktemp -d)"
           for tpl in $TEMPLATES; do
             REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
             CLONE="$WORKDIR/$tpl"
-            HTTP=$(curl -sS -o /dev/null -w "%{http_code}" \
+            # Fix #1 (RFC internal#613) — probe BOTH .github/ and .gitea/.
+            # The codex template only ports .gitea/workflows/ (no .github/
+            # mirror). The legacy .github/-only probe returned 404 on codex
+            # → soft-skip → codex never received .runtime-version → silent
+            # drift to PyPI floor (incident a66eb848). Soft-skip ONLY if
+            # NEITHER workflow file exists. Pairs with memory
+            # `feedback_per_repo_gitea_vs_github_actions_dir`.
+            HTTP_GH=$(curl -sS -o /dev/null -w "%{http_code}" \
               -H "Authorization: token $DISPATCH_TOKEN" \
               "$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
-            if [ "$HTTP" = "404" ]; then
-              echo "↷ $tpl has no publish-image.yml — soft-skip"
+            HTTP_GT=$(curl -sS -o /dev/null -w "%{http_code}" \
+              -H "Authorization: token $DISPATCH_TOKEN" \
+              "$GITEA_URL/api/v1/repos/$REPO/contents/.gitea/workflows/publish-image.yml")
+            if [ "$HTTP_GH" = "404" ] && [ "$HTTP_GT" = "404" ]; then
+              echo "↷ $tpl has no publish-image.yml in either .github/ or .gitea/ — soft-skip"
               SKIPPED="$SKIPPED $tpl"
               continue
             fi
@@ -302,6 +325,15 @@ jobs:
             fi
             cd "$CLONE"
+            # Fix #3 (RFC internal#613) — re-validate at per-mirror write
+            # site (defense-in-depth in case future edits mutate $VERSION
+            # inside the loop, e.g. a per-template suffix).
+            if ! echo "$VERSION" | grep -qE "$PEP440_RE"; then
+              echo "::error::refusing to write non-PEP-440 value '$VERSION' to $tpl/.runtime-version"
+              FAILED="$FAILED $tpl"
+              cd - >/dev/null
+              break
+            fi
             echo "$VERSION" > .runtime-version
             if git diff --quiet -- .runtime-version; then
@@ -343,3 +375,74 @@ jobs:
           else
             echo "Cascade complete: $VERSION pinned across all manifest workspace_templates."
           fi
+  # Fix #2 (RFC internal#613) — post-flight convergence assertion.
+  #
+  # The `cascade` job above writes .runtime-version to each non-skipped
+  # template, but until this job existed there was no read-back step
+  # asserting that every mirror ended up at the SAME canonical value.
+  # The openclaw `0.1.1000\n# fire-publish-image-…` literal (b40c39ba1)
+  # and the claude-code ↔ openclaw 0.1.129 ↔ 0.1.1000 divergence both
+  # went undetected for days because the head -n1 consumer in
+  # publish-image.yml masked the malformed line at provision time.
+  #
+  # This job fetches each template's .runtime-version via the Gitea
+  # contents API, head -n1 normalizes it (matches what publish-image.yml
+  # consumes), and compares to the canonical RUNTIME_VERSION. Loud failure
+  # on any divergence — Loki's gitea-actions scraper picks up the
+  # `::error::` line and the existing main-red-watchdog page fires.
+  cascade-converged:
+    needs: [publish, cascade]
+    runs-on: publish
+    steps:
+      - name: Assert all cascaded mirrors converged to canonical version
+        env:
+          DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
+          RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
+        run: |
+          set +e
+          GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
+          TEMPLATES="claude-code hermes openclaw codex langgraph crewai autogen deepagents gemini-cli"
+          DIVERGED=""
+          MISSING=""
+          OK=""
+          for tpl in $TEMPLATES; do
+            REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
+            # Skip templates that have no publish-image.yml (matches Fix #1
+            # soft-skip semantics — those legitimately don't carry a pin).
+            HTTP_GH=$(curl -sS -o /dev/null -w "%{http_code}" \
+              -H "Authorization: token $DISPATCH_TOKEN" \
+              "$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
+            HTTP_GT=$(curl -sS -o /dev/null -w "%{http_code}" \
+              -H "Authorization: token $DISPATCH_TOKEN" \
+              "$GITEA_URL/api/v1/repos/$REPO/contents/.gitea/workflows/publish-image.yml")
+            if [ "$HTTP_GH" = "404" ] && [ "$HTTP_GT" = "404" ]; then
+              continue
+            fi
+            RV_B64=$(curl -sS -H "Authorization: token $DISPATCH_TOKEN" \
+              "$GITEA_URL/api/v1/repos/$REPO/contents/.runtime-version" \
+              | python -c "import sys,json; d=json.load(sys.stdin); print(d.get('content','').replace('\n',''))" 2>/dev/null)
+            if [ -z "$RV_B64" ]; then
+              echo "::error msg=cascade-divergence template=$tpl reason=missing-runtime-version::"
+              MISSING="$MISSING $tpl"
+              continue
+            fi
+            GOT=$(echo "$RV_B64" | base64 -d 2>/dev/null | head -n1 | tr -d '[:space:]')
+            if [ "$GOT" = "$RUNTIME_VERSION" ]; then
+              echo "✓ $tpl converged at $GOT"
+              OK="$OK $tpl"
+            else
+              echo "::error msg=cascade-divergence template=$tpl got=$GOT want=$RUNTIME_VERSION::"
+              DIVERGED="$DIVERGED $tpl(got=$GOT)"
+            fi
+          done
+          if [ -n "$DIVERGED" ] || [ -n "$MISSING" ]; then
+            echo "::error::Cascade convergence FAILED — diverged:$DIVERGED missing:$MISSING"
+            exit 1
+          fi
+          echo "Cascade convergence OK — all cascade-active mirrors at $RUNTIME_VERSION:$OK"
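
The probe rule shared by the `cascade` and `cascade-converged` jobs reduces to a two-input decision. The sketch below is a minimal illustration, assuming the two HTTP status codes have already been captured; `should_soft_skip` is a hypothetical name that does not appear in the workflow.

```shell
# Hypothetical helper: return 0 (skip) only when BOTH probes returned 404.
should_soft_skip() {
  http_gh="$1"  # status for .github/workflows/publish-image.yml
  http_gt="$2"  # status for .gitea/workflows/publish-image.yml
  [ "$http_gh" = "404" ] && [ "$http_gt" = "404" ]
}

# Walk a few status combinations through the rule.
for pair in "200 404" "404 200" "404 404" "500 404"; do
  set -- $pair
  if should_soft_skip "$1" "$2"; then
    echo ".github=$1 .gitea=$2 -> soft-skip"
  else
    echo ".github=$1 .gitea=$2 -> proceed"
  fi
done
```

Under this rule any non-404 status, including a transient 5xx, counts as "workflow present" and the template proceeds to the clone or read-back step, which matches the condition written in the diff.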