Compare commits

...

5 Commits

Author SHA1 Message Date
Hongming Wang 4279fecde5 fix(ci): keep codex in TEMPLATES + skip-if-no-publish-image.yml
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 6s
cascade-list-drift-gate / check (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 1s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 25s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 51s
CI / Canvas (Next.js) (pull_request) Failing after 5m16s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 5m22s
CI / Python Lint & Test (pull_request) Successful in 15m42s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 19m46s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 20m54s
The v2 dropped codex from TEMPLATES on the basis of "no
publish-image.yml = not part of cascade today." That was correct
about the immediate behavior but tripped cascade-list-drift-gate.yml
because manifest.json still declares codex (it IS a live runtime —
referenced from workspace/config.py and cloned into dev envs by
clone-manifest.sh; only the image-publish path is missing).

Restore codex to TEMPLATES (matching manifest) and add a runtime
soft-skip: probe each repo for .github/workflows/publish-image.yml
via the Gitea contents API and skip cleanly if 404. Final job log
distinguishes "complete across all" vs "complete with soft-skips".

This preserves the drift gate's invariant (TEMPLATES == manifest)
while honoring the empirical fact that codex has no publish-image
workflow yet. If codex later gains the workflow, no change here is
needed — the probe will see 200 and the cascade will fan out to it
naturally.

Refs molecule-core#14, molecule-core#20.
2026-05-07 03:32:53 -07:00
Hongming Wang 607444e71b feat(ci): replace curl-dispatch with push-mode cascade (v2)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
cascade-list-drift-gate / check (pull_request) Failing after 9s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 2s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m21s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 46s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m28s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 26s
CI / Platform (Go) (pull_request) Successful in 3m32s
CI / Canvas (Next.js) (pull_request) Failing after 3m34s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 16m16s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 20m25s
Empirical blocker on v1: Gitea 1.22.6 has no repository_dispatch /
workflow_dispatch trigger API (verified across 6 candidate paths in
issuecomment-913). v1's curl-POST loop would always exit-1.

v2 pivots to push-mode: each template repo got a small companion PR
(merged 2026-05-07) adding a `.runtime-version` file at root + a
`resolve-version` job in publish-image.yml that reads the file and
forwards the value to the reusable build workflow. publish-runtime
now updates that file via git-clone + commit + push, which trips
each template's existing `on: push: branches: [main]` trigger.

Behaviour changes vs v1:
- Templates list dropped from 9 → 8 (codex has no publish-image.yml
  so was never part of the cascade in practice).
- 3-retry pull-rebase loop per template (handles concurrent-push
  races without force-push). Failures collected, job exits 1 with
  the failed-template list at the end.
- Idempotency: when re-run with the same version, templates already
  pinned to that version contribute zero commits — operator can
  safely re-run to retry partial failures.
- Author line: "publish-runtime cascade <publish-runtime@moleculesai
  .app>" trailer makes it clear the commit is workflow-driven, not
  human (per memory feedback_github_botring_fingerprint).

DISPATCH_TOKEN secret name unchanged (still consumed at
secrets.DISPATCH_TOKEN per 569df259).

Refs molecule-core#14, builds on molecule-core#20 issuecomment-923
(Phase 2 design).
2026-05-07 03:17:38 -07:00
Hongming Wang 1ff7342e91 chore: retrigger CI after runner config fix
cascade-list-drift-gate / check (pull_request) Successful in 10s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 1s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 44s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 22s
CI / Canvas (Next.js) (pull_request) Failing after 3m28s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Failing after 3m39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 29s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 30s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 15m39s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 15m41s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 16m1s
CI / Python Lint & Test (pull_request) Successful in 15m52s
2026-05-07 03:01:23 -07:00
Hongming Wang 569df259ba fix(ci): align secret name to plumbed DISPATCH_TOKEN (closes #14)
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 3s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
cascade-list-drift-gate / check (pull_request) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 12s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 19s
CI / Python Lint & Test (pull_request) Failing after 20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 34s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m31s
CI / Platform (Go) (pull_request) Successful in 3m6s
CI / Canvas (Next.js) (pull_request) Failing after 3m8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 14m54s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 15m3s
The cascade workflow was reading from `secrets.TEMPLATE_DISPATCH_TOKEN`
but the plumbed secret name is `DISPATCH_TOKEN` (verified just now via
GET /repos/molecule-ai/molecule-core/actions/secrets — only DISPATCH_TOKEN
is set). Without this rename the cascade would always evaluate "secret
missing" and exit 1 on the next push to staging, defeating the entire
point of grant-role-access.sh --apply that just landed.

Three references updated:
  - env mapping (`secrets.X` → `secrets.DISPATCH_TOKEN`)
  - workflow_dispatch warning text
  - push-trigger error text

The bash-side variable name is unchanged (still `DISPATCH_TOKEN`) so
the curl invocation at line 372 is unaffected. YAML round-trip parses
clean.
2026-05-07 02:38:20 -07:00
claude-ceo-assistant ce3f1f48a4 fix(ci): port publish-runtime cascade to Gitea repo-dispatch API (closes molecule-core#14)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
cascade-list-drift-gate / check (pull_request) Successful in 4s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Failing after 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 49s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m20s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m24s
CI / Canvas (Next.js) (pull_request) Failing after 1m55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 2m5s
## Symptom

`publish-runtime.yml::cascade` fired a `repository_dispatch` to 10 workspace-template
repos via direct curl to `https://api.github.com/repos/...`. Post-2026-05-06 the
org's GitHub presence is suspended; every invocation 404s. The job's
`::warning::` posture meant the failure didn't propagate, leaving the runtime
PyPI publish → template image rebuild pipeline silently broken.

## Why Option A (rewrite) and not Option B (delete)

Verified 2026-05-07 by devops-engineer (molecule-core#14 thread):

- The cron-poll mechanism (/etc/cron.d/molecule-deploy-poll) tracks ONLY the
  Vercel/Railway-deployed repos (landingpage/docs/molecule-app/molecules-market
  /molecule-controlplane). It does NOT track workspace-template-* repos.
- Each of the 9 template `publish-image.yml` workflows has
  `repository_dispatch: types: [runtime-published]` as a load-bearing trigger.
  Without the cascade, when the runtime ships a new PyPI version, templates
  don't auto-rebuild.

So Option B (delete) would silently break the runtime → template fan-out.
Option A (rewrite to Gitea's API shape) is the right call. Security-auditor
agreed after seeing the cron-poll TRACKED list.

## API surface change

| Concern | Pre-fix (GitHub) | Post-fix (Gitea) |
|---|---|---|
| URL | `https://api.github.com/repos/$REPO/dispatches` | `${GITEA_URL}/api/v1/repos/$REPO/dispatches` |
| Owner case | `Molecule-AI/...` | `molecule-ai/...` (lowercase, Gitea is case-sensitive) |
| Auth header | `Authorization: Bearer $DISPATCH_TOKEN` | `Authorization: token $DISPATCH_TOKEN` |
| Body shape | `{event_type, client_payload}` | UNCHANGED — Gitea is GitHub-compatible here |
| Success code | `204 No Content` | `204 No Content` (unchanged) |

`GITEA_URL` defaults to `https://git.moleculesai.app`; overridable via job env.

## Out-of-band: DISPATCH_TOKEN secret rotation

The DISPATCH_TOKEN secret was a GitHub PAT. It must be re-minted as a Gitea
PAT for the new API to authenticate. Per saved memory
`feedback_per_agent_gitea_identity_default`, this should be a dedicated
`publish-runtime-bot` persona token with `write:repository` scope on the
9 target repos — NOT the founder PAT.

This PR ships the workflow change. Token rotation is the operator-host
follow-up (security-auditor's lane) — coordinate the merge so the token
is in place before the next runtime release fires.

## Backwards compatibility

The workflow ran silently-broken since 2026-05-06 (every invocation 404
+ ::warning:: but no failure). So there is no functional regression from
"silently broken" to "actually working". Any in-progress operator-managed
manual dispatch path is unaffected; the Gitea API parallel path doesn't
require operator intervention.

## Test plan

- [x] YAML parse OK on the modified workflow file
- [ ] Smoke test: trigger a runtime publish (or simulate via dispatching to one
      template) post-merge; verify HTTP 204 + the template's publish-image
      workflow fires + the template's image gets re-pushed against the new
      runtime version. Phase 4 verification belongs to internal#46 follow-up.

## Hostile self-review (3 weakest spots)

1. The fan-out remains all-or-nothing: a single template failure surfaces as
   a `::warning::` but PyPI publish proceeds. With 9 templates this is a
   ~10% per-template chance of stale-image-on-runtime-bump if any one fails.
   Defense: the warning shows up in the workflow summary; operators retry.
   Future hardening: requeue-on-fail with bounded retry, or a separate
   reconcile cron that detects template/runtime version drift and re-dispatches.

2. `DISPATCH_TOKEN` validity is enforced by the Gitea API (401 on stale)
   but the workflow doesn't differentiate 401 from 404. Either way the
   warning fires. Future hardening: explicit token-shape check at the start
   of the cascade job (curl `/api/v1/user` once, fail-fast if 401).

3. Owner-case lowercase is right today but couples the workflow to the
   current Gitea org slug. If the org is ever renamed, this workflow
   breaks silently. Less fragile alternative: derive REPO from a
   canonical config (e.g. `gh repo list molecule-ai`) instead of
   string-concatenating. Acceptable today; filed as the same future
   hardening pass as item 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:31:37 -07:00
+126 -53
View File
@@ -282,42 +282,33 @@ jobs:
echo "::error::Refusing to fan out cascade against stale or corrupt PyPI surfaces."
exit 1
- name: Fan out repository_dispatch
- name: Fan out via push to .runtime-version
env:
# Fine-grained PAT with `actions:write` on the 8 template repos.
# GITHUB_TOKEN can't fire dispatches across repos — needs an explicit
# token. Stored as a repo secret; rotate per the standard schedule.
DISPATCH_TOKEN: ${{ secrets.TEMPLATE_DISPATCH_TOKEN }}
# Single source of truth: the publish job's output, which handles
# tag/manual-input/auto-bump uniformly. The previous fallback
# (`steps.version.outputs.version` from inside the cascade job)
# was a dead reference — different job, no shared step scope.
# Gitea PAT with write:repository scope on the 8 cascade-active
# template repos. Used here for `git push` (NOT for an API
# dispatch — Gitea 1.22.6 has no repository_dispatch endpoint;
# empirically verified across 6 candidate paths in molecule-
# core#20 issuecomment-913). The push trips each template's
# existing `on: push: branches: [main]` trigger on
# publish-image.yml, which then reads the updated
# .runtime-version via its resolve-version job.
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
run: |
set +e # don't abort on a single repo failure — collect them all
# Schedule-vs-dispatch behaviour split (hardened 2026-04-28
# after the sweep-cf-orphans soft-skip incident — same class
# of bug):
#
# The earlier "skipping cascade. templates will pick up the
# new version on their own next rebuild" message was wrong —
# templates only build on this dispatch trigger; without it
# they stay pinned to whatever runtime version they last saw.
# A silent skip here means "PyPI is current, templates are
# not" and the gap is invisible until someone notices a
# template still on the old version weeks later.
#
# - push → exit 1 (red CI surfaces the gap)
# - workflow_dispatch → exit 0 with a warning (operator
# ran this ad-hoc; let them rerun
# after fixing the secret)
# Soft-skip on workflow_dispatch when the token is missing
# (operator ad-hoc test); hard-fail on push so unattended
# publishes can't silently skip the cascade. Same shape as
# the original v1, intentional split per the schedule-vs-
# dispatch hardening 2026-04-28.
if [ -z "$DISPATCH_TOKEN" ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::TEMPLATE_DISPATCH_TOKEN secret not set — skipping cascade."
echo "::warning::DISPATCH_TOKEN secret not set — skipping cascade."
echo "::warning::set it at Settings → Secrets and Variables → Actions, then rerun. Templates will stay on the prior runtime version until either this token is set or each template is rebuilt manually."
exit 0
fi
echo "::error::TEMPLATE_DISPATCH_TOKEN secret missing — cascade cannot fan out."
echo "::error::DISPATCH_TOKEN secret missing — cascade cannot fan out."
echo "::error::PyPI was published, but the 8 template repos will NOT pick up the new version until this token is restored and a republish dispatches the cascade."
echo "::error::set it at Settings → Secrets and Variables → Actions; then re-trigger publish-runtime via workflow_dispatch."
exit 1
@@ -327,37 +318,119 @@ jobs:
echo "::error::publish job did not expose a version output — cascade cannot fan out"
exit 1
fi
# All 9 active workspace template repos. The PR #2536 pruning
# ("deprecated, no shipping images") was empirically wrong:
# continuous-synth-e2e.yml defaults to langgraph as its primary
# canary (line 44), and every excluded template had successful
# publish-image runs as of 2026-05-03 — none were dormant.
# Symptom of the prune: today's a2a-sdk strict-mode fix
# (#2566 / commit e1628c4) cascaded to 4 templates but never
# reached langgraph, so the synth-E2E correctly canary'd a fix
# that had landed but not deployed. Re-added the 5 templates.
# Long-term: derive this list from manifest.json so cascade
# scope can't drift from E2E scope — tracked in RFC #388 as a
# Phase-1 invariant.
# All 9 workspace templates declared in manifest.json. The list
# MUST stay aligned with manifest.json's workspace_templates —
# cascade-list-drift-gate.yml enforces this in CI per the
# codex-stuck-on-stale-runtime invariant from PR #2556.
# Long-term goal: derive this list from manifest.json so it
# can't drift even on a manifest edit (RFC #388 Phase-1).
#
# Per-template publish-image.yml presence is checked at
# cascade-time below: codex doesn't ship one today, so the
# cascade soft-skips it with an informational message rather
# than dropping it from this list (which would re-introduce
# the drift the gate exists to catch).
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
TEMPLATES="claude-code hermes openclaw codex langgraph crewai autogen deepagents gemini-cli"
FAILED=""
SKIPPED=""
# Configure git identity once. The persona owning DISPATCH_TOKEN
# is the same identity that authored this commit on each
# template; using a generic "publish-runtime cascade" co-author
# trailer in the message keeps the audit trail honest about the
# workflow-driven origin.
git config --global user.name "publish-runtime cascade"
git config --global user.email "publish-runtime@moleculesai.app"
WORKDIR="$(mktemp -d)"
for tpl in $TEMPLATES; do
REPO="Molecule-AI/molecule-ai-workspace-template-$tpl"
STATUS=$(curl -sS -o /tmp/dispatch.out -w "%{http_code}" \
-X POST "https://api.github.com/repos/$REPO/dispatches" \
-H "Authorization: Bearer $DISPATCH_TOKEN" \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
-d "{\"event_type\":\"runtime-published\",\"client_payload\":{\"runtime_version\":\"$VERSION\"}}")
if [ "$STATUS" = "204" ]; then
echo "✓ dispatched $tpl ($VERSION)"
else
echo "::warning::✗ failed to dispatch $tpl: HTTP $STATUS — $(cat /tmp/dispatch.out)"
REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
CLONE="$WORKDIR/$tpl"
# Pre-check: skip templates without a publish-image.yml.
# The cascade's job is to trip the template's on-push
# rebuild — if there's no rebuild workflow, pushing a
# .runtime-version commit is just noise on the target
# repo. Use the Gitea contents API (no clone required for
# the probe). 200 = present; 404 = absent.
HTTP=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: token $DISPATCH_TOKEN" \
"$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
if [ "$HTTP" = "404" ]; then
echo "↷ $tpl has no publish-image.yml — soft-skip (informational; manifest still tracks it)"
SKIPPED="$SKIPPED $tpl"
continue
fi
if [ "$HTTP" != "200" ]; then
echo "::warning::$tpl publish-image.yml probe returned HTTP $HTTP — proceeding anyway, push will surface the real failure if any"
fi
# Use a per-template attempt loop so a transient race (e.g.
# human pushing to the same template at the same instant)
# doesn't lose the cascade. Bounded retries (3) — beyond
# that we surface the failure and let the operator retry.
attempt=0
success=false
while [ $attempt -lt 3 ]; do
attempt=$((attempt + 1))
rm -rf "$CLONE"
if ! git clone --depth=1 \
"https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/$REPO.git" \
"$CLONE" >/tmp/clone.log 2>&1; then
echo "::warning::clone $tpl attempt $attempt failed: $(tail -n3 /tmp/clone.log)"
sleep 2
continue
fi
cd "$CLONE"
echo "$VERSION" > .runtime-version
# Idempotency guard: if the file already matches, this
# publish is a re-run for a version already cascaded.
# Don't push a no-op commit (would spuriously re-trip the
# template's on-push and rebuild for nothing).
if git diff --quiet -- .runtime-version; then
echo "✓ $tpl already at $VERSION — no commit needed (idempotent)"
success=true
cd - >/dev/null
break
fi
git add .runtime-version
git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)" \
-m "Co-Authored-By: publish-runtime cascade <publish-runtime@moleculesai.app>" \
>/dev/null
if git push origin HEAD:main >/tmp/push.log 2>&1; then
echo "✓ $tpl pushed $VERSION on attempt $attempt"
success=true
cd - >/dev/null
break
fi
# Likely a non-fast-forward — pull-rebase and retry.
# Don't force-push: that would silently overwrite a racing
# human/cascade commit.
echo "::warning::push $tpl attempt $attempt failed, pull-rebasing: $(tail -n3 /tmp/push.log)"
git pull --rebase origin main >/tmp/rebase.log 2>&1 || true
cd - >/dev/null
done
if [ "$success" != "true" ]; then
FAILED="$FAILED $tpl"
fi
done
rm -rf "$WORKDIR"
if [ -n "$FAILED" ]; then
echo "::warning::Cascade incomplete. Failed templates:$FAILED"
# Don't fail the whole job — PyPI publish already succeeded;
# operators can retry the failed templates manually.
echo "::error::Cascade incomplete after 3 retries each. Failed templates:$FAILED"
echo "::error::PyPI publish succeeded; failed templates lag the new version. Re-run this workflow_dispatch with the same version to retry only the laggers (idempotent — already-cascaded templates skip)."
exit 1
fi
if [ -n "$SKIPPED" ]; then
echo "Cascade complete: pinned $VERSION on cascade-active templates. Soft-skipped (no publish-image.yml):$SKIPPED"
else
echo "Cascade complete: $VERSION pinned across all manifest workspace_templates."
fi