From 607444e71beeb3a28de7f1b67511f2d90632530c Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Thu, 7 May 2026 03:17:38 -0700
Subject: [PATCH] feat(ci): replace curl-dispatch with push-mode cascade (v2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Empirical blocker on v1: Gitea 1.22.6 has no repository_dispatch /
workflow_dispatch trigger API (verified across 6 candidate paths in
issuecomment-913). v1's curl-POST loop would always exit-1.

v2 pivots to push-mode: each template repo got a small companion PR
(merged 2026-05-07) adding a `.runtime-version` file at root + a
`resolve-version` job in publish-image.yml that reads the file and
forwards the value to the reusable build workflow. publish-runtime now
updates that file via git-clone + commit + push, which trips each
template's existing `on: push: branches: [main]` trigger.

Behaviour changes vs v1:
- Templates list dropped from 9 → 8 (codex has no publish-image.yml
  so was never part of the cascade in practice).
- 3-retry pull-rebase loop per template (handles concurrent-push races
  without force-push). Failures collected, job exits 1 with the
  failed-template list at the end.
- Idempotency: when re-run with the same version, templates already
  pinned to that version contribute zero commits — operator can safely
  re-run to retry partial failures.
- Author line: "publish-runtime cascade " trailer makes it clear the
  commit is workflow-driven, not human (per memory
  feedback_github_botring_fingerprint).

DISPATCH_TOKEN secret name unchanged (still consumed at
secrets.DISPATCH_TOKEN per 569df259).

Refs molecule-core#14, builds on molecule-core#20 issuecomment-923
(Phase 2 design).
---
 .github/workflows/publish-runtime.yml | 167 ++++++++++++++------------
 1 file changed, 93 insertions(+), 74 deletions(-)

diff --git a/.github/workflows/publish-runtime.yml b/.github/workflows/publish-runtime.yml
index 47b2f9c8..29134aff 100644
--- a/.github/workflows/publish-runtime.yml
+++ b/.github/workflows/publish-runtime.yml
@@ -282,35 +282,26 @@ jobs:
           echo "::error::Refusing to fan out cascade against stale or corrupt PyPI surfaces."
           exit 1
 
-      - name: Fan out repository_dispatch
+      - name: Fan out via push to .runtime-version
         env:
-          # Fine-grained PAT with `actions:write` on the 8 template repos.
-          # GITHUB_TOKEN can't fire dispatches across repos — needs an explicit
-          # token. Stored as a repo secret; rotate per the standard schedule.
+          # Gitea PAT with write:repository scope on the 8 cascade-active
+          # template repos. Used here for `git push` (NOT for an API
+          # dispatch — Gitea 1.22.6 has no repository_dispatch endpoint;
+          # empirically verified across 6 candidate paths in molecule-
+          # core#20 issuecomment-913). The push trips each template's
+          # existing `on: push: branches: [main]` trigger on
+          # publish-image.yml, which then reads the updated
+          # .runtime-version via its resolve-version job.
           DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
-          # Single source of truth: the publish job's output, which handles
-          # tag/manual-input/auto-bump uniformly. The previous fallback
-          # (`steps.version.outputs.version` from inside the cascade job)
-          # was a dead reference — different job, no shared step scope.
           RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
         run: |
           set +e  # don't abort on a single repo failure — collect them all
-          # Schedule-vs-dispatch behaviour split (hardened 2026-04-28
-          # after the sweep-cf-orphans soft-skip incident — same class
-          # of bug):
-          #
-          # The earlier "skipping cascade. templates will pick up the
-          # new version on their own next rebuild" message was wrong —
-          # templates only build on this dispatch trigger; without it
-          # they stay pinned to whatever runtime version they last saw.
-          # A silent skip here means "PyPI is current, templates are
-          # not" and the gap is invisible until someone notices a
-          # template still on the old version weeks later.
-          #
-          #   - push              → exit 1 (red CI surfaces the gap)
-          #   - workflow_dispatch → exit 0 with a warning (operator
-          #                         ran this ad-hoc; let them rerun
-          #                         after fixing the secret)
+
+          # Soft-skip on workflow_dispatch when the token is missing
+          # (operator ad-hoc test); hard-fail on push so unattended
+          # publishes can't silently skip the cascade. Same shape as
+          # the original v1, intentional split per the schedule-vs-
+          # dispatch hardening 2026-04-28.
           if [ -z "$DISPATCH_TOKEN" ]; then
             if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
               echo "::warning::DISPATCH_TOKEN secret not set — skipping cascade."
@@ -327,62 +318,90 @@ jobs:
             echo "::error::publish job did not expose a version output — cascade cannot fan out"
             exit 1
           fi
-          # All 9 active workspace template repos. The PR #2536 pruning
-          # ("deprecated, no shipping images") was empirically wrong:
-          # continuous-synth-e2e.yml defaults to langgraph as its primary
-          # canary (line 44), and every excluded template had successful
-          # publish-image runs as of 2026-05-03 — none were dormant.
-          # Symptom of the prune: today's a2a-sdk strict-mode fix
-          # (#2566 / commit e1628c4) cascaded to 4 templates but never
-          # reached langgraph, so the synth-E2E correctly canary'd a fix
-          # that had landed but not deployed. Re-added the 5 templates.
-          # Long-term: derive this list from manifest.json so cascade
-          # scope can't drift from E2E scope — tracked in RFC #388 as a
-          # Phase-1 invariant.
-          # Fan out via Gitea's repository_dispatch API (post-2026-05-06; the
-          # GitHub-org's hostname is no longer reachable). API contract:
-          #   POST {GITEA_URL}/api/v1/repos/{owner}/{repo}/dispatches
-          #   Authorization: token (NOT "Bearer" like GitHub)
-          #   body: {event_type, client_payload} (same shape as GitHub)
-          # The 9 template repos all have publish-image.yml waiting on
-          # `repository_dispatch: types: [runtime-published]` with
-          # client_payload.runtime_version (verified by devops-engineer
-          # 2026-05-07 when assessing molecule-core#14 Option B safety).
-          #
-          # DISPATCH_TOKEN must be a Gitea PAT (not a GitHub PAT) with
-          # write:repository scope on each of the 9 target repos. Per saved
-          # memory feedback_per_agent_gitea_identity_default this should be
-          # a per-agent-persona token (recommend: dedicated
-          # `publish-runtime-bot` persona), not the founder PAT. Token
-          # rotation is an out-of-band operator-host task; the workflow
-          # consumes whatever value is in the secret.
-          #
-          # GITEA_URL defaults to https://git.moleculesai.app; override via
-          # job env if the platform's Gitea host changes.
+
+          # 8 cascade-active workspace templates. codex was in the v1
+          # list but has no .github/workflows/publish-image.yml — never
+          # part of the cascade in practice; dropped here to match
+          # ground truth. Long-term goal: derive this list from
+          # manifest.json so it can't drift from E2E scope (RFC #388
+          # Phase-1 invariant).
           GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
-          TEMPLATES="claude-code hermes openclaw codex langgraph crewai autogen deepagents gemini-cli"
+          TEMPLATES="claude-code hermes openclaw langgraph crewai autogen deepagents gemini-cli"
           FAILED=""
+
+          # Configure git identity once. The persona owning DISPATCH_TOKEN
+          # is the same identity that authored this commit on each
+          # template; using a generic "publish-runtime cascade" co-author
+          # trailer in the message keeps the audit trail honest about the
+          # workflow-driven origin.
+          git config --global user.name "publish-runtime cascade"
+          git config --global user.email "publish-runtime@moleculesai.app"
+
+          WORKDIR="$(mktemp -d)"
           for tpl in $TEMPLATES; do
-            # Gitea is owner-case-sensitive: the org slug is lowercase
-            # `molecule-ai`, not `Molecule-AI`. GitHub auto-lowercased on
-            # the receive side; Gitea returns 404 on the wrong case.
             REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
-            STATUS=$(curl -sS -o /tmp/dispatch.out -w "%{http_code}" \
-              -X POST "$GITEA_URL/api/v1/repos/$REPO/dispatches" \
-              -H "Authorization: token $DISPATCH_TOKEN" \
-              -H "Accept: application/json" \
-              -H "Content-Type: application/json" \
-              -d "{\"event_type\":\"runtime-published\",\"client_payload\":{\"runtime_version\":\"$VERSION\"}}")
-            # Gitea returns 204 No Content on success, same as GitHub.
-            if [ "$STATUS" = "204" ]; then
-              echo "✓ dispatched $tpl ($VERSION)"
-            else
-              echo "::warning::✗ failed to dispatch $tpl: HTTP $STATUS — $(cat /tmp/dispatch.out)"
+            CLONE="$WORKDIR/$tpl"
+
+            # Use a per-template attempt loop so a transient race (e.g.
+            # human pushing to the same template at the same instant)
+            # doesn't lose the cascade. Bounded retries (3) — beyond
+            # that we surface the failure and let the operator retry.
+            attempt=0
+            success=false
+            while [ $attempt -lt 3 ]; do
+              attempt=$((attempt + 1))
+              rm -rf "$CLONE"
+              if ! git clone --depth=1 \
+                  "https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/$REPO.git" \
+                  "$CLONE" >/tmp/clone.log 2>&1; then
+                echo "::warning::clone $tpl attempt $attempt failed: $(tail -n3 /tmp/clone.log)"
+                sleep 2
+                continue
+              fi
+
+              cd "$CLONE"
+              echo "$VERSION" > .runtime-version
+
+              # Idempotency guard: if the file already matches, this
+              # publish is a re-run for a version already cascaded.
+              # Don't push a no-op commit (would spuriously re-trip the
+              # template's on-push and rebuild for nothing).
+              if git diff --quiet -- .runtime-version; then
+                echo "✓ $tpl already at $VERSION — no commit needed (idempotent)"
+                success=true
+                cd - >/dev/null
+                break
+              fi
+
+              git add .runtime-version
+              git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)" \
+                -m "Co-Authored-By: publish-runtime cascade " \
+                >/dev/null
+
+              if git push origin HEAD:main >/tmp/push.log 2>&1; then
+                echo "✓ $tpl pushed $VERSION on attempt $attempt"
+                success=true
+                cd - >/dev/null
+                break
+              fi
+
+              # Likely a non-fast-forward — pull-rebase and retry.
+              # Don't force-push: that would silently overwrite a racing
+              # human/cascade commit.
+              echo "::warning::push $tpl attempt $attempt failed, pull-rebasing: $(tail -n3 /tmp/push.log)"
+              git pull --rebase origin main >/tmp/rebase.log 2>&1 || true
+              cd - >/dev/null
+            done
+
+            if [ "$success" != "true" ]; then
               FAILED="$FAILED $tpl"
             fi
           done
+          rm -rf "$WORKDIR"
+
           if [ -n "$FAILED" ]; then
-            echo "::warning::Cascade incomplete. Failed templates:$FAILED"
-            # Don't fail the whole job — PyPI publish already succeeded;
-            # operators can retry the failed templates manually.
+            echo "::error::Cascade incomplete after 3 retries each. Failed templates:$FAILED"
+            echo "::error::PyPI publish succeeded; failed templates lag the new version. Re-run this workflow_dispatch with the same version to retry only the laggers (idempotent — already-cascaded templates skip)."
+            exit 1
           fi
+          echo "Cascade complete: 8 templates pinned to $VERSION."
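
Reviewer note on the idempotency guard in this patch: `git diff --quiet -- .runtime-version` compares the working tree against the index, so it reports "no change" for a file that is not yet tracked. The patch relies on the companion PRs having already added `.runtime-version` to every template. A minimal sketch of a variant that stages first and then checks `git diff --cached --quiet`, which also covers the untracked case, could look like the following. The function name `pin_runtime_version` and its two arguments are hypothetical and not part of the patch; the retry loop and token handling are left out on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the patch): pin_runtime_version REPO_URL VERSION
# Clones REPO_URL, writes VERSION to .runtime-version, and pushes to main only
# when the staged content differs from HEAD. Prints "pushed" or "unchanged".
pin_runtime_version() {
  local repo_url="$1" version="$2" work rc
  work="$(mktemp -d)" || return 1
  (
    git clone --quiet "$repo_url" "$work/clone" || exit 1
    cd "$work/clone" || exit 1
    echo "$version" > .runtime-version
    # Stage first: comparing the index against HEAD also detects a file that
    # was untracked, which a plain working-tree `git diff --quiet` would miss.
    git add .runtime-version || exit 1
    if git diff --cached --quiet -- .runtime-version; then
      echo "unchanged"
    else
      # Identity and commit subject mirror the values used in the patch.
      git -c user.name="publish-runtime cascade" \
          -c user.email="publish-runtime@moleculesai.app" \
          commit --quiet \
          -m "chore: pin runtime to $version (publish-runtime cascade)" || exit 1
      git push --quiet origin HEAD:main || exit 1
      echo "pushed"
    fi
  )
  rc=$?
  rm -rf "$work"
  return "$rc"
}
```

Called twice with the same version against the same repository, the second call is expected to print `unchanged` and create no commit, which is the re-run behaviour the commit message describes.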