From e418d325820e15f5a4417a605b6dcacc9078395c Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Thu, 30 Apr 2026 08:55:49 -0700
Subject: [PATCH] ci(auto-promote): dispatch publish via molecule-ai App token to unblock workflow_run chain
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause (verified 2026-04-30): a GITHUB_TOKEN-initiated
workflow_dispatch does create the dispatched run, but that run's
completion event does NOT fire downstream `workflow_run` triggers. This
is the documented "no recursion" rule:
https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow

Evidence (publish-workspace-server-image runs on main):

  run_id      | head_sha | triggering_actor    | canary | redeploy
  ------------+----------+---------------------+--------+---------
  25151545007 | 6ef562ee | HongmingWang-Rabbit | YES    | YES
  25171773918 | 21313dc  | github-actions[bot] | NO     | NO
  25173801008 | 59dec57  | github-actions[bot] | NO     | NO

The 06:52Z run that "worked" was an operator-fired dispatch from the
terminal, authenticated with the operator's PAT (triggering_actor
`HongmingWang-Rabbit`). The two runs that "dropped" were dispatched by
auto-promote-staging.yml's `gh workflow run` step authenticated via
`secrets.GITHUB_TOKEN`, so the actor became `github-actions[bot]` and
the workflow_run cascade was suppressed. Same workflow file, same
dispatch call, same successful publish run; only the auth token
differed.

Fix: mint a molecule-ai GitHub App installation token before the
dispatch step and use it as `GH_TOKEN`. App-initiated dispatches DO
propagate the workflow_run cascade (the App user is a real identity,
not the GITHUB_TOKEN bot pseudonym).

The molecule-ai App (app_id=3398844, installation 124443072) is already
installed on the org with `actions:write`, so no new App is needed.
Only the secrets are missing.
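For reviewers who want to see what the new "Mint App token" step does under the hood, here is a rough bash sketch of the two-stage exchange a GitHub App performs: sign a short-lived RS256 JWT with the App's private key, then trade it for an installation access token via `POST /app/installations/{installation_id}/access_tokens`. This is not the implementation of actions/create-github-app-token; the function names are hypothetical, and it assumes `openssl`, `curl` and `jq` are installed.

```bash
#!/usr/bin/env bash
# Sketch only: mint a GitHub App installation token by hand, roughly the
# exchange actions/create-github-app-token performs. Function names are
# hypothetical. Assumes openssl (signing), curl and jq are available.
set -euo pipefail

# base64url-encode stdin: URL-safe alphabet, no padding, no line wraps.
b64url() {
  base64 | tr -d '\n=' | tr '/+' '_-'
}

# JWT claims for a GitHub App. iat is backdated 60s to tolerate clock
# drift; exp stays inside GitHub's 10-minute maximum JWT lifetime.
app_jwt_claims() {
  local app_id="$1" now="$2"
  printf '{"iat":%d,"exp":%d,"iss":"%s"}' "$((now - 60))" "$((now + 540))" "$app_id"
}

# Build an RS256-signed JWT that authenticates as the App itself.
make_app_jwt() {
  local app_id="$1" key_path="$2" header payload unsigned sig
  header=$(printf '{"alg":"RS256","typ":"JWT"}' | b64url)
  payload=$(app_jwt_claims "$app_id" "$(date +%s)" | b64url)
  unsigned="${header}.${payload}"
  # openssl emits the raw binary signature, which is then base64url-encoded.
  sig=$(printf '%s' "$unsigned" | openssl dgst -sha256 -sign "$key_path" | b64url)
  printf '%s.%s\n' "$unsigned" "$sig"
}

# Exchange the App JWT for a short-lived installation access token. Runs
# dispatched with this token have the App's bot user as triggering_actor.
mint_installation_token() {
  local jwt="$1" installation_id="$2"
  curl -fsS -X POST \
    -H "Authorization: Bearer ${jwt}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/app/installations/${installation_id}/access_tokens" \
    | jq -r '.token'
}
```

Hypothetical usage, with the key saved locally as `app.pem`: `GH_TOKEN=$(mint_installation_token "$(make_app_jwt 3398844 app.pem)" 124443072) gh workflow run publish-workspace-server-image.yml --repo "$REPO" --ref main`. In the workflow itself the action handles this exchange, which is why only the two secrets are needed.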
## Required setup before merge

The following repo secrets must be added at
https://github.com/Molecule-AI/molecule-core/settings/secrets/actions,
or auto-promote will hard-fail at the new "Mint App token" step:

- `MOLECULE_AI_APP_ID` = `3398844`
- `MOLECULE_AI_APP_PRIVATE_KEY` = contents of a .pem file generated at
  https://github.com/organizations/Molecule-AI/settings/installations/124443072
  (click "Generate a private key" if one doesn't exist yet)

## Long-term cleanup

The polling tail step still exists because the auto-merge call itself
uses GITHUB_TOKEN, so the FF push to main doesn't fire
publish-workspace-server-image's `push` trigger naturally. Switching the
auto-merge call to the SAME App token would eliminate the polling tail
entirely. Tracked in #2357.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/auto-promote-staging.yml | 55 +++++++++++++++++-----
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/.github/workflows/auto-promote-staging.yml b/.github/workflows/auto-promote-staging.yml
index 33c54e7e..a62010f2 100644
--- a/.github/workflows/auto-promote-staging.yml
+++ b/.github/workflows/auto-promote-staging.yml
@@ -267,6 +267,32 @@ jobs:
           echo "promote_pr_num=${PR_NUM}" >> "$GITHUB_OUTPUT"
         id: promote_pr
 
+      # Mint a short-lived GitHub App installation token for the dispatch
+      # step below. We CANNOT use `secrets.GITHUB_TOKEN` to dispatch the
+      # downstream publish chain — workflow runs created by GITHUB_TOKEN
+      # do not fire `workflow_run` triggers on completion (the
+      # documented "no recursion" rule —
+      # https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
+      #
+      # Symptom this caused (root-caused on 2026-04-30): publish-image
+      # ran successfully twice (21313dc 14:41Z, 59dec57 15:21Z) but
+      # canary-verify and redeploy-tenants-on-main never chained,
+      # because the publish run's `triggering_actor` was
+      # `github-actions[bot]` (i.e. GITHUB_TOKEN). A manual dispatch
+      # earlier in the day with the operator's PAT (d850ec7 06:52Z) did
+      # chain — same workflow file, only the actor differed.
+      #
+      # An App token's triggering_actor is the App user (e.g.
+      # `molecule-ai[bot]`), which IS allowed to fire downstream
+      # workflow_run cascades.
+      - name: Mint App token for downstream dispatch
+        if: steps.promote_pr.outputs.promote_pr_num != ''
+        id: app-token
+        uses: actions/create-github-app-token@1b10c78c7865c340bc4f6099eb2f838309f1e8c3 # v3.1.1
+        with:
+          app-id: ${{ secrets.MOLECULE_AI_APP_ID }}
+          private-key: ${{ secrets.MOLECULE_AI_APP_PRIVATE_KEY }}
+
       - name: Wait for promote merge, then dispatch publish + redeploy (#2357)
         # GITHUB_TOKEN-initiated merges suppress downstream `push` events
         # (https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
@@ -276,18 +302,20 @@ jobs:
         # tenants stay on stale code (issue #2357).
         #
         # Workaround: poll for the merge to land, then explicitly
-        # `gh workflow run` publish-workspace-server-image. workflow_dispatch
-        # is the documented exception to the GITHUB_TOKEN suppression rule —
-        # dispatch DOES create a new workflow run. canary-verify chains via
-        # workflow_run (no branch filter) and redeploys to fleet via the
-        # existing chain.
+        # `gh workflow run` publish-workspace-server-image. The dispatch
+        # MUST authenticate as the molecule-ai App (App token minted
+        # above) — not GITHUB_TOKEN — so that the resulting publish
+        # run's completion event can fire the workflow_run cascade
+        # into canary-verify + redeploy-tenants-on-main. See the prior
+        # step's comment for the GITHUB_TOKEN no-recursion details.
         #
-        # Long-term fix: switch the auto-merge call above to a GitHub App
-        # token (actions/create-github-app-token) and remove this polling
-        # tail step. Tracked in #2357.
+        # Long-term fix: switch the auto-merge call above to use the
+        # same App token, so the merge's push event fires
+        # publish-workspace-server-image naturally and this polling tail
+        # becomes unnecessary. Tracked in #2357.
         if: steps.promote_pr.outputs.promote_pr_num != ''
         env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_TOKEN: ${{ steps.app-token.outputs.token }}
           REPO: ${{ github.repository }}
           PR_NUM: ${{ steps.promote_pr.outputs.promote_pr_num }}
         run: |
@@ -318,17 +346,18 @@ jobs:
             exit 0
           fi
 
-          # Dispatch publish on main. workflow_dispatch via GITHUB_TOKEN
-          # IS allowed to create new workflow runs (per the linked docs).
+          # Dispatch publish on main using the App token. App-initiated
+          # workflow_dispatch DOES propagate the workflow_run cascade,
+          # unlike GITHUB_TOKEN-initiated dispatch.
           # publish completes → canary-verify chains via workflow_run →
           # redeploy-tenants-on-main chains via workflow_run + branches:[main].
           if gh workflow run publish-workspace-server-image.yml \
             --repo "$REPO" --ref main 2>&1; then
-            echo "::notice::Dispatched publish-workspace-server-image on ref=main — canary-verify and redeploy-tenants-on-main will chain via workflow_run."
+            echo "::notice::Dispatched publish-workspace-server-image on ref=main as molecule-ai App — canary-verify and redeploy-tenants-on-main will chain via workflow_run."
             {
               echo "## 🚀 Tenant redeploy chain dispatched"
               echo
-              echo "- publish-workspace-server-image (workflow_dispatch on \`main\`)"
+              echo "- publish-workspace-server-image (workflow_dispatch on \`main\`, actor: \`molecule-ai[bot]\`)"
               echo "- canary-verify will chain on completion"
               echo "- redeploy-tenants-on-main will chain on canary green"
             } >> "$GITHUB_STEP_SUMMARY"