From 9a7f61661b40097f2a1a65e3aa8c455c6120795f Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Wed, 29 Apr 2026 23:31:13 -0700
Subject: [PATCH] fix(ci): dispatch publish chain after auto-promote merge (#2357)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The auto-promote staging → main flow uses `gh pr merge --auto` with
GITHUB_TOKEN, which means GitHub suppresses downstream `push` events on
the resulting main commit. This is documented behavior — events created
by GITHUB_TOKEN do not trigger new workflow runs, with workflow_dispatch
and repository_dispatch as the only exceptions.

Effect: when the merge queue lands the auto-promote PR, the main push
DOES NOT fire publish-workspace-server-image. canary-verify + the
:staging- → :latest retag never run, so redeploy-tenants-on-main also
never fires. Tenants stay on stale code until someone manually
dispatches the chain (which is what just happened for issue #2339).

Fix here: after enqueuing auto-merge, poll for the PR to land, then
explicitly `gh workflow run publish-workspace-server-image.yml --ref main`.
workflow_dispatch is the documented exception, so the dispatch event
itself DOES create a new run. canary-verify and redeploy-tenants-on-main
chain via workflow_run as before.

Long-term (tracked in #2357): switch the auto-merge call above to a
GitHub App token (actions/create-github-app-token) so the merge event
itself can trigger the downstream chain naturally; the polling tail
becomes deletable.

Why a 30-min poll cap: merge queue typically lands a green promote PR
within 5-10 min. 30 min covers a slow CI run without hanging the
workflow indefinitely. If the merge times out, the step warns and exits
0 — operator can manually dispatch as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/auto-promote-staging.yml | 64 ++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/.github/workflows/auto-promote-staging.yml b/.github/workflows/auto-promote-staging.yml
index b4e56cd6..8304398c 100644
--- a/.github/workflows/auto-promote-staging.yml
+++ b/.github/workflows/auto-promote-staging.yml
@@ -240,3 +240,67 @@ jobs:
             echo
             echo "Merge queue lands the PR once required gates are green; no human action needed unless gates fail."
           } >> "$GITHUB_STEP_SUMMARY"
+
+          # Hand the PR number to the next step so we can dispatch the
+          # tenant-redeploy chain after the merge queue lands the merge.
+          echo "promote_pr_num=${PR_NUM}" >> "$GITHUB_OUTPUT"
+        id: promote_pr
+
+      - name: Wait for promote merge, then dispatch publish + redeploy (#2357)
+        # GITHUB_TOKEN-initiated merges suppress downstream `push` events
+        # (https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
+        # Result: when the merge queue lands the promote PR, the resulting
+        # main-branch push DOES NOT fire publish-workspace-server-image,
+        # so canary-verify and redeploy-tenants-on-main never run and
+        # tenants stay on stale code (issue #2357).
+        #
+        # Workaround: poll for the merge to land, then explicitly
+        # `gh workflow run` publish-workspace-server-image. workflow_dispatch
+        # is the documented exception to the GITHUB_TOKEN suppression rule —
+        # dispatch DOES create a new workflow run. canary-verify chains via
+        # workflow_run (no branch filter) and redeploys to fleet via the
+        # existing chain.
+        #
+        # Long-term fix: switch the auto-merge call above to a GitHub App
+        # token (actions/create-github-app-token) and remove this polling
+        # tail step. Tracked in #2357.
+        if: steps.promote_pr.outputs.promote_pr_num != ''
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPO: ${{ github.repository }}
+          PR_NUM: ${{ steps.promote_pr.outputs.promote_pr_num }}
+        run: |
+          # Poll for merge — max 30 min (60 × 30s). The merge queue
+          # typically lands within 5-10 min when gates are green.
+          MERGED=""
+          for _ in $(seq 1 60); do
+            MERGED=$(gh pr view "$PR_NUM" --repo "$REPO" --json mergedAt --jq '.mergedAt // ""')
+            if [ -n "$MERGED" ] && [ "$MERGED" != "null" ]; then
+              echo "::notice::Promote PR #${PR_NUM} merged at ${MERGED}"
+              break
+            fi
+            sleep 30
+          done
+
+          if [ -z "$MERGED" ] || [ "$MERGED" = "null" ]; then
+            echo "::warning::Promote PR #${PR_NUM} didn't merge within 30min — skipping deploy dispatch (manually run \`gh workflow run redeploy-tenants-on-main.yml\` once it lands)."
+            exit 0
+          fi
+
+          # Dispatch publish on main. workflow_dispatch via GITHUB_TOKEN
+          # IS allowed to create new workflow runs (per the linked docs).
+          # publish completes → canary-verify chains via workflow_run →
+          # redeploy-tenants-on-main chains via workflow_run + branches:[main].
+          if gh workflow run publish-workspace-server-image.yml \
+            --repo "$REPO" --ref main 2>&1; then
+            echo "::notice::Dispatched publish-workspace-server-image on ref=main — canary-verify and redeploy-tenants-on-main will chain via workflow_run."
+            {
+              echo "## 🚀 Tenant redeploy chain dispatched"
+              echo
+              echo "- publish-workspace-server-image (workflow_dispatch on \`main\`)"
+              echo "- canary-verify will chain on completion"
+              echo "- redeploy-tenants-on-main will chain on canary green"
+            } >> "$GITHUB_STEP_SUMMARY"
+          else
+            echo "::error::Failed to dispatch publish-workspace-server-image. Run manually: gh workflow run publish-workspace-server-image.yml --ref main"
+          fi
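
Review note on the long-term fix tracked in #2357: the change might look roughly like the sketch below. It is not part of this patch. `actions/create-github-app-token` with its `app-id` / `private-key` inputs and `token` output is the real action named in the commit message. The step ids (`app_token`, `open_pr`) and the secret names are hypothetical, and the sketch assumes a GitHub App with contents and pull-request write access is already installed on the repository.

```yaml
      # Sketch only: step ids and secret names are hypothetical placeholders.
      - name: Mint a GitHub App installation token
        id: app_token
        uses: actions/create-github-app-token@v1
        with:
          app-id: ${{ secrets.PROMOTE_APP_ID }}                # hypothetical secret
          private-key: ${{ secrets.PROMOTE_APP_PRIVATE_KEY }}  # hypothetical secret

      - name: Enable auto-merge on the promote PR
        env:
          # Events caused by an App token are not covered by the
          # GITHUB_TOKEN suppression rule, so the eventual push to main
          # can start downstream workflows on its own.
          GH_TOKEN: ${{ steps.app_token.outputs.token }}
          REPO: ${{ github.repository }}
          PR_NUM: ${{ steps.open_pr.outputs.number }}          # hypothetical step id
        run: |
          # Same call the commit message describes, authenticated as the App.
          gh pr merge "$PR_NUM" --repo "$REPO" --auto
```

If the merge is attributed to the App rather than GITHUB_TOKEN, the push to main should fire publish-workspace-server-image directly, at which point the polling step added in this patch could be removed.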