fix(ci): dispatch publish chain after auto-promote merge (#2357)

The auto-promote staging → main flow uses `gh pr merge --auto` with
GITHUB_TOKEN, which means GitHub suppresses downstream `push` events on
the resulting main commit. This is documented behavior — events created
by GITHUB_TOKEN do not trigger new workflow runs, with workflow_dispatch
and repository_dispatch as the only exceptions.

Effect: when the merge queue lands the auto-promote PR, the main push
DOES NOT fire publish-workspace-server-image. canary-verify + the
:staging-<sha> → :latest retag never run, so redeploy-tenants-on-main
also never fires. Tenants stay on stale code until someone manually
dispatches the chain (which is what just happened for issue #2339).

Fix here: after enqueuing auto-merge, poll for the PR to land, then
explicitly `gh workflow run publish-workspace-server-image.yml --ref
main`. workflow_dispatch is the documented exception, so the dispatch
event itself DOES create a new run. canary-verify and
redeploy-tenants-on-main chain via workflow_run as before.

Long-term (tracked in #2357): switch the auto-merge call above to a
GitHub App token (actions/create-github-app-token) so the merge event
itself can trigger the downstream chain naturally; the polling tail
becomes deletable.

Why a 30-min poll cap: merge queue typically lands a green promote PR
within 5-10 min. 30 min covers a slow CI run without hanging the
workflow indefinitely. If the merge times out, the step warns and
exits 0 — operator can manually dispatch as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-29 23:31:13 -07:00
parent b5bde0399a
commit 9a7f61661b

View File

@ -240,3 +240,67 @@ jobs:
echo
echo "Merge queue lands the PR once required gates are green; no human action needed unless gates fail."
} >> "$GITHUB_STEP_SUMMARY"
# Hand the PR number to the next step so we can dispatch the
# tenant-redeploy chain after the merge queue lands the merge.
echo "promote_pr_num=${PR_NUM}" >> "$GITHUB_OUTPUT"
id: promote_pr
- name: Wait for promote merge, then dispatch publish + redeploy (#2357)
# GITHUB_TOKEN-initiated merges suppress downstream `push` events
# (https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
# Result: when the merge queue lands the promote PR, the resulting
# main-branch push DOES NOT fire publish-workspace-server-image,
# so canary-verify and redeploy-tenants-on-main never run and
# tenants stay on stale code (issue #2357).
#
# Workaround: poll for the merge to land, then explicitly
# `gh workflow run` publish-workspace-server-image. workflow_dispatch
# is the documented exception to the GITHUB_TOKEN suppression rule —
# dispatch DOES create a new workflow run. canary-verify chains via
# workflow_run (no branch filter) and redeploys to fleet via the
# existing chain.
#
# Long-term fix: switch the auto-merge call above to a GitHub App
# token (actions/create-github-app-token) and remove this polling
# tail step. Tracked in #2357.
if: steps.promote_pr.outputs.promote_pr_num != ''
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
PR_NUM: ${{ steps.promote_pr.outputs.promote_pr_num }}
run: |
# Poll for merge — max 30 min (60 × 30s). The merge queue
# typically lands within 5-10 min when gates are green.
MERGED=""
for _ in $(seq 1 60); do
MERGED=$(gh pr view "$PR_NUM" --repo "$REPO" --json mergedAt --jq '.mergedAt // ""')
if [ -n "$MERGED" ] && [ "$MERGED" != "null" ]; then
echo "::notice::Promote PR #${PR_NUM} merged at ${MERGED}"
break
fi
sleep 30
done
if [ -z "$MERGED" ] || [ "$MERGED" = "null" ]; then
echo "::warning::Promote PR #${PR_NUM} didn't merge within 30min — skipping deploy dispatch (manually run \`gh workflow run redeploy-tenants-on-main.yml\` once it lands)."
exit 0
fi
# Dispatch publish on main. workflow_dispatch via GITHUB_TOKEN
# IS allowed to create new workflow runs (per the linked docs).
# publish completes → canary-verify chains via workflow_run →
# redeploy-tenants-on-main chains via workflow_run + branches:[main].
if gh workflow run publish-workspace-server-image.yml \
--repo "$REPO" --ref main 2>&1; then
echo "::notice::Dispatched publish-workspace-server-image on ref=main — canary-verify and redeploy-tenants-on-main will chain via workflow_run."
{
echo "## 🚀 Tenant redeploy chain dispatched"
echo
echo "- publish-workspace-server-image (workflow_dispatch on \`main\`)"
echo "- canary-verify will chain on completion"
echo "- redeploy-tenants-on-main will chain on canary green"
} >> "$GITHUB_STEP_SUMMARY"
else
echo "::error::Failed to dispatch publish-workspace-server-image. Run manually: gh workflow run publish-workspace-server-image.yml --ref main"
fi