From 360361a0cee9360adbc3caa2c51a73ebf11aad6f Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Wed, 29 Apr 2026 21:14:41 -0700
Subject: [PATCH] ci: add concurrency block to redeploy-tenants-on-main for parity
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Parity with #2337's redeploy-tenants-on-staging.yml. Both prod and
staging redeploys now have explicit serialization:

  group: redeploy-tenants-on-main      (per-workflow, global)
  group: redeploy-tenants-on-staging   (per-workflow, global)

cancel-in-progress: false on both — aborting a half-rolled-out fleet
would leave tenants stuck on whatever image they happened to be on
when cancelled. Better to finish the in-flight rollout before
starting the next one.

Pre-fix this workflow relied on GitHub's implicit workflow_run
queueing, which is "probably fine" but not defensible — explicit >
implicit for load-bearing pipeline behavior. Picked up as a #2337
review nit (architecture finding 1: concurrency asymmetry between the
two redeploy workflows).

No behavior change in the common case. The change matters only when
two main pushes land within seconds AND the first redeploy is still
mid-rollout — currently rare; will become more common once #2335
(staging-trigger publish) feeds main more frequently via
auto-promote.
---
 .github/workflows/redeploy-tenants-on-main.yml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/.github/workflows/redeploy-tenants-on-main.yml b/.github/workflows/redeploy-tenants-on-main.yml
index e0f84da5..0fd8820b 100644
--- a/.github/workflows/redeploy-tenants-on-main.yml
+++ b/.github/workflows/redeploy-tenants-on-main.yml
@@ -64,6 +64,20 @@ permissions:
   # No write scopes needed — the workflow hits an external CP endpoint,
   # not the GitHub API.
 
+# Serialize redeploys so two rapid main pushes' redeploys don't overlap
+# and cause confusing per-tenant SSM state. Without this, GitHub's
+# implicit workflow_run queueing would *probably* serialize them, but
+# the explicit block makes the invariant defensible. Mirrors the
+# concurrency block on redeploy-tenants-on-staging.yml for shape parity.
+#
+# cancel-in-progress: false → aborting a half-rolled-out fleet would
+# leave tenants stuck on whatever image they happened to be on when
+# cancelled. Better to finish the in-flight rollout before starting
+# the next one.
+concurrency:
+  group: redeploy-tenants-on-main
+  cancel-in-progress: false
+
 jobs:
   redeploy:
     # Skip the auto-trigger if publish-workspace-server-image didn't