ci(deploy): fail the run on production fleet redeploy failure (#2942) #2949
Fixes #2942.
The `deploy-production` job in `.gitea/workflows/publish-workspace-server-image.yml` used `continue-on-error: true`, so a failed production fleet redeploy did not fail the workflow run. A broken rollout could therefore go unnoticed until a tenant reported it.

Change `deploy-production` to `continue-on-error: false` so a production redeploy failure surfaces immediately to on-call.

Test plan:

`python3 -c 'import yaml; yaml.safe_load(open(".gitea/workflows/publish-workspace-server-image.yml"))'` → no parse errors.

APPROVE: correct, low-risk production-deploy visibility fix; the production analog of #2943's staging change. No blocking defects. Reviewed @ head (all-required CI green; 1st-genuine).
Correctness ✅ Flips `deploy-production`'s `continue-on-error` from `true` to `false`, so a failed production fleet redeploy now fails the workflow run instead of silently going green, and on-call sees a broken rollout immediately (fixes #2942). This mirrors what #2943 already did for `deploy-staging`, so both deploy jobs are now consistently fail-visible. For production the trade-off clearly favors visibility: a silently broken prod rollout is far worse than a red run, and the image artifact still publishes (`deploy-production` `needs: build-and-push`, so the image is already up regardless of the redeploy outcome).

Robustness ✅ No new failure path is introduced: the job's existing redeploy step already detects failure (the `HTTP != 200 || ok != true → exit 1` gate, same structure as the staging job I verified on #2943); `continue-on-error: false` simply lets that exit code turn the run red. (Worth a quick confirmation: that the prod redeploy step does `exit 1` on failure. It does in the staging twin; I am assuming parity here.)

Security/Perf N/A (CI config).

Readability ✅ Clear comment (`mc#2942: production fleet redeploy failures MUST fail the run`).

Additive: strengthens prod observability, weakens no gate. APPROVE.

-- CR2
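
The gate the review refers to (`HTTP != 200 || ok != true → exit 1`) can be sketched in Python as a small predicate. This is an illustration only, not the workflow's actual step: the names `redeploy_succeeded` and `gate_exit_code` are hypothetical, and the assumption that the redeploy endpoint answers with a JSON object carrying a boolean `ok` field is inferred from the gate expression, not confirmed by this PR.

```python
import json


def redeploy_succeeded(http_status: int, body: str) -> bool:
    """Return True only if the status is 200 and the JSON body has ok == true.

    Hypothetical helper: the real redeploy step and its response format are
    not shown in this PR; the JSON "ok" field is an assumption.
    """
    if http_status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # An unparseable body is treated as a failure (fail closed).
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def gate_exit_code(http_status: int, body: str) -> int:
    """Map the gate result to a process exit code: 0 on success, 1 on failure."""
    return 0 if redeploy_succeeded(http_status, body) else 1
```

With `continue-on-error: false`, a nonzero exit code from a step like this fails the job and therefore the run, which is the behavior this PR restores for `deploy-production`.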