fix(ci): superseded prod-deploy job no longer false-reds as "stale" #2194
Root cause
publish-workspace-server-image / Production auto-deploy intermittently false-reds on main:

::error::<slug> is stale: actual=<newerSHA>, expected=<thisSHA>

The workflow deliberately has no concurrency: (header: Gitea 1.22.6 cancels queued runs even with cancel-in-progress: false, which is unacceptable for a prod deploy). So when two main pushes land close together (eb31bcf Fix A, then 286338 Fix C → staging-2863380), both deploy-production jobs run. The newer job rolls the fleet forward to 2863380 first; then the older eb31bcf job runs "Verify reachable tenants report this SHA", sees tenants on 2863380, and fails on strict SHA equality, even though the fleet is AHEAD, not behind.

Git SHAs aren't ordered and /buildinfo exposes only git_sha (no build time / monotonic number; see workspace-server/internal/router/router.go + internal/buildinfo/buildinfo.go), so the verify can't distinguish "ahead" from "behind" on its own.

Fix: option (b), superseded-job detection
Before the strict verify, ask Gitea for the current head of the deploy branch (main). If main's head is no longer this job's GITHUB_SHA, a newer commit has landed and this deploy is superseded; the newest deploy job's verify is the authoritative one. The superseded job logs a ::notice:: and exits success, skipping the strict-equality loop.

- superseded_by() / current_branch_head() in .gitea/scripts/prod-auto-deploy.py, plus a check-superseded subcommand (exit 0 = superseded, prints the newer SHA; exit 10 = still latest → run strict verify).
- The "Verify reachable tenants report this SHA" step calls it first and short-circuits to success only when superseded.

Why it preserves real-stale detection
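The superseded decision and its fail-safe can be sketched in a few lines. This is a minimal illustration, not the code in prod-auto-deploy.py: the function name superseded_by_sketch, its signature, and the read_head callable are assumptions made for the example. Treating a short SHA as equal to a full SHA it prefixes is inferred from the "short-vs-full SHA prefix" tests mentioned in this PR.

```python
from typing import Callable, Optional


def superseded_by_sketch(
    my_sha: str, read_head: Callable[[], Optional[str]]
) -> Optional[str]:
    """Return the newer head SHA if this job is superseded, else None.

    `read_head` is a hypothetical stand-in for whatever queries Gitea for
    the deploy branch head; it may return None or raise on failure.
    """
    try:
        head = read_head()
    except Exception:
        # Fail-safe: an unreadable head must never skip the strict verify.
        return None
    if not head:
        return None

    head = head.strip().lower()
    mine = my_sha.strip().lower()
    if not head or not mine:
        return None

    # Short and full forms of the same commit count as equal, so compare
    # by prefix in both directions.
    if head.startswith(mine) or mine.startswith(head):
        return None  # this job is still the latest

    return head  # a different commit is now the branch head
```

Every uncertain path returns None, so the caller only skips the strict verify when it has positively read a different head.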
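- Only the superseded (older) job skips strict verify. The latest deploy job (head == its SHA) still runs strict equality, so a genuinely behind tenant still fails loudly.
- Fail-safe: if the branch head can't be read (no token / API error) or equals this job's SHA, superseded_by returns None → strict verify runs. An unreadable head never silently greens a deploy.

Why not (a)/(c)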
- (a) build-timestamp / monotonic compare: /buildinfo returns only {git_sha}. Adding a build-time field requires a workspace-server binary + Dockerfile.tenant change and a full fleet rebuild before it could be relied on, which is heavy and slow to take effect.
- (c) concurrency: forbidden by the workflow header (Gitea cancels queued prod deploys); cannot serialize-without-cancel safely.

Verification
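The exit-code contract, and how the verify step would consume it, can be sketched as follows. This is an illustrative model only: check_superseded_sketch, verify_step_sketch, and the tenants mapping are hypothetical names, and the real step is a workflow shell step calling the check-superseded subcommand rather than a Python function. The error line format is the one quoted in this PR.

```python
from typing import Dict, List, Optional

EXIT_SUPERSEDED = 0     # newer SHA is printed; caller skips strict verify
EXIT_STILL_LATEST = 10  # caller must run strict verify


def check_superseded_sketch(newer_sha: Optional[str]) -> int:
    """Map the detection result (newer head SHA or None) to an exit code."""
    if newer_sha:
        print(newer_sha)
        return EXIT_SUPERSEDED
    return EXIT_STILL_LATEST


def verify_step_sketch(
    my_sha: str, newer_sha: Optional[str], tenants: Dict[str, str]
) -> List[str]:
    """Return the error lines the step would emit; empty means it passes.

    `tenants` maps a tenant slug to the git_sha it reports (assumed to be
    in the same short/full form as `my_sha`).
    """
    if check_superseded_sketch(newer_sha) == EXIT_SUPERSEDED:
        return []  # superseded: the newest job's verify is authoritative
    # Still latest (or head unreadable): strict equality, as before.
    return [
        f"::error::{slug} is stale: actual={actual}, expected={my_sha}"
        for slug, actual in sorted(tenants.items())
        if actual != my_sha
    ]
```

In this model the older eb31bcf job passes once 2863380 is the head, while the latest job still fails if any tenant reports a different SHA.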
- New unit tests for superseded_by / current_branch_head incl. fail-safe + short-vs-full SHA prefix; 33 passed (pytest .gitea/scripts/tests/test_prod_auto_deploy.py).
- Workflow yaml-lint clean (lint-workflow-yaml.py, 56 files, 0 warnings).
- CLI smoke test of the eb31bcf-vs-2863380 scenario: superseded → exit 0 (skip, success); latest job → exit 10 (run strict verify); unreadable head → exit 10.

🤖 Generated with Claude Code
publish-workspace-server-image / Production auto-deploy intermittently fails on main with:

  ::error::<slug> is stale: actual=<newerSHA>, expected=<thisSHA>

Root cause: the workflow deliberately has no `concurrency:` (Gitea 1.22.6 cancels queued runs even with cancel-in-progress: false, which is unacceptable for a prod deploy). So when two main pushes land close together (eb31bcf then 286338), BOTH deploy-production jobs run. The newer job (286338 -> staging-2863380) rolls the fleet forward first; then the OLDER job (eb31bcf) runs "Verify reachable tenants report this SHA", sees tenants on 2863380, and fails on STRICT SHA EQUALITY, even though the fleet is AHEAD, not behind. Git SHAs aren't ordered and /buildinfo exposes only git_sha (no build time / monotonic number), so the verify can't tell "ahead" from "behind" on its own.

Fix (option b, superseded-job detection): before the strict verify, ask Gitea for the current head of the deploy branch (main). If it is no longer this job's GITHUB_SHA, a newer commit has landed and this deploy is superseded; the newest job's verify is authoritative. Log a notice and exit success, skipping strict equality for the stale job.

Why this preserves real-stale detection:
- Only the SUPERSEDED (older) job skips strict verify. The LATEST deploy job (head == its SHA) still runs strict equality, so a genuinely behind/older tenant still fails loudly.
- Fail-safe: if the branch head can't be read (no token / API error) or equals our SHA, superseded_by returns None -> strict verify runs. An unreadable head never silently greens a deploy.

Why not the alternatives:
- (a) build-timestamp/monotonic compare: /buildinfo returns only {git_sha} (router.go, buildinfo.go). Adding a build-time field needs a workspace-server binary + Dockerfile change and a full fleet rebuild before it can be relied on, which is heavy and slow to take effect.
- (c) concurrency: forbidden by the workflow header (Gitea cancels queued prod deploys).
Verification:
- New unit tests for superseded_by / current_branch_head and the fail-safe path; full suite 33 passed.
- Workflow yaml-lint clean (lint-workflow-yaml.py).
- CLI smoke test: eb31bcf-vs-2863380 -> exit 0 (skip, success); latest job -> exit 10 (run strict verify); unreadable head -> exit 10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Owner force-merged (claude-ceo-assistant), honest bypass. Fixes the Production auto-deploy FALSE RED: superseded older deploy jobs no longer fail strict-SHA verify against a fleet that moved AHEAD (newer build). Superseded-job detection skips verify on non-latest jobs while the latest job still runs strict verify (real-stale detection preserved). 33 unit tests, yaml-lint clean. All required CI green. Token revoked.