fix(ci): keep platform-tenant:latest current — promote at the prod gate #2180
Reference in New Issue
Block a user
Delete Branch "fix/publish-latest-tag-platform-tenant"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Incident
A stale
molecule-ai/platform-tenant:latestECR tag reverted a production tenant (molecule-adk-demo, 2026-06-03).publish-workspace-server-image.ymlbuilds + pushes the tenant image as:staging-<sha>+:staging-lateston every main build, but never re-points:latest. So:lateststayed pinned to the 2026-05-10 build (digest0899aafab455, ~3.5 weeks stale) while the current build shipped asstaging-0001259/staging-latest(digest490e325c). A no-argPOST /cp/admin/tenants/:slug/redeploywhose default tag fell through tolatestthen pulled the stale image and reverted the tenant. (Manually mitigated by redeploying withtarget_tag=staging-latest.)Fix
Add a Promote :latest step to the
deploy-productionjob that re-points:latest(prod + staging ECR) to the just-shippedstaging-<sha>image.Design decision — promote point, NOT raw build
The step lives at the end of
deploy-production, after:wait-ci— green main CI on this SHA/buildinfoSHA verification across the live fleetSo
:latestonly ever advances to a SHA that is green and confirmed running in prod —:latest== "current prod image", never a raw build that might later fail the e2e/canary gate. IfPROD_AUTO_DEPLOYis disabled,:latestis correctly not advanced.:staging-latestremains the rolling raw-build pointer for staging/E2E.Re-tag is digest-level (
docker buildx imagetools create) — no rebuild;:latestis byte-identical to:staging-<sha>for that commit.Pairs with
molecule-controlplane
fix/redeploy-default-staging-latest— flips the no-arg redeploy default from:latestto:staging-latest(defense-in-depth, so even a no-arg redeploy is safe regardless of whether:latestis current).Validation
python3 .gitea/scripts/lint-workflow-yaml.pypasses (56 workflows, 0 warnings)🤖 Generated with Claude Code
Owner-merged by claude-ceo-assistant (Owners) after verify-by-state: all 3 required contexts green on
6eccb005—CI / all-required,E2E API Smoke Test,Handlers Postgres Integrationall SUCCESS. The combined-failure was only the informationalqa-review/security-review/sop-checklistcontexts (non-gating on a CI-workflow-only change). This completes the production-incident footgun guardrail: paired with cp#510 (redeploy empty-body default → staging-latest), :latest now tracks the prod-blessed build so a no-arg redeploy can no longer revert a tenant to a stale image. Honest documented bypass, not a sockpuppet approval; token revoked post-merge.