forked from molecule-ai/molecule-core
Root cause: this workflow only triggered on `branches: [main]`, but staging-CP pins TENANT_IMAGE=:staging-latest (verified via Railway). :staging-latest was only retagged on main push, so:

  staging-branch code → never built → never reaches staging tenants staging-CP serves → "yesterday's main" indefinitely

When staging→main was wedged (path-filter parity bug, canvas teardown race; both fixed earlier today), :staging-latest stopped updating entirely. RFC #2312 (chat upload HTTP-forward) landed on staging but freshly-provisioned staging tenants kept failing chat upload because they pulled a pre-RFC-#2312 image. Verified by tearing down a fresh tenant and observing the legacy "workspace container not running" error from the docker-exec code path that RFC #2312 deleted.

Pre-2026-04-24 there was a related-but-different incident: TENANT_IMAGE was a static :staging-<sha> pin that drifted 10 days behind. This new incident is "the dynamic pin still drifts when its update workflow doesn't fire."

Fix: add `staging` to the branches trigger. Tag policy is unchanged (:staging-<sha> + :staging-latest on every push). canary-verify.yml still runs on main push (workflow_run-gated to `branches: [main]`), preserving the canary-verified :latest promotion for prod tenants.

Steady state after this:
- staging push → :staging-latest = staging-branch code → staging-CP
- main push → :staging-<sha> for canary, :staging-latest retag (post-promote main code), and after canary green → :latest for prod tenants

What this does NOT change:
- canary-verify.yml flow (still main-only)
- redeploy-tenants-on-main.yml (still rolls prod fleet on main push)
- publish-canvas-image.yml (self-hosted standalone canvas; orthogonal)
- The :latest tag (canary-verified main, unchanged)

What this does fix:
- RFC #2312-class fixes that land on staging now actually reach staging tenants without waiting for staging→main promote.
- The dogfooding observation "staging tenants seem to be running yesterday's code" disappears as a class.

Drive-by: also fixed the typo in the path-filter list (was `publish-platform-image.yml`, the actual file is `publish-workspace-server-image.yml`).
| | | |
|---|---|---|
| auto-promote-on-e2e.yml | ||
| auto-promote-staging.yml | ||
| auto-sync-main-to-staging.yml | ||
| auto-tag-runtime.yml | ||
| block-internal-paths.yml | ||
| canary-staging.yml | ||
| canary-verify.yml | ||
| check-merge-group-trigger.yml | ||
| ci.yml | ||
| codeql.yml | ||
| e2e-api.yml | ||
| e2e-staging-canvas.yml | ||
| e2e-staging-saas.yml | ||
| e2e-staging-sanity.yml | ||
| pr-guards.yml | ||
| promote-latest.yml | ||
| publish-canvas-image.yml | ||
| publish-runtime.yml | ||
| publish-workspace-server-image.yml | ||
| railway-pin-audit.yml | ||
| redeploy-tenants-on-main.yml | ||
| retarget-main-to-staging.yml | ||
| runtime-pin-compat.yml | ||
| runtime-prbuild-compat.yml | ||
| secret-pattern-drift.yml | ||
| secret-scan.yml | ||
| sweep-cf-orphans.yml | ||
| sweep-cf-tunnels.yml | ||
| sweep-stale-e2e-orgs.yml | ||
| test-ops-scripts.yml | ||
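
The fix described in the commit message above (adding `staging` to the branches trigger and correcting the path-filter entry) might look roughly like the fragment below. This is a minimal sketch, not the actual file contents: it assumes the workflow in question is `publish-workspace-server-image.yml`, that it lives under `.github/workflows/`, and that it uses an `on.push` trigger with `branches` and `paths` filters. Only the branch names and the corrected filename come from the commit message; the other path entries are not shown in the source and are left elided.

```yaml
# Sketch of the trigger block after the fix (assumed layout, see note above).
on:
  push:
    # `staging` is the addition; the commit message says this was [main] only.
    branches: [main, staging]
    paths:
      # ...other path filters not shown in the source...
      # Drive-by typo fix: this entry previously read publish-platform-image.yml.
      - '.github/workflows/publish-workspace-server-image.yml'
```

For contrast, the commit message says canary-verify.yml remains gated to main through `workflow_run`. A sketch of that gating, where the upstream workflow name is a hypothetical placeholder:

```yaml
# Sketch of a main-only workflow_run gate (upstream workflow name is hypothetical).
on:
  workflow_run:
    workflows: ['publish-workspace-server-image']  # assumed display name
    types: [completed]
    # Only runs triggered from main start the canary check,
    # so staging pushes do not affect the :latest promotion path.
    branches: [main]
```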