infra(ci): route publish/deploy ship jobs to dedicated publish lane (internal#462) #1376
Merged
devops-engineer merged 1 commit from infra/internal-462-publish-deploy-lane into main 2026-05-16 19:47:27 +00:00
1 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
| | 16957b7c15 | infra(ci): route publish/deploy ship jobs to dedicated publish lane (internal#462) | |
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 33s
cascade-list-drift-gate / check (pull_request) Successful in 26s
CI / Detect changes (pull_request) Successful in 33s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 43s
E2E API Smoke Test / detect-changes (pull_request) Successful in 40s
E2E Chat / detect-changes (pull_request) Successful in 43s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 37s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 23s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m24s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m49s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 27s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 22s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m34s
qa-review / approved (pull_request) Failing after 30s
gate-check-v3 / gate-check (pull_request) Successful in 43s
security-review / approved (pull_request) Failing after 25s
sop-checklist / all-items-acked (pull_request) Successful in 25s
sop-tier-check / tier-check (pull_request) Successful in 23s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 2m1s
CI / Python Lint & Test (pull_request) Successful in 8m50s
CI / Canvas (Next.js) (pull_request) Successful in 24m27s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 14s
CI / Platform (Go) (pull_request) Successful in 26m33s
CI / all-required (pull_request) Successful in 26m46s
E2E Chat / E2E Chat (pull_request) Successful in 22s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 18s
audit-force-merge / audit (pull_request) Successful in 21s
Urgent prod-deploy publish builds currently FIFO-compete with ordinary PR required-CI on the shared 20-runner pool. The production image build for PR#1350 (the CTO-reported canvas-message-loss fix) sat ~25min behind the PR-CI backlog after merge, directly delaying a user-facing fix.

internal#462 comment 32299 + the already-merged operator-config publish-lane scaffolding (config.publish.yaml + publish-lane-ensure.sh, internal#394/#399) define a reserved `publish`/`release` sub-pool (molecule-runner-publish-*, OUTSIDE the managed 1..20 range so it is never auto-drained / recycled / drift-flagged).

This retargets the 7 post-merge ship jobs across 5 workflows from `runs-on: ubuntu-latest` to `runs-on: publish` so a merged fix's image build/push/deploy gets reserved capacity and starts immediately, while PR-CI keeps the general pool:

- publish-workspace-server-image.yml: build-and-push, deploy-production
- publish-canvas-image.yml: build-and-push
- publish-runtime.yml: publish, cascade
- redeploy-tenants-on-main.yml: redeploy
- redeploy-tenants-on-staging.yml: redeploy

publish-runtime-autobump.yml is intentionally NOT moved: it is pull_request-triggered (PR-CI by nature, a required status), not a post-merge ship job. The lane reserves capacity for the ship path, not for PR checks.

HARD MERGE PRECONDITION: this MUST NOT merge until the publish-lane runners are registered and advertising the `publish` label. Targeting an unregistered label queues jobs indefinitely with zero eligible runners, which is the exact #599/#576 `docker`-label failure mode. Lane registration is a GO-gated live-fleet mutation (publish-lane-ensure.sh ALLOW_FLEET_MUTATION=1, requires explicit Hongming in-chat GO).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
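
For illustration, the per-job change might look roughly like the sketch below. Only the workflow name, the job name, and the two `runs-on` values come from the commit message; the trigger and the placeholder step are assumptions, not the contents of the real workflow file.

```yaml
# Hypothetical excerpt of publish-canvas-image.yml.
# Assumed: the push-to-main trigger and the placeholder step.
# From the commit message: the job name and the runs-on retarget.
on:
  push:
    branches: [main]          # assumed post-merge trigger

jobs:
  build-and-push:
    # Before: runs-on: ubuntu-latest   (shared general pool, FIFO with PR-CI)
    runs-on: publish          # After: reserved publish-lane runners only
    steps:
      - run: echo "image build and push steps go here"   # placeholder
```

Because `runs-on` matches on runner labels, a job retargeted this way stays queued until at least one online runner advertises `publish`, which is why the merge precondition requires the lane to be registered first.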