[infra-lead-agent] fix(ci): revert publish-* runs-on pin — docker label not yet registered (#576/#599 followup)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
security-review / approved (pull_request) Failing after 13s
qa-review / approved (pull_request) Failing after 16s
sop-tier-check / tier-check (pull_request) Successful in 13s
gate-check-v3 / gate-check (pull_request) Successful in 23s
CI / Detect changes (pull_request) Successful in 28s
E2E API Smoke Test / detect-changes (pull_request) Successful in 30s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 30s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 31s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 32s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
CI / all-required (pull_request) Successful in 2s
audit-force-merge / audit (pull_request) Has been skipped

#599 changed `runs-on: ubuntu-latest` → `runs-on: [ubuntu-latest, docker]` in
publish-workspace-server-image.yml + publish-canvas-image.yml to gate jobs onto
docker-capable runners. But no act-runner currently carries the `docker` label
(the infra-sre registration step from #599's PR body never happened — and #599
was merged anyway, despite the reviewer's stated "MANDATORY SEQUENCING" caveat).
Result: `[ubuntu-latest, docker]` matched ZERO eligible runners; both publish-*
workflows sat "Waiting to run" for >1.5h across main HEADs 41bb9e48 and 49a4c3a7.
That's strictly worse than the pre-#599 coin-flip, where ~50% of runs landed on a
socket-having runner and succeeded.
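To make the scheduling failure concrete, here is a minimal Python sketch of label matching. It assumes, as the zero-runner observation above implies, that a runner may take a job only if it carries every label listed in `runs-on`. The runner names come from the diff comments below, but their label sets, the `Runner` type, and the `eligible`/`socket_hit_rate` helpers are hypothetical, and "uniform pickup" is a simplification of how runners actually poll for work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    name: str
    labels: frozenset[str]
    has_docker_socket: bool  # hypothetical flag: /var/run/docker.sock is mounted


def eligible(runs_on: list[str], runners: list[Runner]) -> list[Runner]:
    """Runners that may pick up a job: each must carry *every* runs-on label."""
    required = set(runs_on)
    return [r for r in runners if required <= r.labels]


def socket_hit_rate(runs_on: list[str], runners: list[Runner]) -> float | None:
    """Share of eligible runners that have the socket, assuming uniform pickup.

    None means no runner is eligible, so the job waits indefinitely.
    """
    pool = eligible(runs_on, runners)
    if not pool:
        return None
    return sum(r.has_docker_socket for r in pool) / len(pool)


# Hypothetical fleet modelled on the diff comment: runner-1 has no socket,
# runner-4 has one, and neither has been given a `docker` label yet.
fleet_today = [
    Runner("molecule-runner-1", frozenset({"ubuntu-latest"}), False),
    Runner("molecule-runner-4", frozenset({"ubuntu-latest"}), True),
]
# Same fleet once the `docker` label is registered on the socket-having runner.
fleet_after_576 = [
    Runner("molecule-runner-1", frozenset({"ubuntu-latest"}), False),
    Runner("molecule-runner-4", frozenset({"ubuntu-latest", "docker"}), True),
]

print(socket_hit_rate(["ubuntu-latest", "docker"], fleet_today))      # None: #599 pin, no label yet
print(socket_hit_rate(["ubuntu-latest"], fleet_today))                # 0.5: this revert
print(socket_hit_rate(["ubuntu-latest", "docker"], fleet_after_576))  # 1.0: pin re-applied
```

The three calls mirror the three states this commit describes: the #599 pin with no label registered (nothing schedules), the revert (coin-flip), and the pin re-applied after #576 (every run lands on the socket-having runner).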

This reverts the `runs-on` to `ubuntu-latest` to restore scheduling. Once
infra-sre registers the `docker` label on the socket-having runners (tracked
in #576), #599's pin should be re-applied — the diagnosis was correct, the
sequencing wasn't.

Workflow-only change → §SOP-13 §3 carve-out (tier:low). Author = infra-lead;
merger must be a non-author engineer with the 4-field §3 audit comment posted
first. Operationally urgent: the publish image builds (which produce the next
release/deploy artifact) have been unable to run for >1.5h.
Molecule AI · infra-lead 2026-05-11 23:43:13 +00:00
parent 49a4c3a736
commit 3ea24916d0
2 changed files with 15 additions and 11 deletions

@@ -54,11 +54,13 @@ env:
 jobs:
   build-and-push:
     name: Build & push canvas image
-    # NOTE: infra-sre must register a `docker` label on every act-runner that
-    # mounts /var/run/docker.sock (group=docker, socket perms 660+). Jobs without
-    # the `docker` label land on runners that lack the socket and fail here.
-    # See issue #576.
-    runs-on: [ubuntu-latest, docker]
+    # TEMPORARY REVERT (infra-lead, 2026-05-12) of #599's `runs-on: [ubuntu-latest, docker]`
+    # pin. No act-runner currently carries the `docker` label (#599 landed before
+    # infra-sre registered it), so `[ubuntu-latest, docker]` matched ZERO runners and
+    # both publish-* workflows sat "Waiting to run" for >1.5h. Reverting to `ubuntu-latest`
+    # un-breaks scheduling until the `docker` label is registered, then re-apply #599's
+    # pin. See #576 + #599.
+    runs-on: ubuntu-latest
     # Phase 3 (RFC #219 §1): surface broken workflows without blocking.
     continue-on-error: true
     steps:

@@ -52,12 +52,14 @@ env:
 jobs:
   build-and-push:
-    # NOTE: infra-sre must register a `docker` label on every act-runner that
-    # mounts /var/run/docker.sock (group=docker, socket perms 660+). Jobs without
-    # the `docker` label land on runners that lack the socket and fail here.
-    # molecule-runner-1 (no socket) vs molecule-runner-4 (socket) — coin-flip
-    # without this label gate. See issue #576.
-    runs-on: [ubuntu-latest, docker]
+    # TEMPORARY REVERT (infra-lead, 2026-05-12) of #599's `runs-on: [ubuntu-latest, docker]`
+    # pin. No act-runner currently carries the `docker` label (#599 landed before
+    # infra-sre registered it), so `[ubuntu-latest, docker]` matched ZERO runners and
+    # both publish-* workflows sat "Waiting to run" for >1.5h — strictly worse than the
+    # pre-#599 coin-flip. Reverting to `ubuntu-latest` restores ~50% success (some runs
+    # land on socket-less runners and fail the health check below) until the `docker`
+    # label is registered, after which #599's pin should be re-applied. See #576 + #599.
+    runs-on: ubuntu-latest
     steps:
       - name: Checkout
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
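The health check that socket-less runs fail is not part of this diff. As an illustration only, here is a minimal Python sketch of what "carries the socket" means according to the workflow comments: mounted at /var/run/docker.sock, owned by group `docker`, mode 660 or wider. `socket_problems` is a hypothetical helper, not the workflow's actual step.

```python
from __future__ import annotations

import grp
import os
import stat

DOCKER_SOCK = "/var/run/docker.sock"


def socket_problems(path: str = DOCKER_SOCK, expected_group: str | None = "docker") -> list[str]:
    """List reasons a job could not use the Docker socket at `path` (empty list = usable)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} is not mounted"]

    problems: list[str] = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path} is not a unix socket")

    # "socket perms 660+": owner and group both need read and write.
    needed = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
    if (st.st_mode & needed) != needed:
        problems.append(f"{path} mode {stat.S_IMODE(st.st_mode):o} is below 660")

    # "group=docker": the socket's owning group should be `docker`.
    if expected_group is not None:
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:  # gid has no entry in the group database
            group = str(st.st_gid)
        if group != expected_group:
            problems.append(f"{path} group is {group!r}, expected {expected_group!r}")
    return problems


if __name__ == "__main__":
    issues = socket_problems()
    print("docker socket OK" if not issues else "\n".join(issues))
```

On a socket-less runner such as molecule-runner-1 (per the diff comment), this sketch would report the socket as not mounted.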