fix(ci): runs-on [publish, release] to route deterministically to op-host runners (matches tc#22) #36
Why
The `runs-on: publish` single-label routing in this template's `publish-image.yml` is non-deterministic. Both pools advertise the `publish` label:
- `molecule-runner-publish-{1,2}`: working (`runner-base:full-latest-cloudflared-goproxy-pipe`)
- `hongming-pc-runner-publish-*` (Windows/WSL): broken (`runner-base:full-latest-cloudflared-docker-config-fix`). `docker login --password-stdin` fails with `Error saving credentials: mkdir /home/hongming: permission denied`, the same EACCES bug class as internal#597 / internal#603 (act_runner HOME injection).
When the PC pool picks up the job, publish-image fails; when op-host picks it up, it succeeds. Which pool picks is not deterministic. Latent risk: random publish-image failures + stale ECR.
Fix
Match the template-codex sibling fix tc#22 (merge commit `0fb25352`, discovered live by a3692d6b on tc#21 publish-image run 79 job 1).
`runs-on: [publish, release]` (AND-of-labels) routes deterministically to op-host runners, since they are the ONLY ones advertising BOTH labels (see op-host `/opt/molecule/runners/config.publish.yaml` lines 28-29).
Scope
- `runs-on:` change + rationale comment block
- `publish` job affected; the `resolve-version` job remains `ubuntu-latest`
Priority
Low-priority hygiene: the issue only manifests when a PC-publish runner picks up the job instead of op-host. The existing setup works correctly when op-host picks first; this change just makes routing deterministic so future builds don't randomly land on broken runners.
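The `runs-on` change described under Fix can be sketched as the workflow fragment below. This is an illustration only: the trigger and the two job names come from this PR, while the `needs` dependency and the step bodies are placeholders and not the template's real steps.

```yaml
# Sketch of publish-image.yml (step contents are placeholders, not the real workflow).
on:
  push:
    branches: [main]

jobs:
  resolve-version:
    runs-on: ubuntu-latest        # unchanged by this PR
    steps:
      - run: echo "resolve the image version here"   # placeholder step

  publish:
    needs: resolve-version        # assumption: the dependency is not stated in the PR
    # AND-of-labels: a runner must advertise BOTH labels to pick up this job.
    # Before this PR: runs-on: publish
    runs-on: [publish, release]
    steps:
      - run: echo "docker login, build, and push here"   # placeholder step
```

With the list form, a runner that advertises only `publish` no longer matches the job, so only runners registered with both labels are eligible.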
Cross-refs
- tc#22 (`0fb25352`): sibling fix
- `config.publish.yaml`: label declarations
Reviewers
Reviewers please verify:
- `publish` + `release` labels both exist on op-host runners (NOT on PC pool)
- `[publish, release]` AND-of-labels routes to op-host (NOT PC-publish-1/2)
The `runs-on: publish` single-label is non-deterministic: hongming-pc-runner-publish-* runners ALSO advertise `publish`, but their runner-base image (`runner-base:full-latest-cloudflared-docker-config-fix`) fails `docker login --password-stdin` with:
    Error saving credentials: mkdir /home/hongming: permission denied
This is the same EACCES bug class as internal#597/#603 act_runner HOME injection.
op-host molecule-runner-publish-{1,2} use the WORKING runner-base image (`full-latest-cloudflared-goproxy-pipe`) AND advertise BOTH `publish` + `release` labels (op-host /opt/molecule/runners/config.publish.yaml). Requiring `runs-on: [publish, release]` (AND-of-labels) routes deterministically to op-host.
Matches template-codex tc#22 (merge 0fb25352…). Discovered live by a3692d6b on tc#21 merge-commit publish-image run.
Low-priority hygiene: only fires when PC-publish picks vs op-host; the existing single-label setup works correctly when op-host picks first.
Refs: a3692d6b (codex tc#22 discovery), internal#597, internal#603
Five-axis (workflow-infra lens): APPROVE
1. Substance. Matches tc#22 byte-for-byte in shape: same single-line `runs-on:` change from `publish` → `[publish, release]`, same rationale comment block, same cross-refs. Confirmed via tc#22 head 6fb680d7 contents API read: identical AND-of-labels comment block + identical YAML form.
2. Routing correctness. Verified op-host `/opt/molecule/runners/config.publish.yaml` declares BOTH `publish` and `release` labels (lines 28-29), while hongming-pc-runner-publish-* containers only advertise `publish` (their config.yaml does not include `release` in the labels array). AND-of-labels deterministically routes to op-host.
3. EACCES bug class. PC-publish runner-base image `runner-base:full-latest-cloudflared-docker-config-fix` has the broken HOME injection (`mkdir /home/hongming: permission denied`), the same class as internal#597/#603. Op-host image `runner-base:full-latest-cloudflared-goproxy-pipe` works.
4. Blast radius. 1-line YAML change to a single job's `runs-on`. No secret/perm/behavior change. `resolve-version` job remains `ubuntu-latest`. If routing is somehow wrong, the worst case is "publish-image fails to start": same failure mode as today, just deterministic.
5. Risk. Low. Hygiene fix; manifests only when PC-publish wins the schedule race. Sibling tc#22 has been live and clean since merge.
LGTM. APPROVE.
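The routing check in item 2 depends on the label set each pool registers. A rough sketch of what the two runner configs might look like follows, assuming act_runner's `runner.labels` list format. The real file contents are not shown in this PR: the `docker://` label suffixes and the image the `release` label maps to are assumptions made for illustration.

```yaml
# Hypothetical sketch of op-host /opt/molecule/runners/config.publish.yaml.
# Only the presence of both labels is confirmed by the review; the rest is assumed.
runner:
  labels:
    - "publish:docker://runner-base:full-latest-cloudflared-goproxy-pipe"
    - "release:docker://runner-base:full-latest-cloudflared-goproxy-pipe"   # image mapping is an assumption
---
# Hypothetical sketch of a hongming-pc-runner-publish-* config.yaml.
# The review states that `release` is absent from this labels array.
runner:
  labels:
    - "publish:docker://runner-base:full-latest-cloudflared-docker-config-fix"
```

Under this layout a job with `runs-on: [publish, release]` matches only the first config, because the second never registers `release`.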
Five-axis (CI discipline lens): APPROVE
1. Test-mirror & regression safety. This is a placement-determinism fix; no test surface is touched. The change does NOT relax any required check. BP `status_check_contexts` on main is unchanged (5 contexts: validate / Template validation static+runtime / Adapter unit tests / Secret scan). Post-merge publish-image only fires on push to main → no functional regression risk in PR-gate cycle.
2. Idempotence. Subsequent publish-image runs on op-host (where they already mostly land) will now land on op-host every time. ECR digest under same image-name doesn't change semantics; only WHO produces it changes.
3. Failure modes. Previously: ~stochastic split between op-host (works) and PC-publish (fails with HOME EACCES). Post-merge: 100% op-host. If op-host runners are down for maintenance the job will wait/queue (vs. failing fast on PC), which is the correct safety property.
4. Two-eyes preserved. Author `infra-runtime-be` ≠ reviewer `core-devops` ≠ reviewer `core-qa`. Different lenses, different identities. BP `required_approvals=2`, `dismiss_stale_approvals=true` honored.
5. Cross-link integrity. tc#22 (`0fb25352`) is real and merged (verified); referenced sibling discovery `a3692d6b` and EACCES bug class internal#597/#603 are accurate per session memory.
LGTM. APPROVE.