fix(ci): handlers-postgres — sidestep port collision under host-network runner #98
Summary
Class B, Hongming-owned CI/CD red sweep. Closes the silent failure of Handlers Postgres Integration on every staging push and PR since #92 fixed the IPv6 flake.
Root cause
With our act_runner config (container.network: host, applied globally to job AND service containers), two concurrent workflow runs both try to bind 0.0.0.0:5432. The second postgres FATALs with Address in use and Docker auto-removes it (act_runner sets AutoRemove:true), so by the time psql runs the container is gone: Connection refused, then failed to remove container: No such container at cleanup.
A per-job container.network override is silently ignored by act_runner (--network and --net in the options will be ignored.).
Reproduced manually on the operator host on 2026-05-08.
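To see the collision outside CI, a minimal sketch along these lines may help. It assumes a host with Docker and nothing else listening on 5432. The container names pg-repro-a and pg-repro-b, the password, and the sleep durations are placeholders, and these are not necessarily the commands that were run on the operator host. Unlike act_runner, the sketch omits auto-removal so the second container's logs can still be read after it exits.

```shell
#!/usr/bin/env bash
# Hypothetical repro sketch: two postgres containers sharing the host network.
# Names, password, and sleep durations are placeholders, not values from the PR.

repro_port_collision() {
  local image="postgres:15-alpine"

  # First instance binds port 5432 in the host network namespace.
  docker run -d --name pg-repro-a --network host \
    -e POSTGRES_PASSWORD=test "$image" >/dev/null

  # Give the first instance time to finish init and bind the port.
  sleep 5

  # Second instance shares the same namespace, so its bind on 5432 should fail.
  docker run -d --name pg-repro-b --network host \
    -e POSTGRES_PASSWORD=test "$image" >/dev/null
  sleep 5

  # No --rm was used, so the exited container and its logs remain inspectable.
  docker logs pg-repro-b 2>&1 | tail -n 5
  docker inspect -f '{{.State.Status}}' pg-repro-b

  # Remove both containers afterwards.
  docker rm -f pg-repro-a pg-repro-b >/dev/null
}
```

Calling repro_port_collision on a scratch host should show the second container exited with a bind error in its logs. With auto-removal enabled, as under act_runner, that container would instead disappear before anything could inspect it.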
Fix
Sidestep the services: block entirely. Launch a sibling postgres on the existing molecule-monorepo-net bridge with a per-run unique name (pg-handlers-${RUN_ID}-${RUN_ATTEMPT}); read its bridge IP via docker inspect; tests connect to the bridge IP, not 127.0.0.1. Host-net job containers can reach bridge-net peers directly. Two parallel runs use different names and different bridge IPs, so there is no collision.
Adds:
- an always() cleanup step so containers don't leak on failure
- docs/runbooks/handlers-postgres-integration-port-collision.md documenting the substrate behavior and the pattern for future services:-shaped workflows
Hostile self-review (3 weakest spots)
1. The molecule-monorepo-net bridge isn't auto-recreated if it disappears. I added a hard-fail docker network inspect guard with a clear error message. Risk: ops accidentally run docker network prune --force and CI silently breaks until someone re-runs docker-compose.infra.yml. Trade-off: the alternative (auto-create on missing) couples the workflow to network parameters that should live in compose files (SSOT).
2. Cleanup uses docker rm -f, not docker stop && docker rm. Force-kill is fine for an ephemeral test postgres but doesn't shut it down cleanly. Acceptable here: there's no shared volume or replication.
3. PG_HOST is set via $GITHUB_ENV from one step into the next two. If the start step's docker inspect regresses and fails to return an IP, downstream steps would error out with an empty ${PG_HOST}. The start step now hard-fails first via an explicit empty-check, so this is closed.
Test plan
- pg-handlers-<run_id>-<attempt> in docker logs
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
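The start, guard, and cleanup steps described in the Fix and self-review sections can be sketched as shell functions. This is an illustration under stated assumptions, not the workflow's actual step bodies: the network name and container-name format come from the description above, while the function names, image tag, password, and error messages are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the sibling-container pattern. NETWORK and the name format come
# from the PR text; image tag, password, and function names are assumptions.

NETWORK="molecule-monorepo-net"

# Build the per-run container name from a run id and a run attempt.
pg_container_name() {
  printf 'pg-handlers-%s-%s\n' "$1" "$2"
}

# Hard-fail when the bridge network is missing instead of auto-creating it.
require_network() {
  if ! docker network inspect "$NETWORK" >/dev/null 2>&1; then
    echo "network $NETWORK not found; re-run docker-compose.infra.yml" >&2
    return 1
  fi
}

# Start postgres on the bridge and print its bridge IP; fail on an empty IP.
start_postgres() {
  local name="$1" ip
  docker run -d --name "$name" --network "$NETWORK" \
    -e POSTGRES_PASSWORD=test postgres:15-alpine >/dev/null || return 1
  # The container is attached to a single network, so this yields one address.
  ip="$(docker inspect -f \
    '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$name")"
  if [ -z "$ip" ]; then
    echo "docker inspect returned no IP for $name" >&2
    return 1
  fi
  printf '%s\n' "$ip"
}

# Force-remove the container; tolerate it already being gone.
cleanup_postgres() {
  docker rm -f "$1" >/dev/null 2>&1 || true
}
```

A start step could call require_network, capture the output of start_postgres, and append a PG_HOST=<ip> line to the file named by $GITHUB_ENV. A final step guarded by always() could then call cleanup_postgres with the same name. Keeping the name derivation in one function means the start and cleanup steps cannot drift apart.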
Handlers-Postgres host-network port-collision fix. Phase 1 hypothesis ranking documented + winner confirmed (manual repro: 2× postgres:15-alpine on --network host = port collision). Switched from services: block to --network molecule-monorepo-net + unique container names per run. ≥2 consecutive green runs (#3210 + #3221) with different IPs proving unique-name design. Runbook added at docs/runbooks/handlers-postgres-integration-port-collision.md. By devops-engineer persona. Hostile self-review documented 3 weakest spots. Ready.
Ghost referenced this pull request 2026-05-08 02:16:10 +00:00