molecule-core

History

devops-engineer 241859b552 fix(ci): handlers-postgres — sidestep port collision under host-network runner Class B Hongming-owned CICD red sweep. The Handlers Postgres Integration workflow has been silently failing on staging push and PRs ever since #92 fixed the IPv6 flake — the IPv6 fix correctly pinned 127.0.0.1, but unmasked a deeper issue: with our act_runner global container.network=host config, multiple concurrent runs of this workflow each tried to bind 0.0.0.0:5432 on the operator host. The first wins; subsequent postgres service containers exit with `FATAL: could not create any TCP/IP sockets` + `Address in use`. Docker auto-removes them (act_runner sets AutoRemove:true), so by the time `Apply migrations` runs `psql`, the container is gone — Connection refused, then `failed to remove container: No such container` at cleanup time. Per-job container.network override is silently ignored by act_runner (`--network and --net in the options will be ignored.`), so we sidestep `services:` entirely. The job container still uses host-net (required for cache server discovery on the operator's bridge IP). We launch a sibling postgres on the existing molecule-monorepo-net bridge with a unique name per run (run_id+run_attempt) and connect via the bridge IP read from `docker inspect`. Verified manually on operator host 2026-05-08: 2× postgres on host-net collides, but on the bridge with unique names + different IPs, both succeed and each is reachable from a host-net job container. Adds: - always()-cleanup step so containers don't leak on test failure - Diagnostic dump now includes the postgres container's docker logs - Runbook at docs/runbooks/ documenting the substrate behavior + the pattern future workflows should adopt for any `services:`-shaped need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:21:12 -07:00
..
handlers-postgres-integration-port-collision.md	fix(ci): handlers-postgres — sidestep port collision under host-network runner	2026-05-07 18:21:12 -07:00

devops-engineer 241859b552 fix(ci): handlers-postgres — sidestep port collision under host-network runner

Class B Hongming-owned CICD red sweep. The Handlers Postgres Integration
workflow has been silently failing on staging push and PRs ever since
#92 fixed the IPv6 flake — the IPv6 fix correctly pinned 127.0.0.1, but
unmasked a deeper issue: with our act_runner global container.network=host
config, multiple concurrent runs of this workflow each tried to bind
0.0.0.0:5432 on the operator host. The first wins; subsequent postgres
service containers exit with `FATAL: could not create any TCP/IP sockets`
+ `Address in use`. Docker auto-removes them (act_runner sets
AutoRemove:true), so by the time `Apply migrations` runs `psql`, the
container is gone — Connection refused, then `failed to remove container:
No such container` at cleanup time.

Per-job container.network override is silently ignored by act_runner
(`--network and --net in the options will be ignored.`), so we sidestep
`services:` entirely. The job container still uses host-net (required
for cache server discovery on the operator's bridge IP). We launch a
sibling postgres on the existing molecule-monorepo-net bridge with a
unique name per run (run_id+run_attempt) and connect via the bridge IP
read from `docker inspect`.

Verified manually on operator host 2026-05-08: 2× postgres on host-net
collides, but on the bridge with unique names + different IPs, both
succeed and each is reachable from a host-net job container.

Adds:
- always()-cleanup step so containers don't leak on test failure
- Diagnostic dump now includes the postgres container's docker logs
- Runbook at docs/runbooks/ documenting the substrate behavior + the
  pattern future workflows should adopt for any `services:`-shaped need.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-07 18:21:12 -07:00

handlers-postgres-integration-port-collision.md

fix(ci): handlers-postgres — sidestep port collision under host-network runner

2026-05-07 18:21:12 -07:00