fix(ci): handlers-postgres — sidestep port collision under host-network runner (#98)
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Successful in 7s
Block internal-flavored paths / Block forbidden paths (push) Successful in 17s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Successful in 7s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 18s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Successful in 7s
CI / Detect changes (push) Successful in 24s
E2E API Smoke Test / detect-changes (push) Successful in 23s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 25s
Handlers Postgres Integration / detect-changes (push) Successful in 23s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 18s
CI / Platform (Go) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 7s
cascade-list-drift-gate / check (pull_request) Successful in 17s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 25s
branch-protection drift check / Branch protection drift (pull_request) Successful in 29s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 7s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 18s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 24s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 25s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 25s
Harness Replays / detect-changes (pull_request) Successful in 25s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 22s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
CI / Canvas Deploy Reminder (push) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3m10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 31s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 6m21s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 7m9s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7m6s
Harness Replays / Harness Replays (pull_request) Successful in 2m5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m52s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m43s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5m28s
CI / Canvas (Next.js) (pull_request) Successful in 8m6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6m38s
CI / Python Lint & Test (pull_request) Successful in 7m57s
CI / Platform (Go) (pull_request) Failing after 9m34s
Switches from services: block to --network molecule-monorepo-net with unique per-run container names. Avoids port-5432 collision when parallel Handlers-Postgres jobs run on host-network act_runner. Approved by security-auditor.
commit 8a3141a763
.github/workflows/handlers-postgres-integration.yml (148 changes, vendored)
@@ -14,12 +14,42 @@ name: Handlers Postgres Integration
 # self-review caught it took 2 minutes to set up and would have caught
 # the bug at PR-time.
 #
-# This job spins a Postgres service container, applies the migration,
-# and runs `go test -tags=integration` against a live DB. Required
-# check on staging branch protection — backend handler PRs cannot
-# merge without a real-DB regression gate.
+# Why this workflow does NOT use `services: postgres:` (Class B fix)
+# ------------------------------------------------------------------
+# Our act_runner config has `container.network: host` (operator host
+# /opt/molecule/runners/config.yaml), which act_runner applies to BOTH
+# the job container AND every service container. With host-net, two
+# concurrent runs of this workflow both try to bind 0.0.0.0:5432 — the
+# second postgres FATALs with `could not create any TCP/IP sockets:
+# Address in use`, and Docker auto-removes it (act_runner sets
+# AutoRemove:true on service containers). By the time the migrations
+# step runs `psql`, the postgres container is gone, hence
+# `Connection refused` then `failed to remove container: No such
+# container` at cleanup time.
 #
-# Cost: ~30s job (postgres pull from GH cache + go build + 4 tests).
+# Per-job `container.network` override is silently ignored by
+# act_runner — `--network and --net in the options will be ignored.`
+# appears in the runner log. Documented constraint.
+#
+# So we sidestep `services:` entirely. The job container still uses
+# host-net (inherited from runner config; required for cache server
+# discovery on the bridge IP 172.18.0.17:42631). We launch a sibling
+# postgres on the existing `molecule-monorepo-net` bridge with a
+# UNIQUE name per run — `pg-handlers-${RUN_ID}-${RUN_ATTEMPT}` — and
+# read its bridge IP via `docker inspect`. A host-net job container
+# can reach a bridge-net container directly via the bridge IP (verified
+# manually on operator host 2026-05-08).
+#
+# Trade-offs vs. the original `services:` shape:
+# + No host-port collision; N parallel runs share the bridge cleanly
+# + `if: always()` cleanup runs even on test-step failure
+# - One more step in the workflow (+~3 lines)
+# - Requires `molecule-monorepo-net` to exist on the operator host
+# (it does; declared in docker-compose.yml + docker-compose.infra.yml)
+#
+# Class B Hongming-owned CICD red sweep, 2026-05-08.
+#
+# Cost: ~30s job (postgres pull from cache + go build + 4 tests).

 on:
   push:
@@ -59,20 +89,14 @@ jobs:
     name: Handlers Postgres Integration
     needs: detect-changes
     runs-on: ubuntu-latest
-    services:
-      postgres:
-        image: postgres:15-alpine
-        env:
-          POSTGRES_PASSWORD: test
-          POSTGRES_DB: molecule
-        ports:
-          - 5432:5432
-        # GHA spins this with --health-cmd built in for postgres images.
-        options: >-
-          --health-cmd pg_isready
-          --health-interval 5s
-          --health-timeout 5s
-          --health-retries 10
+    env:
+      # Unique name per run so concurrent jobs don't collide on the
+      # bridge network. ${RUN_ID}-${RUN_ATTEMPT} is unique even across
+      # workflow_dispatch reruns of the same run_id.
+      PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
+      # Bridge network already exists on the operator host (declared
+      # in docker-compose.yml + docker-compose.infra.yml).
+      PG_NETWORK: molecule-monorepo-net
     defaults:
       run:
         working-directory: workspace-server
@@ -89,16 +113,57 @@ jobs:
         with:
           go-version: 'stable'

+      - if: needs.detect-changes.outputs.handlers == 'true'
+        name: Start sibling Postgres on bridge network
+        working-directory: .
+        run: |
+          # Sanity: the bridge network must exist on the operator host.
+          # Hard-fail loud if it doesn't — easier to spot than a silent
+          # auto-create that diverges from the rest of the stack.
+          if ! docker network inspect "${PG_NETWORK}" >/dev/null 2>&1; then
+            echo "::error::Bridge network '${PG_NETWORK}' missing on operator host. Re-run docker-compose.infra.yml or check ops handbook."
+            exit 1
+          fi
+
+          # If a stale container with the same name exists (rerun on
+          # the same run_id), wipe it first.
+          docker rm -f "${PG_NAME}" >/dev/null 2>&1 || true
+
+          docker run -d \
+            --name "${PG_NAME}" \
+            --network "${PG_NETWORK}" \
+            --health-cmd "pg_isready -U postgres" \
+            --health-interval 5s \
+            --health-timeout 5s \
+            --health-retries 10 \
+            -e POSTGRES_PASSWORD=test \
+            -e POSTGRES_DB=molecule \
+            postgres:15-alpine >/dev/null
+
+          # Read back the bridge IP. Always present immediately after
+          # `docker run -d` for bridge networks.
+          PG_HOST=$(docker inspect "${PG_NAME}" \
+            --format "{{(index .NetworkSettings.Networks \"${PG_NETWORK}\").IPAddress}}")
+          if [ -z "${PG_HOST}" ]; then
+            echo "::error::Could not resolve PG_HOST for ${PG_NAME} on ${PG_NETWORK}"
+            docker logs "${PG_NAME}" || true
+            exit 1
+          fi
+          echo "PG_HOST=${PG_HOST}" >> "$GITHUB_ENV"
+          echo "INTEGRATION_DB_URL=postgres://postgres:test@${PG_HOST}:5432/molecule?sslmode=disable" >> "$GITHUB_ENV"
+          echo "Started ${PG_NAME} at ${PG_HOST}:5432"
+
       - if: needs.detect-changes.outputs.handlers == 'true'
         name: Apply migrations to Postgres service
         env:
           PGPASSWORD: test
         run: |
-          # Wait for postgres to actually accept connections (the
-          # GHA --health-cmd is best-effort but psql can still race).
+          # Wait for postgres to actually accept connections. Docker's
+          # health-cmd handles container-side readiness, but the wire
+          # to the bridge IP is best-tested with pg_isready directly.
           for i in {1..15}; do
-            if pg_isready -h 127.0.0.1 -p 5432 -U postgres -q; then break; fi
-            echo "waiting for postgres..."; sleep 2
+            if pg_isready -h "${PG_HOST}" -p 5432 -U postgres -q; then break; fi
+            echo "waiting for postgres at ${PG_HOST}:5432..."; sleep 2
           done

           # Apply every .up.sql in lexicographic order with
@@ -131,7 +196,7 @@ jobs:
           # not fine once a cross-table atomicity test came in.
           set +e
           for migration in $(ls migrations/*.sql 2>/dev/null | grep -v '\.down\.sql$' | sort); do
-            if psql -h 127.0.0.1 -U postgres -d molecule -v ON_ERROR_STOP=1 \
+            if psql -h "${PG_HOST}" -U postgres -d molecule -v ON_ERROR_STOP=1 \
                 -f "$migration" >/dev/null 2>&1; then
               echo "✓ $(basename "$migration")"
             else
@@ -145,7 +210,7 @@ jobs:
           # fail if any didn't land — that would be a real regression we
           # want loud.
           for tbl in delegations workspaces activity_logs pending_uploads; do
-            if ! psql -h 127.0.0.1 -U postgres -d molecule -tA \
+            if ! psql -h "${PG_HOST}" -U postgres -d molecule -tA \
                 -c "SELECT 1 FROM information_schema.tables WHERE table_name = '$tbl'" \
                 | grep -q 1; then
               echo "::error::$tbl table missing after migration replay — handler integration tests would be meaningless"
@@ -156,23 +221,32 @@ jobs:

       - if: needs.detect-changes.outputs.handlers == 'true'
         name: Run integration tests
-        env:
-          # 127.0.0.1, NOT localhost. On Gitea / act_runner the runner host
-          # has IPv6 enabled, so `localhost` resolves to `::1` first, and
-          # the Postgres service container only listens on IPv4 → lib/pq's
-          # first dial hits ECONNREFUSED. The migration step uses psql -h
-          # localhost which falls back to IPv4 cleanly, so the flake hides
-          # there and surfaces only at test time. Pinning IPv4 makes the
-          # whole job deterministic. (Issue #88, item 3.)
-          INTEGRATION_DB_URL: postgres://postgres:test@127.0.0.1:5432/molecule?sslmode=disable
         run: |
+          # INTEGRATION_DB_URL is exported by the start-postgres step;
+          # points at the per-run bridge IP, not 127.0.0.1, so concurrent
+          # workflow runs don't fight over a host-net 5432 port.
           go test -tags=integration -timeout 5m -v ./internal/handlers/ -run "^TestIntegration_"

-      - if: needs.detect-changes.outputs.handlers == 'true' && failure()
+      - if: failure() && needs.detect-changes.outputs.handlers == 'true'
         name: Diagnostic dump on failure
         env:
           PGPASSWORD: test
         run: |
-          echo "::group::delegations table state"
-          psql -h 127.0.0.1 -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
+          echo "::group::postgres container status"
+          docker ps -a --filter "name=${PG_NAME}" --format '{{.Status}} {{.Names}}' || true
+          docker logs "${PG_NAME}" 2>&1 | tail -50 || true
           echo "::endgroup::"
+          echo "::group::delegations table state"
+          psql -h "${PG_HOST}" -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
+          echo "::endgroup::"
+
+      - if: always() && needs.detect-changes.outputs.handlers == 'true'
+        name: Stop sibling Postgres
+        working-directory: .
+        run: |
+          # always() so containers don't leak when migrations or tests
+          # fail. The cleanup is best-effort: if the container is
+          # already gone (e.g. concurrent rerun race), don't fail the job.
+          docker rm -f "${PG_NAME}" >/dev/null 2>&1 || true
+          echo "Cleaned up ${PG_NAME}"
+
docs/runbooks/handlers-postgres-integration-port-collision.md (137 changes, new file)
@@ -0,0 +1,137 @@
# Runbook — Handlers Postgres Integration port-collision substrate

**Status:** Resolved 2026-05-08 (PR for class B Hongming-owned CICD red sweep).

## Symptom

`Handlers Postgres Integration` workflow fails on staging push and PRs.
Step `Apply migrations to Postgres service` shows:

```
psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused
```

Job-cleanup step further down logs:

```
Cleaning up services for job Handlers Postgres Integration
failed to remove container: Error response from daemon: No such container: <id>
```

…confirming the postgres service container was already gone before
cleanup ran.

## Root cause

Our Gitea act_runner (operator host `5.78.80.188`,
`/opt/molecule/runners/config.yaml`) sets:

```yaml
container:
  network: host
```

…which act_runner applies to BOTH the job container AND every
`services:` container in a workflow. Multiple workflow instances
running concurrently across the 16 parallel runners each try to bind
postgres on `0.0.0.0:5432`. The first wins; subsequent instances exit
immediately with:

```
LOG: could not bind IPv4 address "0.0.0.0": Address in use
HINT: Is another postmaster already running on port 5432?
FATAL: could not create any TCP/IP sockets
```

act_runner sets `AutoRemove:true` on service containers, so Docker
garbage-collects them as soon as they exit. By the time the migrations
step runs `pg_isready` / `psql`, the container is gone and connection
refused.

Reproduction (operator host):

```bash
docker run --rm -d --name pg-A --network host \
  -e POSTGRES_PASSWORD=test postgres:15-alpine
docker run -d --name pg-B --network host \
  -e POSTGRES_PASSWORD=test postgres:15-alpine
docker logs pg-B # FATAL: could not create any TCP/IP sockets
```
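
When triaging a red run, the bind-failure lines quoted under Root cause are the distinguishing signature. The following is a minimal sketch of a check for them, assuming the container log is piped in on stdin; the function name `is_port_collision_log` is hypothetical and not part of this commit.

```shell
# Hypothetical triage helper (illustrative only, not part of the commit).
# Reads a postgres container log on stdin and succeeds only when it
# contains both bind-failure messages quoted in the runbook.
is_port_collision_log() {
  # Buffer stdin so it can be scanned twice.
  _log=$(cat)
  printf '%s\n' "$_log" | grep -q 'could not bind IPv4 address' &&
    printf '%s\n' "$_log" | grep -q 'could not create any TCP/IP sockets'
}
```

On the operator host this could be used as `docker logs pg-B 2>&1 | is_port_collision_log && echo "port collision"`, with `pg-B` being the second container from the reproduction.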

## Why per-job override doesn't work

The natural fix — per-job `container.network` override — is silently
ignored by act_runner. The runner log emits:

```
--network and --net in the options will be ignored.
```

This is a documented act_runner constraint: container network is a
runner-wide setting, not per-job. Source: gitea/act_runner config docs
+ vegardit/docker-gitea-act-runner issue #7.

Flipping the global `container.network` to `bridge` would break every
other workflow in the repo (cache server discovery,
`molecule-monorepo-net` peer access during integration tests, etc.) —
unacceptable blast radius for a per-test bug.

## Fix shape

`handlers-postgres-integration.yml` no longer uses `services: postgres:`.
It launches a sibling postgres container manually on the existing
`molecule-monorepo-net` bridge network with a per-run unique name:

```yaml
env:
  PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
  PG_NETWORK: molecule-monorepo-net

steps:
  - name: Start sibling Postgres on bridge network
    run: |
      docker run -d --name "${PG_NAME}" --network "${PG_NETWORK}" \
        ...
        postgres:15-alpine
      PG_HOST=$(docker inspect "${PG_NAME}" \
        --format "{{(index .NetworkSettings.Networks \"${PG_NETWORK}\").IPAddress}}")
      echo "PG_HOST=${PG_HOST}" >> "$GITHUB_ENV"

  # … migrations + tests use ${PG_HOST}, not 127.0.0.1 …

  - if: always() && …
    name: Stop sibling Postgres
    run: docker rm -f "${PG_NAME}" || true
```

The host-net job container can reach a bridge-net container via the
bridge IP directly (verified manually, 2026-05-08). Two parallel runs
use different names + different bridge IPs — no collision.
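
To make the no-collision argument concrete, here is a small sketch of how the per-run name and the connection URL are derived. The helper names `pg_container_name` and `integration_db_url` are hypothetical (the workflow builds both strings inline), and the run id and IP used below are made-up values.

```shell
# Hypothetical helpers (names are illustrative, not from the commit).

# Compose the per-run container name from the run id and attempt number,
# mirroring pg-handlers-${RUN_ID}-${RUN_ATTEMPT} in the workflow.
pg_container_name() {
  printf 'pg-handlers-%s-%s\n' "$1" "$2"
}

# Compose the connection URL for a given bridge IP, using the password
# and database name the workflow passes to the postgres image.
integration_db_url() {
  printf 'postgres://postgres:test@%s:5432/molecule?sslmode=disable\n' "$1"
}

# Two attempts of the same (made-up) run id yield distinct names.
pg_container_name 4242 1
pg_container_name 4242 2
```

Because the attempt number is part of the name, a rerun of the same run id does not reuse the earlier container's name, and every container keeps port 5432 on its own bridge IP rather than on the host.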

## Future-proofing

Other workflows that hit the same shape (any `services:` with a
fixed-port image) will exhibit the same failure mode under
host-network runner config. Translate using this same pattern:

1. Drop the `services:` block.
2. Use `${{ github.run_id }}-${{ github.run_attempt }}` for unique
   container name.
3. Launch on `molecule-monorepo-net` (already trusted bridge in
   `docker-compose.infra.yml`).
4. Read back the bridge IP via `docker inspect` and export as a step env.
5. `if: always()` cleanup step at the end.

If the count of such workflows grows, factor into a composite action
(`./.github/actions/sibling-postgres`) so the substrate logic lives
in one place.
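
One possible shape for that shared logic is sketched below as plain shell functions rather than a finished composite action. The function names `start_sibling` and `stop_sibling` and their argument order are assumptions made for illustration; only the `docker` invocations themselves are taken from the workflow in this commit.

```shell
# Hypothetical shared helper (illustrative sketch, not part of the commit).

# start_sibling NAME NETWORK IMAGE [extra `docker run` args...]
# Starts IMAGE as a detached container on NETWORK and prints its bridge IP.
start_sibling() {
  _name=$1; _net=$2; _image=$3
  shift 3

  # Fail loudly if the bridge network is missing instead of auto-creating it.
  if ! docker network inspect "$_net" >/dev/null 2>&1; then
    echo "network '$_net' missing on this host" >&2
    return 1
  fi

  # Clear a stale container left behind by an earlier attempt.
  docker rm -f "$_name" >/dev/null 2>&1 || true

  docker run -d --name "$_name" --network "$_net" "$@" "$_image" \
    >/dev/null || return 1

  # Read back the IP the container was given on that network.
  _ip=$(docker inspect "$_name" \
    --format "{{(index .NetworkSettings.Networks \"$_net\").IPAddress}}")
  if [ -z "$_ip" ]; then
    echo "could not resolve IP for $_name on $_net" >&2
    return 1
  fi
  printf '%s\n' "$_ip"
}

# stop_sibling NAME
# Best-effort cleanup: never fails, even if the container is already gone.
stop_sibling() {
  docker rm -f "$1" >/dev/null 2>&1 || true
}
```

A caller would capture the printed IP, for example `PG_HOST=$(start_sibling "$PG_NAME" "$PG_NETWORK" postgres:15-alpine -e POSTGRES_PASSWORD=test)`, and run `stop_sibling "$PG_NAME"` from a step guarded by `if: always()`.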

## Related

- Issue #88 (closed by #92): localhost → 127.0.0.1 fix that unmasked
  this collision; the IPv6 fix is correct, port collision is the new
  layer.
- Issue #94 created `molecule-monorepo-net` + `alpine:latest` as
  prereqs.
- Saved memory `feedback_act_runner_github_server_url` documents
  another act_runner-vs-GHA divergence (server URL).