fix(ci): e2e-api — parallel-safe postgres/redis containers (closes #94) #100
Merged
claude-ceo-assistant
merged 1 commits from 2026-05-08 02:02:57 +00:00
fix/issue-94-e2e-api-parallel-safe-class-b into staging
1 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| b9d2786f45 |
fix(ci): e2e-api — parallel-safe postgres/redis containers + provisioner setup
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 2s
Retarget main PRs to staging / Retarget to staging (pull_request) Successful in 3s
branch-protection drift check / Branch protection drift (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m46s
Class B Hongming-owned CICD red sweep, e2e-api leg. Same substrate hazard as PR #98 (handlers-postgres-integration) — Gitea act_runner configures `container.network: host` operator-wide, so: * Two concurrent e2e-api runs both attempted to bind `-p 15432:5432` and `-p 16379:6379` on the operator host. Verified in run a7/2727 on 2026-05-07: `docker: Error response from daemon: Conflict. The container name "/molecule-ci-redis" is already in use by container af10f438...` — exit 125, job fails before any test runs. * Hardcoded container names `molecule-ci-postgres` / `-redis` plus the leading `docker rm -f` step meant a second job's startup also KILLED the first job's still-running services. Fix shape (mirrors PR #98 bridge-net pattern, adapted because the platform-server is a Go binary on the host, not a containerised step): 1. Per-run unique container names: `pg-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`, `redis-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`. Unique even across reruns of the same run_id. 2. Ephemeral host port per run via `-p 0:5432` / `-p 0:6379` and `docker port` lookup, exported as `DATABASE_URL` / `REDIS_URL` to `$GITHUB_ENV`. No fixed host-port → no collision. 3. `127.0.0.1` (NOT `localhost`) in URLs — IPv6 first-resolve flake fixed in #92 stays fixed. 4. `if: always()` cleanup so containers don't leak when test steps fail. Issue #94 items #2 + #3 also addressed: * Pre-pull `alpine:latest` (provisioner uses it for ephemeral token-write containers in `internal/handlers/container_files.go`). * Idempotent `docker network create molecule-monorepo-net` (the provisioner attaches workspace containers via that bridge — `internal/provisioner/provisioner.go::DefaultNetwork`). Issue #94 item #1 (timeouts) NOT bumped — recent log evidence shows postgres ready in 3s, redis in 1s, platform in 1s when they DO come up. Timeouts are not the bottleneck on the current substrate. NOT addressed here (out of scope, separate change required): * `Run E2E API tests` step has been failing on `Status back online` because the platform's langgraph workspace template image (`ghcr.io/molecule-ai/workspace-template-langgraph:latest`) returns 403 Forbidden post-2026-05-06 GitHub org suspension. That is a template-registry resolution issue (ADR-002 / local-build mode) and belongs in a workspace-server change, not this workflow file. This PR fixes the parallel-collision class and the workflow setup hygiene; the langgraph-403 failure will still surface on runs after this lands until template resolution is fixed separately. Verified manually on operator host 2026-05-08: docker now hands out ephemeral ports on `-p 0:5432`, two parallel runs land on different ports, both reach pg_isready GREEN. Closes #94 (items #2 and #3; item #1 documented as not-bottleneck; langgraph-template-403 referenced for follow-up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |