forked from molecule-ai/molecule-core
Class B Hongming-owned CI/CD red sweep, e2e-api leg.

Same substrate hazard as PR #98 (handlers-postgres-integration): Gitea act_runner configures `container.network: host` operator-wide, so:

* Two concurrent e2e-api runs both attempted to bind `-p 15432:5432` and `-p 16379:6379` on the operator host. Verified in run a7/2727 on 2026-05-07: `docker: Error response from daemon: Conflict. The container name "/molecule-ci-redis" is already in use by container af10f438...` (exit 125; the job fails before any test runs).
* Hardcoded container names `molecule-ci-postgres` / `-redis` plus the leading `docker rm -f` step meant a second job's startup also KILLED the first job's still-running services.

Fix shape (mirrors PR #98 bridge-net pattern, adapted because the platform-server is a Go binary on the host, not a containerised step):

1. Per-run unique container names: `pg-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`, `redis-e2e-api-${RUN_ID}-${RUN_ATTEMPT}`. Unique even across reruns of the same run_id.
2. Ephemeral host port per run via `-p 0:5432` / `-p 0:6379` and `docker port` lookup, exported as `DATABASE_URL` / `REDIS_URL` to `$GITHUB_ENV`. No fixed host port, so no collision.
3. `127.0.0.1` (NOT `localhost`) in URLs, so the IPv6 first-resolve flake fixed in #92 stays fixed.
4. `if: always()` cleanup so containers don't leak when test steps fail.

Issue #94 items #2 + #3 also addressed:

* Pre-pull `alpine:latest` (the provisioner uses it for ephemeral token-write containers in `internal/handlers/container_files.go`).
* Idempotent `docker network create molecule-monorepo-net` (the provisioner attaches workspace containers via that bridge; see `internal/provisioner/provisioner.go::DefaultNetwork`).

Issue #94 item #1 (timeouts) NOT bumped: recent log evidence shows postgres ready in 3s, redis in 1s, platform in 1s when they DO come up. Timeouts are not the bottleneck on the current substrate.
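The fix shape above (per-run names, ephemeral host ports resolved through `docker port`, `127.0.0.1` URLs, unconditional cleanup) could look roughly like the following shell helpers. This is a minimal sketch, not the workflow's actual steps: the function names, the `postgres:16` image tag, and the credentials are hypothetical, and it assumes `RUN_ID`, `RUN_ATTEMPT`, and `GITHUB_ENV` are already set by the workflow.

```shell
#!/usr/bin/env bash
# Sketch only. Function names, image tag and credentials are illustrative
# assumptions, not copied from the workflow file.

# Per-run container name, unique even across reruns of the same run_id.
# $1 = service prefix (pg | redis), $2 = run id, $3 = run attempt
container_name() {
  printf '%s-e2e-api-%s-%s\n' "$1" "$2" "$3"
}

# Reads `docker port` output on stdin (e.g. "0.0.0.0:49153") and prints the
# host port. Only the first line is used; an IPv6 mapping line may follow it.
host_port() {
  head -n 1 | sed 's/.*://'
}

# Builds a URL on 127.0.0.1 (not localhost).
# $1 = scheme plus optional credentials, $2 = host port, $3 = optional path
service_url() {
  printf '%s127.0.0.1:%s%s\n' "$1" "$2" "${3:-}"
}

# Starts postgres on an ephemeral host port and exports DATABASE_URL.
# Assumes RUN_ID, RUN_ATTEMPT and GITHUB_ENV are set by the workflow.
start_postgres() {
  local name port
  name="$(container_name pg "$RUN_ID" "$RUN_ATTEMPT")"
  docker run -d --name "$name" -e POSTGRES_PASSWORD=postgres \
    -p 0:5432 postgres:16 >/dev/null
  port="$(docker port "$name" 5432/tcp | host_port)"
  echo "DATABASE_URL=$(service_url 'postgres://postgres:postgres@' "$port" '/postgres')" >> "$GITHUB_ENV"
}

# Removes only this run's containers; intended for an `if: always()` step.
cleanup_services() {
  docker rm -f \
    "$(container_name pg "$RUN_ID" "$RUN_ATTEMPT")" \
    "$(container_name redis "$RUN_ID" "$RUN_ATTEMPT")" >/dev/null 2>&1 || true
}
```

Because every name is derived from the run id and attempt, the cleanup step can only remove containers that belong to its own run, which is what keeps one job from killing another job's services.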
NOT addressed here (out of scope, separate change required):

* The `Run E2E API tests` step has been failing on `Status back online` because the platform's langgraph workspace template image (`ghcr.io/molecule-ai/workspace-template-langgraph:latest`) returns 403 Forbidden post-2026-05-06 GitHub org suspension. That is a template-registry resolution issue (ADR-002 / local-build mode) and belongs in a workspace-server change, not this workflow file.

This PR fixes the parallel-collision class and the workflow setup hygiene; the langgraph-403 failure will still surface on runs after this lands until template resolution is fixed separately.

Verified manually on operator host 2026-05-08: docker now hands out ephemeral ports on `-p 0:5432`, two parallel runs land on different ports, both reach pg_isready GREEN.

Closes #94 (items #2 and #3; item #1 documented as not-bottleneck; langgraph-template-403 referenced for follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Name |
|---|
| scripts |
| workflows |
| CODEOWNERS |
| dependabot.yml |