Compare commits

...

1 Commits

Author SHA1 Message Date
Molecule AI Dev Engineer A (Kimi) 9fe7eb9a8e fix(ci): hard-code 127.0.0.1 + MOLECULE_IN_DOCKER=false + PLATFORM_URL discovery in local-provision E2E
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 13s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Canvas Deploy Status (pull_request) Successful in 2s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
CI / all-required (pull_request) Successful in 7s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 44s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m16s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m17s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m20s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m14s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 43s
gate-check-v3 / gate-check (pull_request_target) Failing after 9s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request_target) Failing after 3s
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
qa-review / approved (pull_request_target) Failing after 6s
This addresses the persistent Local Provision Lifecycle E2E failures on main
by applying the same hard-code-env / fix-flaky-CI pattern as #2468→#2470:

1. Replace localhost with 127.0.0.1 for BASE URLs (mirrors e2e-api.yml #92).
   localhost can resolve to IPv6 (::1) first on some act_runner hosts,
   causing curl to fail or hang when the platform only binds IPv4.

2. Hard-code MOLECULE_IN_DOCKER=false at the job level.
   act_runner job containers have /.dockerenv, so the platform auto-detects
   platformInDocker=true. This breaks workspace container reachability because
   the job container is NOT on molecule-core-net.

3. Discover and pass PLATFORM_URL explicitly.
   host.docker.internal is unreliable on Linux. We create molecule-core-net
   if it does not exist yet, take its gateway IP (falling back to the default
   bridge gateway, then the default route), and pass it as PLATFORM_URL so
   workspace containers can reach the host-bound platform.

4. Bind the platform to 0.0.0.0 explicitly (BIND_ADDR=0.0.0.0).
   Without BIND_ADDR, dev mode defaults to 127.0.0.1, making the platform
   unreachable from Docker containers.

5. Add a verify-platform-reachability step and a workspace container log dump
   on failure. Both provide diagnostics for future flakes; the reachability
   check only warns and does not fail the job.
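The gateway discovery in item 3 is a three-step fallback: first the molecule-core-net gateway, then the default bridge gateway, then the default route. Below is a minimal sketch of that logic written as standalone bash functions. Because it reads canned `docker network inspect` and `ip route` text, the fallback order can be exercised without a Docker daemon.

- The names `first_gateway` and `resolve_host_ip` are hypothetical.
- Unlike the workflow's `sed ... | head -1`, this version skips empty `"Gateway": ""` entries.
- It assumes Docker's pretty-printed `"Gateway": "<ip>"` layout, the same assumption the workflow's sed expression makes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the workflow.

# Print the first non-empty "Gateway" value from `docker network inspect`
# JSON on stdin. Assumes the pretty-printed `"Gateway": "<ip>"` layout,
# like the workflow's sed expression does.
first_gateway() {
  awk -F'"Gateway": "' 'NF > 1 {
    split($2, rest, "\"")   # rest[1] is the value up to its closing quote
    if (rest[1] != "") { print rest[1]; exit }
  }'
}

# Resolve a host IP reachable from workspace containers, trying the same
# sources in the same order as the workflow: molecule-core-net gateway,
# default bridge gateway, then the default route. Inputs are passed as text
# so the fallback logic can be tested without Docker.
resolve_host_ip() {
  local core_json=$1 bridge_json=$2 route_table=$3 addr
  addr=$(printf '%s\n' "$core_json" | first_gateway)
  [ -n "$addr" ] || addr=$(printf '%s\n' "$bridge_json" | first_gateway)
  # `ip route` prints e.g. "default via 172.17.0.1 dev eth0"; field 3 is the gateway.
  [ -n "$addr" ] || addr=$(printf '%s\n' "$route_table" | awk '$1 == "default" { print $3; exit }')
  [ -n "$addr" ] || return 1   # caller decides how to fail
  printf '%s\n' "$addr"
}
```

In the step, it could be wired in as `PLATFORM_HOST_IP=$(resolve_host_ip "$(docker network inspect molecule-core-net 2>/dev/null)" "$(docker network inspect bridge 2>/dev/null)" "$(ip route 2>/dev/null)") || { echo "::error::Could not determine PLATFORM_HOST_IP"; exit 1; }`.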

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 10:05:20 +00:00
+92 -5
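Item 4's fix depends on which address the platform actually binds. One way to confirm it from inside the job is to look for the platform's port in `ss -ltn` output. This assumes iproute2's `ss` is available in the act_runner job container, which is not verified here.

- A listener on `0.0.0.0:<port>` or `*:<port>` accepts connections from any interface, including the Docker gateway.
- A listener on `127.0.0.1:<port>` only accepts loopback connections.

Below is a minimal sketch that reads `ss` output on stdin. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the workflow: succeed if `ss -ltn` output
# on stdin shows a TCP listener on PORT bound to all IPv4 interfaces
# (0.0.0.0:PORT) or to the wildcard address (*:PORT); fail otherwise.
listens_on_all_interfaces() {
  local port=$1
  awk -v port="$port" '
    # Column 4 of `ss -ltn` output is "Local Address:Port".
    $1 == "LISTEN" && ($4 == ("0.0.0.0:" port) || $4 == ("*:" port)) { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, in a step after `Wait for /health`: `ss -ltn | listens_on_all_interfaces "$PORT" || echo "WARN: platform is not listening on all interfaces"`.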
@@ -78,6 +78,12 @@ jobs:
# even if the runner's $GITHUB_ENV propagation is flaky (#2468 RCA).
MOLECULE_ENV: development
SECRETS_ENCRYPTION_KEY: lpe2e-test-encryption-key-32bytes!!
+# act_runner runs the job inside a Docker container, so /.dockerenv exists
+# and the platform auto-detects platformInDocker=true. But the job container
+# is NOT on molecule-core-net, so it cannot resolve workspace container
+# hostnames (ws-<id>:8000). Force false so the proxy keeps using the
+# host-mapped 127.0.0.1:<ephemeral_port> URL, which IS reachable.
+MOLECULE_IN_DOCKER: false
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
@@ -132,7 +138,29 @@ jobs:
# jobs or stale processes from prior cancelled runs (see #2450).
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "PORT=${PORT}" >> "$GITHUB_ENV"
-echo "BASE=http://localhost:${PORT}" >> "$GITHUB_ENV"
+echo "BASE=http://127.0.0.1:${PORT}" >> "$GITHUB_ENV"
+# Discover an IP that Docker containers can use to reach the host platform.
+# host.docker.internal is not reliably available on Linux (act_runner), so
+# workspace containers cannot resolve it and fail to register/heartbeat.
+# Workspace containers join molecule-core-net; the host is reachable via that
+# network's gateway. Ensure the network exists first (the provisioner creates
+# it lazily, but we need the gateway BEFORE starting the platform).
+docker network inspect molecule-core-net >/dev/null 2>&1 || docker network create molecule-core-net >/dev/null
+# Parse Gateway from raw JSON because --format '{{.IPAM.Config}}' is
+# inconsistent across Docker versions (sometimes omits Gateway field).
+PLATFORM_HOST_IP=$(docker network inspect molecule-core-net 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
+if [ -z "$PLATFORM_HOST_IP" ]; then
+PLATFORM_HOST_IP=$(docker network inspect bridge 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
+fi
+if [ -z "$PLATFORM_HOST_IP" ]; then
+PLATFORM_HOST_IP=$(ip route | awk '/default/ {print $3}' | head -1 || true)
+fi
+if [ -z "$PLATFORM_HOST_IP" ]; then
+echo "::error::Could not determine PLATFORM_HOST_IP for Docker containers to reach the platform"
+exit 1
+fi
+echo "PLATFORM_HOST_IP=${PLATFORM_HOST_IP}"
+echo "PLATFORM_URL=http://${PLATFORM_HOST_IP}:${PORT}" >> "$GITHUB_ENV"
# Deterministic admin token: the script sends MOLECULE_ADMIN_TOKEN as the
# bearer; the platform checks ADMIN_TOKEN. Set both to the same value.
T="lpe2e-admin-${{ github.run_id }}-${{ github.run_attempt }}"
@@ -173,8 +201,10 @@ jobs:
run: |
# Bind to the dynamically allocated port (see #2450).
# DATABASE_URL/REDIS_URL/ADMIN_TOKEN/MOLECULE_ENV are inherited from
-# $GITHUB_ENV.
-PORT=$PORT ./platform-server > platform.log 2>&1 &
+# $GITHUB_ENV. PLATFORM_URL is also passed explicitly because
+# $GITHUB_ENV propagation can be flaky on act_runner (#2468 RCA).
+echo "starting platform with PLATFORM_URL=${PLATFORM_URL:-<fallback>} PORT=$PORT BIND_ADDR=0.0.0.0"
+PORT=$PORT BIND_ADDR=0.0.0.0 PLATFORM_URL="${PLATFORM_URL:-http://host.docker.internal:$PORT}" ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
@@ -198,6 +228,11 @@ jobs:
sleep 1
done
+- name: Verify platform reachable from molecule-core-net
+run: |
+echo "Testing platform reachability from molecule-core-net container..."
+docker run --rm --network molecule-core-net alpine:latest sh -c "wget -qO- http://${PLATFORM_URL#http://}/health" || echo "WARN: platform not reachable from molecule-core-net"
- name: Run local-provision lifecycle E2E (stub — REQUIRED)
run: bash tests/e2e/test_local_provision_lifecycle_e2e.sh
@@ -205,6 +240,15 @@ jobs:
if: failure()
run: cat workspace-server/platform.log || true
+- name: Dump workspace container logs on failure
+if: failure()
+run: |
+WS_NAME=$(docker ps --filter "name=ws-" --format '{{.Names}}' | head -1 || true)
+if [ -n "$WS_NAME" ]; then
+echo "=== Workspace container logs for $WS_NAME ==="
+docker logs "$WS_NAME" 2>&1 | tail -n 80 || true
+fi
- name: Stop platform
if: always()
run: |
@@ -248,6 +292,12 @@ jobs:
# even if the runner's $GITHUB_ENV propagation is flaky (#2468 RCA).
MOLECULE_ENV: development
SECRETS_ENCRYPTION_KEY: lpe2e-test-encryption-key-32bytes!!
+# act_runner runs the job inside a Docker container, so /.dockerenv exists
+# and the platform auto-detects platformInDocker=true. But the job container
+# is NOT on molecule-core-net, so it cannot resolve workspace container
+# hostnames (ws-<id>:8000). Force false so the proxy keeps using the
+# host-mapped 127.0.0.1:<ephemeral_port> URL, which IS reachable.
+MOLECULE_IN_DOCKER: false
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
@@ -297,7 +347,29 @@ jobs:
# jobs or stale processes from prior cancelled runs (see #2450).
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "PORT=${PORT}" >> "$GITHUB_ENV"
-echo "BASE=http://localhost:${PORT}" >> "$GITHUB_ENV"
+echo "BASE=http://127.0.0.1:${PORT}" >> "$GITHUB_ENV"
+# Discover an IP that Docker containers can use to reach the host platform.
+# host.docker.internal is not reliably available on Linux (act_runner), so
+# workspace containers cannot resolve it and fail to register/heartbeat.
+# Workspace containers join molecule-core-net; the host is reachable via that
+# network's gateway. Ensure the network exists first (the provisioner creates
+# it lazily, but we need the gateway BEFORE starting the platform).
+docker network inspect molecule-core-net >/dev/null 2>&1 || docker network create molecule-core-net >/dev/null
+# Parse Gateway from raw JSON because --format '{{.IPAM.Config}}' is
+# inconsistent across Docker versions (sometimes omits Gateway field).
+PLATFORM_HOST_IP=$(docker network inspect molecule-core-net 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
+if [ -z "$PLATFORM_HOST_IP" ]; then
+PLATFORM_HOST_IP=$(docker network inspect bridge 2>/dev/null | sed -n 's/.*"Gateway": "\([^"]*\)".*/\1/p' | head -1)
+fi
+if [ -z "$PLATFORM_HOST_IP" ]; then
+PLATFORM_HOST_IP=$(ip route | awk '/default/ {print $3}' | head -1 || true)
+fi
+if [ -z "$PLATFORM_HOST_IP" ]; then
+echo "::error::Could not determine PLATFORM_HOST_IP for Docker containers to reach the platform"
+exit 1
+fi
+echo "PLATFORM_HOST_IP=${PLATFORM_HOST_IP}"
+echo "PLATFORM_URL=http://${PLATFORM_HOST_IP}:${PORT}" >> "$GITHUB_ENV"
T="lpe2e-real-admin-${{ github.run_id }}-${{ github.run_attempt }}"
echo "ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
echo "MOLECULE_ADMIN_TOKEN=${T}" >> "$GITHUB_ENV"
@@ -329,7 +401,8 @@ jobs:
- name: Start platform (background)
working-directory: workspace-server
run: |
-PORT=$PORT ./platform-server > platform.log 2>&1 &
+echo "starting platform with PLATFORM_URL=${PLATFORM_URL:-<fallback>} PORT=$PORT BIND_ADDR=0.0.0.0"
+PORT=$PORT BIND_ADDR=0.0.0.0 PLATFORM_URL="${PLATFORM_URL:-http://host.docker.internal:$PORT}" ./platform-server > platform.log 2>&1 &
echo $! > platform.pid
- name: Wait for /health (+ migrations applied)
@@ -351,6 +424,11 @@ jobs:
sleep 1
done
+- name: Verify platform reachable from molecule-core-net
+run: |
+echo "Testing platform reachability from molecule-core-net container..."
+docker run --rm --network molecule-core-net alpine:latest sh -c "wget -qO- http://${PLATFORM_URL#http://}/health" || echo "WARN: platform not reachable from molecule-core-net"
- name: Run local-provision lifecycle E2E (real image + MiniMax LLM — ADVISORY)
env:
# LIFECYCLE_LLM=minimax: provision the REAL claude-code template image
@@ -375,6 +453,15 @@ jobs:
if: failure()
run: cat workspace-server/platform.log || true
+- name: Dump workspace container logs on failure
+if: failure()
+run: |
+WS_NAME=$(docker ps --filter "name=ws-" --format '{{.Names}}' | head -1 || true)
+if [ -n "$WS_NAME" ]; then
+echo "=== Workspace container logs for $WS_NAME ==="
+docker logs "$WS_NAME" 2>&1 | tail -n 80 || true
+fi
- name: Stop platform
if: always()
run: |
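The `Verify platform reachable from molecule-core-net` step added to both jobs makes a single probe and only prints a warning on failure. It is a diagnostic, not a gate. If it were ever made blocking, wrapping the probe in a bounded retry is one option. Below is a minimal sketch of a generic bash retry helper. The name `retry` is hypothetical, and the attempt count and delay in the usage line are illustrative, not tuned values.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the workflow: run a command up to
# ATTEMPTS times, sleeping DELAY seconds between attempts. Returns 0 on
# the first success and 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # No sleep after the final failed attempt.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example: `retry 5 2 docker run --rm --network molecule-core-net alpine:latest wget -qO- "${PLATFORM_URL}/health" || { echo "::error::platform not reachable from molecule-core-net"; exit 1; }`. Each attempt starts a fresh alpine container. This keeps attempts independent, at the cost of container startup time on every try.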