molecule-core/tests/harness
Hongming Wang 046eccbb7c fix(harness): five-axis self-review fixes before merge
Three findings from re-reviewing PR #2401 with fresh eyes:

1. Critical — port binding to 0.0.0.0
   compose.yml's cf-proxy bound 8080:8080 (default 0.0.0.0). The harness
   uses a hardcoded ADMIN_TOKEN so anyone on the local network or VPN
   could hit /workspaces with admin privileges. Switch to 127.0.0.1:8080
   so admin access is loopback-only — safe for E2E and prevents the
   known-token leak.
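
   In compose.yml terms this is a one-line change to the ports mapping
   (a sketch — the service block here is abbreviated):

   ```yaml
   services:
     cf-proxy:
       # Before: "8080:8080" binds 0.0.0.0, exposing the hardcoded
       # ADMIN_TOKEN surface to anyone on the local network or VPN.
       # After: loopback-only binding; E2E against localhost still works.
       ports:
         - "127.0.0.1:8080:8080"
   ```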

2. Required — dead code in cp-stub
   peersFailureMode + __stub/mode + __stub/peers were declared with
   atomic.Value setters but no handler ever READ from them. CP doesn't
   host /registry/peers (the tenant does), so the toggles couldn't
   drive responses. Removed the dead vars + handlers; kept
   redeployFleetCalls counter and __stub/state since those have a real
   consumer in the buildinfo replay.

3. Required — replay's auth-context dependency
   peer-discovery-404.sh's Python eval ran
   a2a_client.get_peers_with_diagnostic() against the live tenant.
   Without a workspace token
   file, auth_headers() yields empty headers — so the helper might
   exercise a 401 branch instead of the 404 branch the replay claims
   to test.

   Split the assertion into (a) WIRE — direct curl proves the platform
   returns 404 from /registry/<unregistered>/peers — and (b) PARSE —
   feed the helper a mocked 404 via httpx patches, no network/auth.
   Each branch tests exactly what it claims.
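
   The WIRE branch reduces to a status-code check. A minimal sketch —
   the URL and workspace name are placeholders, and the curl line is
   shown as a comment rather than executed:

   ```shell
   # classify_wire maps an observed HTTP status to the replay verdict:
   # the platform must answer 404 for an unregistered workspace's peers.
   classify_wire() {
     if [ "$1" = "404" ]; then
       echo "WIRE: PASS"
     else
       echo "WIRE: FAIL (expected 404, got $1)"
       return 1
     fi
   }

   # In the replay, the status would come from a direct curl, e.g.:
   #   code=$(curl -s -o /dev/null -w '%{http_code}' \
   #     "http://harness-tenant.localhost:8080/registry/no-such-workspace/peers")
   classify_wire 404
   ```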

   Also added a graceful skip when the workspace runtime in the
   current checkout pre-dates #2399 (no get_peers_with_diagnostic
   yet) — replay falls back to wire-only verification with a clear
   message instead of an opaque AttributeError. After #2399 lands on
   staging, both branches will run.
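
   The graceful skip can be a plain capability probe before choosing
   branches — a sketch; a2a_client and get_peers_with_diagnostic are
   the names from the replay, but the probe helper itself is hypothetical:

   ```shell
   # has_attr: exit 0 iff the given Python module exposes the given attribute.
   has_attr() {
     python3 - "$1" "$2" <<'PY'
   import importlib, sys
   try:
       mod = importlib.import_module(sys.argv[1])
   except ImportError:
       sys.exit(1)
   sys.exit(0 if hasattr(mod, sys.argv[2]) else 1)
   PY
   }

   if has_attr a2a_client get_peers_with_diagnostic; then
     echo "running WIRE + PARSE branches"
   else
     echo "SKIP PARSE: runtime pre-dates #2399; falling back to wire-only verification"
   fi
   ```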

cp-stub still builds clean. compose.yml validates. Replay's bash
syntax + Python eval both verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:32:40 -07:00
cf-proxy feat(tests): add production-shape local harness (Phase 1) 2026-04-30 11:22:46 -07:00
cp-stub fix(harness): five-axis self-review fixes before merge 2026-04-30 11:32:40 -07:00
replays fix(harness): five-axis self-review fixes before merge 2026-04-30 11:32:40 -07:00
compose.yml fix(harness): five-axis self-review fixes before merge 2026-04-30 11:32:40 -07:00
down.sh feat(tests): add production-shape local harness (Phase 1) 2026-04-30 11:22:46 -07:00
README.md feat(tests): add production-shape local harness (Phase 1) 2026-04-30 11:22:46 -07:00
seed.sh feat(tests): add production-shape local harness (Phase 1) 2026-04-30 11:22:46 -07:00
up.sh feat(tests): add production-shape local harness (Phase 1) 2026-04-30 11:22:46 -07:00

Production-shape local harness

The harness brings up the SaaS tenant topology on localhost using the same Dockerfile.tenant image that ships to production. Tests run against http://harness-tenant.localhost:8080 and exercise the SAME code path a real tenant takes — including TenantGuard middleware, the /cp/* reverse proxy, the canvas reverse proxy, and a Cloudflare-tunnel-shape header rewrite layer.

Why this exists

Local go run ./cmd/server skips:

  • TenantGuard middleware (no MOLECULE_ORG_ID env)
  • /cp/* reverse proxy mount (no CP_UPSTREAM_URL env)
  • CANVAS_PROXY_URL (canvas runs separately on :3000)
  • Header rewrites that production's CF tunnel + LB perform
  • Strict-auth mode (no live ADMIN_TOKEN)

Bugs that survive go run and ship to production almost always live in one of those layers. The harness activates ALL of them.
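
The layers above correspond to env toggles on the tenant service. A hypothetical compose fragment — the variable names come from the list above and CP_UPSTREAM_URL's value from the topology diagram, but the other values are placeholders:

```yaml
services:
  tenant:
    environment:
      MOLECULE_ORG_ID: harness-org          # activates TenantGuard middleware
      CP_UPSTREAM_URL: http://cp-stub:9090  # mounts the /cp/* reverse proxy
      CANVAS_PROXY_URL: http://canvas:3000  # enables the canvas reverse proxy
      ADMIN_TOKEN: local-dev-token          # strict-auth mode (known token, loopback-only)
```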

Topology

client
  ↓
cf-proxy        nginx, mirrors CF tunnel header rewrites
  ↓ (Host:harness-tenant.localhost, X-Forwarded-*)
tenant          workspace-server/Dockerfile.tenant — same image as prod
  ↓ (CP_UPSTREAM_URL=http://cp-stub:9090, /cp/* proxied)
cp-stub         minimal Go service, mocks CP wire surface
postgres        same version as production
redis           same version as production

Quickstart

cd tests/harness
./up.sh                 # builds + starts all services
./seed.sh               # mints admin token, registers two sample workspaces
./replays/peer-discovery-404.sh
./replays/buildinfo-stale-image.sh
./down.sh               # tear down + remove volumes

First-time setup needs an /etc/hosts entry so harness-tenant.localhost resolves to the local cf-proxy:

echo "127.0.0.1 harness-tenant.localhost" | sudo tee -a /etc/hosts

(macOS resolves *.localhost automatically in some setups; Linux typically does not.)

Replay scripts

Each replay script reproduces a real bug class against the harness so fixes can be verified locally before deploy. The bar for adding a replay is "this bug shipped to production despite local E2E being green" — the script becomes the regression gate that closes that gap.

Replay                     Closes   What it proves
peer-discovery-404.sh      #2397    tool_list_peers surfaces the actual reason instead of "may be isolated"
buildinfo-stale-image.sh   #2395    GIT_SHA reaches the binary; verify-step comparison logic works

To add a new replay:

  1. Drop a script under replays/ named after the issue.
  2. The script should reproduce the production failure mode against the harness, then assert the fix is present; the PASS criterion is the post-fix behavior.
  3. Wire it into the tests/harness/run-all-replays.sh runner (TODO, Phase 2).
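
A minimal skeleton following that shape — the issue number, route, and assertion are placeholders, not a real replay:

```shell
#!/usr/bin/env bash
# replays/issue-NNNN.sh — hypothetical skeleton for a new replay.
set -euo pipefail

BASE="${BASE:-http://harness-tenant.localhost:8080}"

pass() { echo "PASS: $1"; }
fail() { echo "FAIL: $1" >&2; exit 1; }

# 1. Reproduce: drive the harness into the production failure mode,
#    e.g. curl "$BASE/some/route" with the triggering input.
# 2. Assert: the post-fix behavior is the PASS criterion, e.g.
#    observed="..."
#    [ "$observed" = "expected-post-fix-value" ] || fail "pre-fix behavior still present"
pass "issue-NNNN regression gate"
```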

Extending the cp-stub

cp-stub/main.go serves the minimum surface for the existing replays plus a catch-all that returns 501 + a clear message when the tenant asks for a route the stub doesn't implement. To add a new CP route:

  1. Add a mux.HandleFunc in cp-stub/main.go for the path.
  2. Return the same wire shape the real CP returns. The contract is "wire compatibility with the staging CP at the time of writing" — document it with a comment pointing at the real CP handler.
  3. Add a replay script that exercises the path.

What the harness does NOT cover

  • Real TLS / cert handling (CF terminates TLS in production; harness is HTTP-only).
  • Cloudflare API edge cases (rate limits, DNS propagation timing).
  • Real EC2 / SSM / EBS behavior (image-cache replay simulates the outcome but not the AWS API surface).
  • Cross-region or multi-AZ topology.
  • Real production data scale.

These are intentional Phase 1 limits. If a bug class hits one of these gaps, escalate to staging E2E rather than expanding the harness past its mandate of "exercise the tenant binary in production-shape topology."

Roadmap

  • Phase 1 (this PR): harness + cp-stub + cf-proxy + 2 replays.
  • Phase 2: convert tests/e2e/test_api.sh to run against the harness instead of localhost. Make harness-based E2E a required CI check.
  • Phase 3: config-coherence lint that diffs harness env list against production CP's env list, fails CI on drift.
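
The Phase 3 lint can be as small as a set difference over two sorted env-name lists — a sketch; the file names and how they'd be extracted from compose.yml and the production CP config are placeholders:

```shell
# env_drift prints names present in exactly one of the two lists
# (comm -3 suppresses the lines common to both files);
# empty output means no drift.
env_drift() {
  comm -3 <(sort -u "$1") <(sort -u "$2")
}

# Hypothetical CI gate:
#   drift=$(env_drift harness.env prod.env)
#   [ -z "$drift" ] || { echo "env drift detected:"; echo "$drift"; exit 1; }
```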