Three changes that bring the local harness from "covers what staging covers minus the SaaS topology" to "exercises every surface we shipped this session against the prod-shape Dockerfile.tenant image."

1. Drop the /etc/hosts requirement. Replays previously needed `127.0.0.1 harness-tenant.localhost` in /etc/hosts to resolve the cf-proxy. That gated the harness behind a sudo step on every fresh dev box and CI runner. The cf-proxy nginx already routes by Host header (matching the production CF tunnel: the URL is public, the Host header carries tenant identity), so the no-sudo path is to target loopback :8080 with `Host: harness-tenant.localhost` set as a header. The new `tests/harness/_curl.sh` centralises this — the curl_anon / curl_admin / curl_workspace / psql_exec wrappers all set the Host + auth headers automatically. seed.sh, peer-discovery-404.sh, and buildinfo-stale-image.sh are updated to source it. Legacy /etc/hosts setups still work via env-var override.

2. Fix the seed.sh FK regression that blocked DB-side replays. POST /workspaces ignores any `id` in the request body and generates one server-side. seed.sh was minting client-side UUIDs that never reached the workspaces table, so any replay that INSERTed into activity_logs (FK-constrained on workspace_id) failed with a workspace-not-found error. The fix: capture the returned id from the response instead.

3. Two new replays cover the surfaces shipped this session.

   chat-history.sh — exercises the full SaaS-shape wire that PR #2472 (peer_id filter), #2474 (chat_history client tool), and #2476 (before_ts paging) ride on. 8 phases / 16 assertions: peer_id filter, limit cap, before_ts paging, OR-clause covering both source_id and target_id, malformed peer_id 400, malformed before_ts 400, URL-encoded SQLi-shape rejection. Verified PASS against the live harness.

   channel-envelope-trust-boundary.sh — exercises PR #2471 + #2481 by importing from `molecule_runtime.*` (the wheel-rewritten path) so it catches "wheel build dropped a fix that unit tests still pass." 5 phases / 11 assertions: malicious peer_id scrubbed from envelope, agent_card_url omitted on validation failure, XML-injection bytes scrubbed, valid UUID preserved, _agent_card_url_for direct gate. Verified PASS against published wheel 0.1.79.

run-all-replays.sh auto-discovers the new replays — no registration needed. The full lifecycle (boot → seed → 4 replays → teardown) runs clean. The roadmap section is updated to reflect Phase 1 (this PR) → Phase 2 (multi-tenant + CI gate) → Phase 3 (real CP) → Phase 4 (Miniflare + LocalStack + traffic replay).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Production-shape local harness
The harness brings up the SaaS tenant topology on localhost using the
same Dockerfile.tenant image that ships to production. Tests target
the cf-proxy on http://localhost:8080 and pass the tenant identity
via a `Host: harness-tenant.localhost` header — exactly how the
production CF tunnel routes by Host header. The cf-proxy nginx then
rewrites headers and proxies to the tenant container, exercising the
SAME code path a real tenant takes, including the TenantGuard
middleware, the /cp/* reverse proxy, the canvas reverse proxy, and a
Cloudflare-tunnel-shape header rewrite layer.
`tests/harness/_curl.sh` is the helper sourced by every replay — it
provides curl_anon, curl_admin, curl_workspace, and psql_exec
wrappers that set the right Host + auth headers automatically. New
replays should source it rather than rolling their own curl.
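The wrapper pattern looks roughly like the following. This is a sketch, not the real file: the `BASE` and `TENANT_HOST` defaults are confirmed elsewhere in this doc, but the exact function bodies and the admin auth-header shape are assumptions — check `_curl.sh` itself.

```sh
# Sketch of the tests/harness/_curl.sh wrapper pattern (assumed, not verbatim).
BASE="${BASE:-http://localhost:8080}"
TENANT_HOST="${TENANT_HOST:-harness-tenant.localhost}"

curl_anon() {
  # Unauthenticated request: tenant Host header only.
  local path="$1"; shift
  curl -sS -H "Host: ${TENANT_HOST}" "$@" "${BASE}${path}"
}

curl_admin() {
  # Admin request: Host header plus the token minted by seed.sh.
  local path="$1"; shift
  curl -sS -H "Host: ${TENANT_HOST}" \
       -H "Authorization: Bearer ${ADMIN_TOKEN:?run seed.sh first}" \
       "$@" "${BASE}${path}"
}
```

Because both `BASE` and `TENANT_HOST` default with `:-`, exporting either env var before a replay overrides the target without touching the script.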
## Why this exists
Local `go run ./cmd/server` skips:

- TenantGuard middleware (no MOLECULE_ORG_ID env)
- /cp/* reverse proxy mount (no CP_UPSTREAM_URL env)
- CANVAS_PROXY_URL (canvas runs separately on :3000)
- Header rewrites that production's CF tunnel + LB perform
- Strict-auth mode (no live ADMIN_TOKEN)
Bugs that survive `go run` and ship to production almost always live
in one of those layers. The harness activates ALL of them.
## Topology
```
client
   ↓
cf-proxy    nginx, mirrors CF tunnel header rewrites
   ↓        (Host: harness-tenant.localhost, X-Forwarded-*)
tenant      workspace-server/Dockerfile.tenant — same image as prod
   ↓        (CP_UPSTREAM_URL=http://cp-stub:9090, /cp/* proxied)
cp-stub     minimal Go service, mocks CP wire surface

postgres    same version as production
redis       same version as production
```
## Quickstart
```sh
cd tests/harness
./up.sh       # builds + starts all services
./seed.sh     # mints admin token, registers two sample workspaces
./replays/peer-discovery-404.sh
./replays/buildinfo-stale-image.sh
./down.sh     # tear down + remove volumes
```
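One detail worth knowing about seed.sh: POST /workspaces ignores any client-supplied `id` and generates one server-side, so the script must capture the returned id rather than minting its own (client-side UUIDs never reach the workspaces table and break FK-constrained inserts such as activity_logs). A minimal sketch of the capture pattern — the JSON field name `id` and the sed-based parse are assumptions about the response shape, and the response here is simulated (the real script gets it from `curl_admin`):

```sh
# Simulated POST /workspaces response body; the real one comes from curl_admin.
resp='{"id":"0f2c9f0e-6f9a-4b57-9e51-1db1f8a6f001","name":"sample"}'

# Capture the server-generated workspace id instead of minting one client-side.
ws_id=$(printf '%s' "$resp" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')

echo "$ws_id"   # this id satisfies the activity_logs FK on workspace_id
```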
To run every replay in one shot (boot, seed, run-all, teardown):
```sh
cd tests/harness
./run-all-replays.sh             # full lifecycle; non-zero exit if any replay fails
KEEP_UP=1 ./run-all-replays.sh   # leave harness up for debugging
REBUILD=1 ./run-all-replays.sh   # rebuild images before booting
```
No /etc/hosts edit required — replays use the cf-proxy's loopback
port and pass `Host: harness-tenant.localhost` as a header (`_curl.sh`
handles this automatically). This matches how the production CF tunnel
routes: the URL is the public CF endpoint, and the Host header carries
the per-tenant identity. Quick check:

```sh
curl -H "Host: harness-tenant.localhost" http://localhost:8080/health
```

(If you have a legacy /etc/hosts entry from older docs, it still
works — BASE and TENANT_HOST both honor env-var overrides.)
## Replay scripts
Each replay script reproduces a real bug class against the harness so fixes can be verified locally before deploy. The bar for adding a replay is "this bug shipped to production despite local E2E being green" — the script becomes the regression gate that closes that gap.
| Replay | Closes | What it proves |
|---|---|---|
| `peer-discovery-404.sh` | #2397 | tool_list_peers surfaces the actual reason instead of "may be isolated" |
| `buildinfo-stale-image.sh` | #2395 | GIT_SHA reaches the binary; verify-step comparison logic works |
| `chat-history.sh` | #2472 + #2474 + #2476 | peer_id filter (incl. OR over source/target) + before_ts paging + UUID/RFC3339 trust boundary on the activity route |
| `channel-envelope-trust-boundary.sh` | #2471 + #2481 | published wheel scrubs malformed peer_id from the channel envelope and from agent_card_url (path-traversal + XML-attr injection) |
To add a new replay:
- Drop a script under `replays/` named after the issue.
- The script's purpose: reproduce the production failure mode against the harness, then assert the fix is present. The PASS criterion is the post-fix behavior.
- The `run-all-replays.sh` runner picks up every `replays/*.sh` script automatically — no per-replay registration needed.
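A minimal skeleton for such a script might look like this. Everything below is a placeholder sketch: the `/health` route, the grep pattern, the `fail`/`run_replay` helper names, and the relative path to `_curl.sh` are all assumptions — model a real replay on `peer-discovery-404.sh` instead.

```sh
#!/usr/bin/env bash
# Skeleton replay (hypothetical): reproduce the bug, assert the fix, print PASS.
set -euo pipefail

fail() { echo "FAIL: $1" >&2; exit 1; }

run_replay() {
  # Shared wrappers (curl_anon, curl_admin, ...) come from _curl.sh;
  # the relative path is an assumption about the repo layout.
  . "$(dirname "$0")/../_curl.sh"

  # Phase 1 — reproduce the production failure mode, then assert the fix.
  local body
  body=$(curl_anon /health)                      # placeholder route
  echo "$body" | grep -qi ok || fail "phase 1: /health body missing ok"

  echo "PASS"
}

# The real script ends by invoking: run_replay "$@"
```

Exiting non-zero on any failed assertion is what lets `run-all-replays.sh` gate the whole run on its exit code.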
## Extending the cp-stub
`cp-stub/main.go` serves the minimum surface for the existing replays,
plus a catch-all that returns 501 + a clear message when the tenant
asks for a route the stub doesn't implement. To add a new CP route:
- Add a `mux.HandleFunc` in `cp-stub/main.go` for the path.
- Return the same wire shape the real CP returns. The contract is "wire compatibility with the staging CP at the time of writing" — document it with a comment pointing at the real CP handler.
- Add a replay script that exercises the path.
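When checking a new stub route by hand, a status assertion through the full cf-proxy → tenant → /cp/* path is usually enough. `expect_status` below is a hypothetical helper, not part of `_curl.sh`, and the `/cp/...` paths in the usage comments are illustrative:

```sh
# Hypothetical helper: assert an HTTP status code through the harness,
# e.g. that an unimplemented CP route still hits the stub's 501 catch-all.
expect_status() {
  local path="$1" want="$2" got
  got=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "Host: ${TENANT_HOST:-harness-tenant.localhost}" \
        "${BASE:-http://localhost:8080}${path}")
  [ "$got" = "$want" ] || { echo "FAIL ${path}: want ${want}, got ${got}" >&2; return 1; }
}

# With the harness up (illustrative paths):
#   expect_status /cp/my-new-route 200     # after implementing the stub route
#   expect_status /cp/does-not-exist 501   # catch-all for unimplemented routes
```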
## What the harness does NOT cover
- Real TLS / cert handling (CF terminates TLS in production; harness is HTTP-only).
- Cloudflare API edge cases (rate limits, DNS propagation timing).
- Real EC2 / SSM / EBS behavior (image-cache replay simulates the outcome but not the AWS API surface).
- Cross-region or multi-AZ topology.
- Real production data scale.
These are intentional Phase 1 limits. If a bug class hits one of these gaps, escalate to staging E2E rather than expanding the harness past its mandate of "exercise the tenant binary in production-shape topology."
## Roadmap
- Phase 1 (shipped): harness + cp-stub + cf-proxy + 4 replays + `run-all-replays.sh` runner. No-sudo Host-header path via `_curl.sh`. Per-replay psql seeding for tests that need DB-side fixtures.
- Phase 2 (in flight): multi-tenant — second `tenant-beta` service in compose, second Postgres database, replays for cross-tenant A2A + TenantGuard isolation. Convert `tests/e2e/test_api.sh` to target the harness instead of localhost. Make harness-based E2E a required CI check (a workflow that invokes `run-all-replays.sh` on every PR via the self-hosted Mac runner).
- Phase 3: replace `cp-stub/` with the real `molecule-controlplane` Docker build. Add a config-coherence lint that diffs the harness env list against the production CP's env list and fails CI on drift.
- Phase 4 (long-term): Miniflare in front of cf-proxy for real CF emulation (WAF, BotID, rate-limit, cf-tunnel headers). LocalStack for the EC2 provisioner. Anonymized prod-traffic recording/replay for SaaS-scale regression detection.