Brings the local harness from "single tenant covering the request path" to "two tenants covering both the request path AND the per-tenant isolation boundary": the same shape production runs (one EC2 + one Postgres + one MOLECULE_ORG_ID per tenant).

Why this matters: the four prior replays exercise the SaaS request path against one tenant. They cannot prove that TenantGuard rejects a misrouted request (production CF tunnel + AWS LB are the failure surface), nor that two tenants doing legitimate work in parallel keep their `activity_logs` / `workspaces` / connection-pool state partitioned. Both are real bug classes: TenantGuard allowlist drift shipped (#2398), and lib/pq prepared-statement cache collision is documented as an org-wide hazard.

What changed:

1. compose.yml: split into two tenants. tenant-alpha + postgres-alpha + tenant-beta + postgres-beta + the shared cp-stub, redis, cf-proxy. Each tenant gets a distinct ADMIN_TOKEN + MOLECULE_ORG_ID and its own Postgres database. cf-proxy depends on both tenants becoming healthy.

2. cf-proxy/nginx.conf: Host-header → tenant routing. `map $host $tenant_upstream` resolves the right backend per request. Requires `resolver 127.0.0.11 valid=30s ipv6=off;` because nginx needs an explicit DNS resolver to use a variable in `proxy_pass` (literal hostnames resolve once at startup; variables resolve per request, and without the resolver nginx fails closed with 502). `server_name` lists both tenants + the legacy alias so unknown Host headers don't silently route to a default and mask routing bugs.

3. _curl.sh: per-tenant + cross-tenant-negative helpers. `curl_alpha_admin` / `curl_beta_admin` set the right Host + Authorization + X-Molecule-Org-Id triple. `curl_alpha_creds_at_beta` / `curl_beta_creds_at_alpha` exist precisely to make WRONG requests (replays use them to assert TenantGuard rejects). `psql_exec_alpha` / `psql_exec_beta` shell out per-tenant Postgres exec. Legacy aliases (`curl_admin`, `psql_exec`) keep the four pre-Phase-2 replays working without edits.

4. seed.sh: registers parent+child workspaces in BOTH tenants. Captures server-generated IDs via `jq -r '.id'` (POST /workspaces ignores body.id, so the older client-side mint silently desynced from the workspaces table and broke FK-dependent replays). Stashes `ALPHA_PARENT_ID` / `ALPHA_CHILD_ID` / `BETA_PARENT_ID` / `BETA_CHILD_ID` to .seed.env, plus legacy `ALPHA_ID` / `BETA_ID` aliases for backwards compat with chat-history / channel-envelope.

5. New replays.
   * tenant-isolation.sh (13 assertions): TenantGuard 404s any request whose X-Molecule-Org-Id doesn't match the container's MOLECULE_ORG_ID. Asserts the 404 body has zero tenant/org/forbidden/denied keywords (the existence of a tenant must not be discoverable by probing from the outside). Covers cross-tenant routing misconfiguration + allowlist drift + missing-org-header.
   * per-tenant-independence.sh (12 assertions): both tenants seed activity_logs in parallel with distinct row counts (3 vs 5) and confirm each tenant's history endpoint returns exactly its own counts. Then a concurrent INSERT race (10 rows per tenant in parallel via `&` + wait) catches shared-pool corruption + prepared-statement cache poisoning + redis cross-keyspace bleed.

6. Bug fix: down.sh + dump-logs SECRETS_ENCRYPTION_KEY validation. `docker compose down -v` validates the entire compose file even though it doesn't read the env. up.sh generates a per-run key into its own shell; down.sh runs in a fresh shell that wouldn't see it, so without a placeholder `compose down` exited non-zero before removing volumes. Workspaces silently leaked into the next ./up.sh + seed.sh boot. Caught when tenant-isolation.sh F1/F2 saw 3× duplicate alpha-parent rows accumulated across three prior runs. Same fix applied to the workflow's dump-logs step.

7. requirements.txt: pin molecule-ai-workspace-runtime>=0.1.78. channel-envelope-trust-boundary.sh imports from `molecule_runtime.*` (the wheel-rewritten path) so it catches the failure mode where the wheel build silently strips a fix that unit tests on local source still pass. CI was failing this replay because the wheel wasn't installed; caught in the staging push run from #2492.

8. .github/workflows/harness-replays.yml: Phase 2 plumbing.
   * Removed /etc/hosts step (Host-header path eliminated the need; scripts already source _curl.sh).
   * Updated dump-logs to reference the new service names (tenant-alpha + tenant-beta + postgres-alpha + postgres-beta).
   * Added SECRETS_ENCRYPTION_KEY placeholder env on the dump step.

Verified: ./run-all-replays.sh from a clean state, 6/6 passed (buildinfo-stale-image, channel-envelope-trust-boundary, chat-history, peer-discovery-404, per-tenant-independence, tenant-isolation).

Roadmap section updated: Phase 2 marked shipped. Phase 3 promoted to "replace cp-stub with real molecule-controlplane Docker build + env coherence lint."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
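
The opaque-404 check described for tenant-isolation.sh (status is 404, body carries none of the tenant/org/forbidden/denied keywords) could be sketched roughly as below. This is an illustration only: the function name `assert_opaque_404` and the way the status and body are passed in are assumptions, not the replay's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the name and calling convention are illustrative,
# not taken from tenant-isolation.sh.
# Usage: assert_opaque_404 <http_status> <response_body>
assert_opaque_404() {
  local status="$1" body="$2"

  # A cross-tenant request must look like "nothing here", not "access denied".
  if [ "$status" != "404" ]; then
    echo "FAIL: expected 404, got $status" >&2
    return 1
  fi

  # The body must not hint that a tenant exists behind this route.
  # Case-insensitive substring match on the four keywords from the replay.
  if printf '%s' "$body" | grep -Eiq 'tenant|org|forbidden|denied'; then
    echo "FAIL: 404 body leaks a tenant-related keyword" >&2
    return 1
  fi

  echo "ok: opaque 404"
}
```

In a replay, the two inputs would typically come from one curl call that writes the body to a file with `-o` and prints the status with `-w '%{http_code}'`. The substring match is deliberately strict: a body containing a word such as "organization" would also fail.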
174 lines
5.8 KiB
YAML
# Production-shape harness for local E2E. Multi-tenant.
#
# Reproduces the SaaS tenant topology on localhost using the SAME
# images that ship to production:
#
#   client → cf-proxy (nginx, mimics CF tunnel headers, routes by Host)
#            ├─ Host: harness-tenant-alpha.localhost → tenant-alpha
#            │       ↓ (CP_UPSTREAM_URL=http://cp-stub:9090)
#            │     tenant-alpha (workspace-server/Dockerfile.tenant)
#            │       ↓
#            │     postgres-alpha (per-tenant DB, matches prod)
#            ├─ Host: harness-tenant-beta.localhost → tenant-beta
#            │       ↓
#            │     tenant-beta + postgres-beta
#            └─ cp-stub + redis (shared infra; CP is Railway-singleton in prod,
#                                redis is shared cluster)
#
# The two-tenant topology catches:
#   - TenantGuard cross-tenant escape (alpha-org token shouldn't see
#     beta-tenant data even with a valid bearer)
#   - cf-proxy Host-header routing correctness
#   - Per-tenant DB isolation (workspaces table, activity_logs)
#   - Concurrent multi-tenant operation (no shared mutable state)
#
# Quickstart (no /etc/hosts edits — see README):
#   cd tests/harness && ./up.sh && ./seed.sh
#   ./replays/peer-discovery-404.sh
#   ./run-all-replays.sh
#
# Env config:
#   GIT_SHA — passed to BOTH tenant builds for /buildinfo verification.
#   CP_STUB_PEERS_MODE — peers failure mode for replay scripts.

services:
  # ─── Shared infra (matches prod: CP is Railway-singleton, redis shared) ───
  redis:
    image: redis:7-alpine
    networks: [harness-net]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 2s
      timeout: 5s
      retries: 10

  cp-stub:
    build:
      context: ./cp-stub
    environment:
      PORT: "9090"
      CP_STUB_PEERS_MODE: "${CP_STUB_PEERS_MODE:-}"
    networks: [harness-net]
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O- http://localhost:9090/healthz || exit 1"]
      interval: 2s
      timeout: 5s
      retries: 10

  # ─── Tenant alpha: postgres + workspace-server ────────────────────────
  postgres-alpha:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: harness
      POSTGRES_PASSWORD: harness
      POSTGRES_DB: molecule
    networks: [harness-net]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U harness"]
      interval: 2s
      timeout: 5s
      retries: 10

  tenant-alpha:
    build:
      context: ../..
      dockerfile: workspace-server/Dockerfile.tenant
      args:
        GIT_SHA: "${GIT_SHA:-harness}"
    depends_on:
      postgres-alpha:
        condition: service_healthy
      redis:
        condition: service_healthy
      cp-stub:
        condition: service_healthy
    environment:
      DATABASE_URL: "postgres://harness:harness@postgres-alpha:5432/molecule?sslmode=disable"
      REDIS_URL: "redis://redis:6379"
      PORT: "8080"
      PLATFORM_URL: "http://tenant-alpha:8080"
      MOLECULE_ENV: "production"
      SECRETS_ENCRYPTION_KEY: "${SECRETS_ENCRYPTION_KEY:?must be set — run via tests/harness/up.sh, which generates one per run}"
      ADMIN_TOKEN: "harness-admin-token-alpha"
      MOLECULE_ORG_ID: "harness-org-alpha"
      CP_UPSTREAM_URL: "http://cp-stub:9090"
      RATE_LIMIT: "1000"
      CANVAS_PROXY_URL: "http://localhost:3000"
    networks: [harness-net]
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O- http://localhost:8080/health || exit 1"]
      interval: 5s
      timeout: 5s
      retries: 20

  # ─── Tenant beta: postgres + workspace-server (parallel to alpha) ─────
  postgres-beta:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: harness
      POSTGRES_PASSWORD: harness
      POSTGRES_DB: molecule
    networks: [harness-net]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U harness"]
      interval: 2s
      timeout: 5s
      retries: 10

  tenant-beta:
    build:
      context: ../..
      dockerfile: workspace-server/Dockerfile.tenant
      args:
        GIT_SHA: "${GIT_SHA:-harness}"
    depends_on:
      postgres-beta:
        condition: service_healthy
      redis:
        condition: service_healthy
      cp-stub:
        condition: service_healthy
    environment:
      DATABASE_URL: "postgres://harness:harness@postgres-beta:5432/molecule?sslmode=disable"
      REDIS_URL: "redis://redis:6379"
      PORT: "8080"
      PLATFORM_URL: "http://tenant-beta:8080"
      MOLECULE_ENV: "production"
      SECRETS_ENCRYPTION_KEY: "${SECRETS_ENCRYPTION_KEY:?must be set — run via tests/harness/up.sh, which generates one per run}"
      # Distinct ADMIN_TOKEN — replays use this to verify TenantGuard
      # blocks alpha-token presented at beta's URL.
      ADMIN_TOKEN: "harness-admin-token-beta"
      MOLECULE_ORG_ID: "harness-org-beta"
      CP_UPSTREAM_URL: "http://cp-stub:9090"
      RATE_LIMIT: "1000"
      CANVAS_PROXY_URL: "http://localhost:3000"
    networks: [harness-net]
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O- http://localhost:8080/health || exit 1"]
      interval: 5s
      timeout: 5s
      retries: 20

  # ─── cf-proxy: routes by Host to the right tenant container ───────────
  # Production shape: same single CF tunnel front-doors every tenant
  # subdomain — the Host header carries the tenant identity, not the
  # routing destination. Local cf-proxy mirrors this exactly.
  cf-proxy:
    image: nginx:1.27-alpine
    depends_on:
      tenant-alpha:
        condition: service_healthy
      tenant-beta:
        condition: service_healthy
    volumes:
      - ./cf-proxy/nginx.conf:/etc/nginx/nginx.conf:ro
    # Bind to 127.0.0.1 only — hardcoded ADMIN_TOKENs make 0.0.0.0
    # exposure unsafe even on a local network.
    ports:
      - "127.0.0.1:8080:8080"
    networks: [harness-net]

networks:
  harness-net:
    name: molecule-harness-net
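
With cf-proxy published on 127.0.0.1:8080 and routing by Host, each tenant can be addressed from the host without /etc/hosts edits by overriding the Host header. The sketch below shows one way to build the Host + Authorization + X-Molecule-Org-Id triple for both legitimate and deliberately misrouted requests. It is not the real _curl.sh: the function names `require_tenant`, `harness_headers` and `harness_curl` are hypothetical, and the `Bearer` scheme is an assumption. The hostnames, tokens, org IDs and port are the values declared in the compose file above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real _curl.sh. Function names and the
# "Bearer" scheme are assumptions; hosts, tokens, org IDs and the port
# come from the compose file.
HARNESS_BASE_URL="http://127.0.0.1:8080"

# Only the two tenants defined in the compose file are valid.
require_tenant() {
  case "$1" in
    alpha|beta) return 0 ;;
    *) echo "unknown tenant: $1" >&2; return 1 ;;
  esac
}

# Print the three request headers, one per line.
#   $1 = tenant whose credentials are sent
#   $2 = tenant whose Host header is targeted
harness_headers() {
  local creds="$1" target="$2"
  require_tenant "$creds" && require_tenant "$target" || return 1
  printf '%s\n' \
    "Host: harness-tenant-${target}.localhost" \
    "Authorization: Bearer harness-admin-token-${creds}" \
    "X-Molecule-Org-Id: harness-org-${creds}"
}

# Send a request through cf-proxy; extra arguments go straight to curl.
#   harness_curl alpha alpha <path>   legitimate request to alpha
#   harness_curl alpha beta  <path>   alpha credentials presented at beta
harness_curl() {
  local creds="$1" target="$2" path="$3" headers line
  shift 3
  headers="$(harness_headers "$creds" "$target")" || return 1
  local args=()
  while IFS= read -r line; do
    args+=(-H "$line")
  done <<<"$headers"
  curl -sS "${args[@]}" "$@" "${HARNESS_BASE_URL}${path}"
}
```

Keeping the credential tenant and the target tenant as separate parameters makes the misrouted case an ordinary call rather than a special code path, which is the property the cross-tenant-negative helpers rely on.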