Brings the local harness from "single tenant covering the request path"
to "two tenants covering both the request path AND the per-tenant
isolation boundary" — the same shape production runs (one EC2 + one
Postgres + one MOLECULE_ORG_ID per tenant).
Why this matters: the four prior replays exercise the SaaS request
path against one tenant. They cannot prove that TenantGuard rejects
a misrouted request (production CF tunnel + AWS LB are the failure
surface), nor that two tenants doing legitimate work in parallel
keep their `activity_logs` / `workspaces` / connection-pool state
partitioned. Both are real bug classes — TenantGuard allowlist drift
shipped #2398, lib/pq prepared-statement cache collision is documented
as an org-wide hazard.
What changed:
1. compose.yml — split into two tenants.
tenant-alpha + postgres-alpha + tenant-beta + postgres-beta + the
shared cp-stub, redis, cf-proxy. Each tenant gets a distinct
ADMIN_TOKEN + MOLECULE_ORG_ID and its own Postgres database. cf-proxy
depends on both tenants becoming healthy.
2. cf-proxy/nginx.conf — Host-header → tenant routing.
`map $host $tenant_upstream` resolves the right backend per request.
Required `resolver 127.0.0.11 valid=30s ipv6=off;` because nginx
needs an explicit DNS resolver to use a variable in `proxy_pass`
(literal hostnames resolve once at startup; variables resolve per
request — without the resolver nginx fails closed with 502).
`server_name` lists both tenants + the legacy alias so unknown Host
headers don't silently route to a default and mask routing bugs.
3. _curl.sh — per-tenant + cross-tenant-negative helpers.
`curl_alpha_admin` / `curl_beta_admin` set the right
Host + Authorization + X-Molecule-Org-Id triple.
`curl_alpha_creds_at_beta` / `curl_beta_creds_at_alpha` exist
precisely to make WRONG requests (replays use them to assert
TenantGuard rejects). `psql_exec_alpha` / `psql_exec_beta` shell out
per-tenant Postgres exec. Legacy aliases (`curl_admin`, `psql_exec`)
keep the four pre-Phase-2 replays working without edits.
4. seed.sh — registers parent+child workspaces in BOTH tenants.
Captures server-generated IDs via `jq -r '.id'` (POST /workspaces
ignores body.id, so the older client-side mint silently desynced
from the workspaces table and broke FK-dependent replays). Stashes
`ALPHA_PARENT_ID` / `ALPHA_CHILD_ID` / `BETA_PARENT_ID` /
`BETA_CHILD_ID` to .seed.env, plus legacy `ALPHA_ID` / `BETA_ID`
aliases for backwards compat with chat-history / channel-envelope.
5. New replays.
tenant-isolation.sh (13 assertions) — TenantGuard 404s any request
whose X-Molecule-Org-Id doesn't match the container's
MOLECULE_ORG_ID. Asserts the 404 body has zero
tenant/org/forbidden/denied keywords (existence of a tenant must
not be probable from the outside). Covers cross-tenant routing
misconfigure + allowlist drift + missing-org-header.
per-tenant-independence.sh (12 assertions) — both tenants seed
activity_logs in parallel with distinct row counts (3 vs 5) and
confirm each tenant's history endpoint returns exactly its own
counts. Then a concurrent INSERT race (10 rows per tenant in
parallel via `&` + wait) catches shared-pool corruption +
prepared-statement cache poisoning + redis cross-keyspace bleed.
6. Bug fix: down.sh + dump-logs SECRETS_ENCRYPTION_KEY validation.
`docker compose down -v` validates the entire compose file even
though it doesn't read the env. up.sh generates a per-run key into
its own shell — down.sh runs in a fresh shell that wouldn't see it,
so without a placeholder `compose down` exited non-zero before
removing volumes. Workspaces silently leaked into the next
./up.sh + seed.sh boot. Caught when tenant-isolation.sh F1/F2 saw
3× duplicate alpha-parent rows accumulated across three prior runs.
Same fix applied to the workflow's dump-logs step.
7. requirements.txt — pin molecule-ai-workspace-runtime>=0.1.78.
channel-envelope-trust-boundary.sh imports from `molecule_runtime.*`
(the wheel-rewritten path) so it catches the failure mode where
the wheel build silently strips a fix that unit tests on local
source still pass. CI was failing this replay because the wheel
wasn't installed — caught in the staging push run from #2492.
8. .github/workflows/harness-replays.yml — Phase 2 plumbing.
* Removed /etc/hosts step (Host-header path eliminated the need;
scripts already source _curl.sh).
* Updated dump-logs to reference the new service names
(tenant-alpha + tenant-beta + postgres-alpha + postgres-beta).
* Added SECRETS_ENCRYPTION_KEY placeholder env on the dump step.
Verified: ./run-all-replays.sh from a clean state — 6/6 passed
(buildinfo-stale-image, channel-envelope-trust-boundary, chat-history,
peer-discovery-404, per-tenant-independence, tenant-isolation).
Roadmap section updated: Phase 2 marked shipped. Phase 3 promoted to
"replace cp-stub with real molecule-controlplane Docker build + env
coherence lint."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>