molecule-core/tests/harness/down.sh
Hongming Wang c275716005 harness(phase-2): multi-tenant compose + cross-tenant isolation replays
Brings the local harness from "single tenant covering the request path"
to "two tenants covering both the request path AND the per-tenant
isolation boundary" — the same shape production runs (one EC2 + one
Postgres + one MOLECULE_ORG_ID per tenant).

Why this matters: the four prior replays exercise the SaaS request
path against one tenant. They cannot prove that TenantGuard rejects
a misrouted request (production CF tunnel + AWS LB are the failure
surface), nor that two tenants doing legitimate work in parallel
keep their `activity_logs` / `workspaces` / connection-pool state
partitioned. Both are real bug classes — TenantGuard allowlist drift
shipped #2398, lib/pq prepared-statement cache collision is documented
as an org-wide hazard.

What changed:

1. compose.yml — split into two tenants.
   tenant-alpha + postgres-alpha + tenant-beta + postgres-beta + the
   shared cp-stub, redis, cf-proxy. Each tenant gets a distinct
   ADMIN_TOKEN + MOLECULE_ORG_ID and its own Postgres database. cf-proxy
   depends on both tenants becoming healthy.

2. cf-proxy/nginx.conf — Host-header → tenant routing.
   `map $host $tenant_upstream` resolves the right backend per request.
   Required `resolver 127.0.0.11 valid=30s ipv6=off;` because nginx
   needs an explicit DNS resolver to use a variable in `proxy_pass`
   (literal hostnames resolve once at startup; variables resolve per
   request — without the resolver nginx fails closed with 502).
   `server_name` lists both tenants + the legacy alias so unknown Host
   headers don't silently route to a default and mask routing bugs.

3. _curl.sh — per-tenant + cross-tenant-negative helpers.
   `curl_alpha_admin` / `curl_beta_admin` set the right
   Host + Authorization + X-Molecule-Org-Id triple.
   `curl_alpha_creds_at_beta` / `curl_beta_creds_at_alpha` exist
   precisely to make WRONG requests (replays use them to assert
   TenantGuard rejects). `psql_exec_alpha` / `psql_exec_beta` shell out
   per-tenant Postgres exec. Legacy aliases (`curl_admin`, `psql_exec`)
   keep the four pre-Phase-2 replays working without edits.

4. seed.sh — registers parent+child workspaces in BOTH tenants.
   Captures server-generated IDs via `jq -r '.id'` (POST /workspaces
   ignores body.id, so the older client-side mint silently desynced
   from the workspaces table and broke FK-dependent replays). Stashes
   `ALPHA_PARENT_ID` / `ALPHA_CHILD_ID` / `BETA_PARENT_ID` /
   `BETA_CHILD_ID` to .seed.env, plus legacy `ALPHA_ID` / `BETA_ID`
   aliases for backwards compat with chat-history / channel-envelope.

5. New replays.

   tenant-isolation.sh (13 assertions) — TenantGuard 404s any request
   whose X-Molecule-Org-Id doesn't match the container's
   MOLECULE_ORG_ID. Asserts the 404 body has zero
   tenant/org/forbidden/denied keywords (existence of a tenant must
   not be probable from the outside). Covers cross-tenant routing
   misconfigure + allowlist drift + missing-org-header.

   per-tenant-independence.sh (12 assertions) — both tenants seed
   activity_logs in parallel with distinct row counts (3 vs 5) and
   confirm each tenant's history endpoint returns exactly its own
   counts. Then a concurrent INSERT race (10 rows per tenant in
   parallel via `&` + wait) catches shared-pool corruption +
   prepared-statement cache poisoning + redis cross-keyspace bleed.

6. Bug fix: down.sh + dump-logs SECRETS_ENCRYPTION_KEY validation.
   `docker compose down -v` validates the entire compose file even
   though it doesn't read the env. up.sh generates a per-run key into
   its own shell — down.sh runs in a fresh shell that wouldn't see it,
   so without a placeholder `compose down` exited non-zero before
   removing volumes. Workspaces silently leaked into the next
   ./up.sh + seed.sh boot. Caught when tenant-isolation.sh F1/F2 saw
   3× duplicate alpha-parent rows accumulated across three prior runs.
   Same fix applied to the workflow's dump-logs step.

7. requirements.txt — pin molecule-ai-workspace-runtime>=0.1.78.
   channel-envelope-trust-boundary.sh imports from `molecule_runtime.*`
   (the wheel-rewritten path) so it catches the failure mode where
   the wheel build silently strips a fix that unit tests on local
   source still pass. CI was failing this replay because the wheel
   wasn't installed — caught in the staging push run from #2492.

8. .github/workflows/harness-replays.yml — Phase 2 plumbing.
   * Removed /etc/hosts step (Host-header path eliminated the need;
     scripts already source _curl.sh).
   * Updated dump-logs to reference the new service names
     (tenant-alpha + tenant-beta + postgres-alpha + postgres-beta).
   * Added SECRETS_ENCRYPTION_KEY placeholder env on the dump step.

Verified: ./run-all-replays.sh from a clean state — 6/6 passed
(buildinfo-stale-image, channel-envelope-trust-boundary, chat-history,
peer-discovery-404, per-tenant-independence, tenant-isolation).

Roadmap section updated: Phase 2 marked shipped. Phase 3 promoted to
"replace cp-stub with real molecule-controlplane Docker build + env
coherence lint."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:36:40 -07:00

18 lines
911 B
Bash
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#!/usr/bin/env bash
# Tear down the harness and wipe per-tenant volumes.
#
# SECRETS_ENCRYPTION_KEY placeholder: docker compose validates the entire
# compose file even for `down -v` (a destructive read-only operation that
# doesn't read the env). up.sh generates a per-run key into its own
# shell — this script runs in a fresh shell that wouldn't see it. Without
# the placeholder, `compose down` exits non-zero before removing volumes,
# silently leaking workspaces+activity_logs into the next ./up.sh + seed.sh
# (verified 2026-05-02: tenant-isolation.sh F1/F2 saw 3× duplicate
# alpha-parent + alpha-child rows accumulated across three prior boots).
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$HERE"
SECRETS_ENCRYPTION_KEY="${SECRETS_ENCRYPTION_KEY:-down-placeholder}" \
docker compose -f compose.yml down -v --remove-orphans
echo "[harness] down + volumes removed."