Replaces the hardcoded base64 sentinel (630dd0da) with a per-run
generation in up.sh, exported into compose's interpolation environment.
Why:
- Hardcoding a 32-byte base64 string in the repo, even one labelled
"test-only", sets a bad muscle-memory pattern. The next agent or
contributor copies the shape into another harness — or worse, into a
staging .env — and the test-only sentinel turns into something
someone treats as a real key.
- Secret scanners flag key-shaped values regardless of the surrounding
comment claiming intent. Avoiding the literal entirely sidesteps the
false-positive.
- A fresh key per harness lifetime more closely mimics prod's
per-tenant isolation, exercising the same code paths without any
pretense of stable encrypted-data fixtures (which the harness wipes
on every ./down.sh anyway).
Implementation:
- up.sh: `openssl rand -base64 32` if SECRETS_ENCRYPTION_KEY isn't
already set in the caller's env. Honoring a pre-set value lets a
debug session pin a key for reproducibility (e.g. when investigating
encrypted-row corruption).
- compose.yml: `${SECRETS_ENCRYPTION_KEY:?…}` makes a misuse loud —
running `docker compose up` directly bypassing up.sh fails fast with
a clear error pointing at the right entry point, rather than a 100s
unhealthy-tenant timeout.
Both paths verified via `docker compose config`:
- with key exported: value interpolates cleanly
- without it: "required variable SECRETS_ENCRYPTION_KEY is missing a
value: must be set — run via tests/harness/up.sh, which generates
one per run"
The harness brings up the SaaS tenant topology on localhost using the
SAME workspace-server/Dockerfile.tenant image that ships to production.
Tests run against http://harness-tenant.localhost:8080 and exercise the
same code path a real tenant takes:
client
→ cf-proxy (nginx; CF tunnel + LB header rewrites)
→ tenant (Dockerfile.tenant — combined platform + canvas)
→ cp-stub (minimal Go CP stand-in for /cp/* paths)
→ postgres + redis
Why this exists: bugs that survive `go run ./cmd/server` and ship to
prod almost always live in env-gated middleware (TenantGuard, /cp/*
proxy, canvas proxy), header rewrites, or the strict-auth / live-token
mode. The harness activates ALL of them locally so #2395 + #2397-class
bugs can be reproduced before deploy.
Phase 1 surface:
- cp-stub/main.go: minimal CP stand-in. /cp/auth/me, redeploy-fleet,
/__stub/{peers,mode,state} for replay scripts. Catch-all returns
501 with a clear message when a new CP route appears.
- cf-proxy/nginx.conf: rewrites Host to <slug>.localhost, injects
X-Forwarded-*, disables buffering to mirror CF tunnel streaming
semantics.
- compose.yml: one service per topology layer; tenant builds from
the actual production Dockerfile.tenant.
- up.sh / down.sh / seed.sh: lifecycle scripts.
- replays/peer-discovery-404.sh: reproduces #2397 + asserts the
diagnostic helper from PR #2399 surfaces "404" + "registered".
- replays/buildinfo-stale-image.sh: reproduces #2395 + asserts
/buildinfo wire shape + GIT_SHA injection from PR #2398.
- README.md: topology, quickstart, what the harness does NOT cover.
Phases 2-3 (separate PRs):
- Phase 2: convert tests/e2e/test_api.sh to target the harness URL
instead of localhost; make harness-based replays a required CI gate.
- Phase 3: config-coherence lint that diffs harness env list against
production CP's env list, fails CI on drift.
Verification:
- cp-stub builds (go build ./...).
- cp-stub responds to all stubbed endpoints (smoke-tested locally).
- compose.yml passes `docker compose config --quiet`.
- All shell scripts pass `bash -n` syntax check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>