The prior pass (#2291) made AdminAuth/WorkspaceAuth fail-closed but RETAINED two fail-open patterns 'as a cosmetic tradeoff'. The CTO directive 'nothing should be fail-open' is ABSOLUTE, so this pass removes them too. ZERO fail-open paths now remain anywhere in workspace-server auth.

CanvasOrBearer (workspace-server/internal/middleware/wsauth_middleware.go):
- DB-error fail-open (`if err != nil { log; c.Next() }`) → now 503 fail-CLOSED via abortAuthLookupError (availability tradeoff, NO access).
- lazy-bootstrap fail-open (`if !hasLive { c.Next() }`) → REMOVED. A zero-token install no longer passes EVERYTHING; bootstrap is via ADMIN_TOKEN (dev-start.sh provisions it for local dev; operator/SaaS sets it in prod; local mimics production).
- forgeable cross-origin Origin-match pass (canvasOriginAllowed) → REMOVED. A no-bearer request passing purely on a spoofable Origin is effectively open even for a cosmetic route. The canvas now always sends a bearer (NEXT_PUBLIC_ADMIN_TOKEN), so nothing legitimate relied on it. The non-forgeable same-origin path (isSameOriginCanvas, gated by CANVAS_PROXY_URL) is kept. Helper + its 2 unit tests removed.

validateDiscoveryCaller (workspace-server/internal/handlers/discovery.go):
- DB-error fail-open (`if err != nil { return nil }`) → now writes 503 and returns a non-nil error (caller already `if err != nil { return }`).

Bootstrap: ADMIN_TOKEN is the first-token credential (AdminAuth accepts it); documented in docs/runbooks/admin-auth.md (fail-closed everywhere; MOLECULE_ENV no longer gates any auth decision). quickstart.md already covered this.

Tests:
- no_fail_open_test.go: extended with CanvasOrBearer fail-closed cases (401 zero-token, 503 DB-error).
- discovery_test.go: added TestPeers/Discover_AuthProbeDBError_FailsClosed (503).
- Flipped the stale assertions: CanvasOrBearer NoTokens/CanvasOrigin/DBError now assert fail-closed; removed canvasOriginAllowed tests.
- tests/e2e/test_dev_mode.sh: repurposed from 'dev-mode fail-open works' to 'dev-mode is fail-CLOSED' (401 no-bearer, 200 with dev ADMIN_TOKEN).
- Seeded the HasAnyLiveToken auth probe (grandfather count=0) in ~13 pre-existing discovery handler-body tests that previously relied on the fail-open swallowing the unmatched probe query.

Watch-it-fail: restoring each removed branch turns the matching gate test RED (verified for all three: CanvasOrBearer lazy-bootstrap, CanvasOrBearer DB-error, discovery DB-error), reverting → green.

go build ./..., go vet, and full go test ./... (46 pkgs) all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Admin Authentication Runbook
Auth is fail-CLOSED in every environment — ADMIN_TOKEN is the bootstrap credential
Per the CTO "nothing should be fail-open" directive, every auth path on the workspace-server fails closed — there is no dev-mode / zero-token / DB-outage hatch that grants access. This includes:
- AdminAuth and WorkspaceAuth (admin + per-workspace routes),
- CanvasOrBearer (the cosmetic PUT /canvas/viewport route), and
- validateDiscoveryCaller (/registry/:id/peers, /registry/discover/:id).
Consequence for bootstrap: a brand-new self-hosted / dev install has no
DB-backed tokens yet, and there is no longer a fail-open that lets the first
request through. The only way to reach admin routes (and to mint the first
workspace token via POST /admin/workspaces/:id/tokens) is to set ADMIN_TOKEN
in the platform environment and present it as the bearer. This is the "local
mimics production" principle: there is no zero-config bootstrap.
- Local dev: scripts/dev-start.sh provisions a deterministic ADMIN_TOKEN into .env (and exports the matching NEXT_PUBLIC_ADMIN_TOKEN so the canvas authenticates with it). See docs/quickstart.md.
- Self-hosted / SaaS: set ADMIN_TOKEN to a strong random secret (openssl rand -base64 32) in the platform env and bake the matching NEXT_PUBLIC_ADMIN_TOKEN into the canvas bundle.
Required: set MOLECULE_ENV in all non-dev environments
    # In your tenant / EC2 / Railway environment variables:
    MOLECULE_ENV=production
This matches the production tenant default. NOTE: MOLECULE_ENV no longer gates
any auth decision — it only drives NON-security local-dev conveniences (loopback
bind, relaxed rate limit). Setting it to dev/development does not relax
authentication. Staging and production smoke tests should use the real user/API
workflow: create a workspace, then mint a one-time displayed workspace bearer
with POST /admin/workspaces/:id/tokens.
Admin bearer token (ADMIN_TOKEN)
The platform uses ADMIN_TOKEN as the bearer credential for admin-gated endpoints:
| Endpoint | Auth method |
|---|---|
| GET/POST/PATCH/DELETE /workspaces | Authorization: Bearer <ADMIN_TOKEN> |
| GET /admin/liveness | Authorization: Bearer <ADMIN_TOKEN> |
| POST /org/import | Authorization: Bearer <ADMIN_TOKEN> |
| POST /admin/workspaces/:id/tokens | Authorization: Bearer <ADMIN_TOKEN>; plaintext token returned once |
Missing or invalid bearer → 401 in every environment (fail-closed; no
dev-mode fail-open). If the auth datastore is unreachable, auth-gated routes
return 503 (platform_unavailable) — an availability tradeoff that grants no
access — rather than allowing the request through.