6f56b1fa30
The prior pass (#2291) made AdminAuth/WorkspaceAuth fail-closed but RETAINED
two fail-open patterns 'as a cosmetic tradeoff'. The CTO directive 'nothing
should be fail-open' is ABSOLUTE, so this pass removes them too. ZERO fail-open
paths now remain anywhere in workspace-server auth.

CanvasOrBearer (workspace-server/internal/middleware/wsauth_middleware.go):
- DB-error fail-open (`if err != nil { log; c.Next() }`) → now 503 fail-CLOSED
  via abortAuthLookupError (availability tradeoff, NO access).
- lazy-bootstrap fail-open (`if !hasLive { c.Next() }`) → REMOVED. A zero-token
  install no longer passes EVERYTHING; bootstrap is via ADMIN_TOKEN
  (dev-start.sh provisions it for local dev; operator/SaaS sets it in prod;
  local mimics production).
- forgeable cross-origin Origin-match pass (canvasOriginAllowed) → REMOVED. A
  no-bearer request passing purely on a spoofable Origin is effectively open
  even for a cosmetic route. The canvas now always sends a bearer
  (NEXT_PUBLIC_ADMIN_TOKEN), so nothing legitimate relied on it. The
  non-forgeable same-origin path (isSameOriginCanvas, gated by
  CANVAS_PROXY_URL) is kept. Helper + its 2 unit tests removed.

validateDiscoveryCaller (workspace-server/internal/handlers/discovery.go):
- DB-error fail-open (`if err != nil { return nil }`) → now writes 503 and
  returns a non-nil error (caller already `if err != nil { return }`).

Bootstrap: ADMIN_TOKEN is the first-token credential (AdminAuth accepts it);
documented in docs/runbooks/admin-auth.md (fail-closed everywhere; MOLECULE_ENV
no longer gates any auth decision). quickstart.md already covered this.

Tests:
- no_fail_open_test.go: extended with CanvasOrBearer fail-closed cases (401
  zero-token, 503 DB-error).
- discovery_test.go: added TestPeers/Discover_AuthProbeDBError_FailsClosed
  (503).
- Flipped the stale assertions: CanvasOrBearer NoTokens/CanvasOrigin/DBError
  now assert fail-closed; removed canvasOriginAllowed tests.
- tests/e2e/test_dev_mode.sh: repurposed from 'dev-mode fail-open works' to
  'dev-mode is fail-CLOSED' (401 no-bearer, 200 with dev ADMIN_TOKEN).
- Seeded the HasAnyLiveToken auth probe (grandfather count=0) in ~13
  pre-existing discovery handler-body tests that previously relied on the
  fail-open swallowing the unmatched probe query.

Watch-it-fail: restoring each removed branch turns the matching gate test RED
(verified for all three: CanvasOrBearer lazy-bootstrap, CanvasOrBearer
DB-error, discovery DB-error), reverting → green.

go build ./..., go vet, and full go test ./... (46 pkgs) all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
56 lines
2.7 KiB
Markdown
# Admin Authentication Runbook

## Auth is fail-CLOSED in every environment: `ADMIN_TOKEN` is the bootstrap credential

Per the CTO "nothing should be fail-open" directive, **every** auth path on the
workspace-server fails closed: there is no dev-mode, zero-token, or DB-outage
escape hatch that grants access. This includes:

- `AdminAuth` and `WorkspaceAuth` (admin + per-workspace routes),
- `CanvasOrBearer` (the cosmetic `PUT /canvas/viewport` route), and
- `validateDiscoveryCaller` (`/registry/:id/peers`, `/registry/discover/:id`).

Consequence for **bootstrap**: a brand-new self-hosted / dev install has **no
DB-backed tokens yet**, and there is no longer a fail-open path that lets the
first request through. The **only** way to reach admin routes (and to mint the
first workspace token via `POST /admin/workspaces/:id/tokens`) is to set
`ADMIN_TOKEN` in the platform environment and present it as the bearer. This is
the "local mimics production" principle: there is no zero-config bootstrap.

- **Local dev:** `scripts/dev-start.sh` provisions a deterministic
  `ADMIN_TOKEN` into `.env` (and exports the matching `NEXT_PUBLIC_ADMIN_TOKEN`
  so the canvas authenticates with it). See `docs/quickstart.md`.
- **Self-hosted / SaaS:** set `ADMIN_TOKEN` to a strong random secret
  (`openssl rand -base64 32`) in the platform env and bake the matching
  `NEXT_PUBLIC_ADMIN_TOKEN` into the canvas bundle.

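The self-hosted bootstrap described above can be sketched as a short shell session. This is a minimal sketch under stated assumptions, not the project's own tooling: the helper names `generate_admin_token` and `mint_workspace_token` and the `PLATFORM_URL` variable are hypothetical, introduced only for illustration. Only `ADMIN_TOKEN`, `NEXT_PUBLIC_ADMIN_TOKEN`, `openssl rand -base64 32`, and `POST /admin/workspaces/:id/tokens` come from this runbook. The response body shape is not documented here, so the sketch prints it unparsed.

```bash
set -eu

# Hypothetical helper: produce a strong random secret with the command
# this runbook recommends for self-hosted / SaaS installs.
generate_admin_token() {
  openssl rand -base64 32
}

# Hypothetical helper: mint the first workspace token by presenting
# ADMIN_TOKEN as the bearer. PLATFORM_URL is an assumed variable name
# for the platform's base URL; the call aborts if it is unset.
mint_workspace_token() {
  workspace_id="$1"
  curl --fail --silent --show-error \
    --request POST \
    --header "Authorization: Bearer ${ADMIN_TOKEN}" \
    "${PLATFORM_URL:?set PLATFORM_URL first}/admin/workspaces/${workspace_id}/tokens"
}

# The platform and the canvas must agree on the same secret.
ADMIN_TOKEN="$(generate_admin_token)"
NEXT_PUBLIC_ADMIN_TOKEN="${ADMIN_TOKEN}"
export ADMIN_TOKEN NEXT_PUBLIC_ADMIN_TOKEN
```

Once the platform is running with this `ADMIN_TOKEN`, something like `PLATFORM_URL=http://localhost:8080 mint_workspace_token <workspace-id>` would request the first workspace token; the URL and port here are placeholders, not values taken from this repository. Because the plaintext token is returned only once, capture the output somewhere safe.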
## Required: set `MOLECULE_ENV` in all non-dev environments

```bash
# In your tenant / EC2 / Railway environment variables:
MOLECULE_ENV=production
```

This matches the production tenant default. NOTE: `MOLECULE_ENV` no longer gates
any auth decision; it only drives NON-security local-dev conveniences (loopback
bind, relaxed rate limit). Setting it to `dev`/`development` does **not** relax
authentication. Staging and production smoke tests should use the real user/API
workflow: create a workspace, then mint a workspace bearer with
`POST /admin/workspaces/:id/tokens`. The plaintext bearer is displayed only once.

## Admin bearer token (`ADMIN_TOKEN`)

The platform uses `ADMIN_TOKEN` as the bearer credential for admin-gated endpoints:

| Endpoint | Auth method |
|----------|-------------|
| `GET/POST/PATCH/DELETE /workspaces` | `Authorization: Bearer <ADMIN_TOKEN>` |
| `GET /admin/liveness` | `Authorization: Bearer <ADMIN_TOKEN>` |
| `POST /org/import` | `Authorization: Bearer <ADMIN_TOKEN>` |
| `POST /admin/workspaces/:id/tokens` | `Authorization: Bearer <ADMIN_TOKEN>`; plaintext token returned once |

Missing or invalid bearer → **401 in every environment** (fail-closed; no
dev-mode fail-open). If the auth datastore is unreachable, auth-gated routes
return **503** (`platform_unavailable`) rather than allowing the request
through. This is an availability tradeoff that grants no access.

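The status-code contract above lends itself to a small smoke check. The following is a hedged sketch rather than a script from this repository: `classify_auth_status` and `probe_liveness` are hypothetical helper names, and the base URL passed to the probe is whatever your deployment uses. It relies only on the documented behaviour (401 for a missing or invalid bearer, 503 when the auth datastore is unreachable) and on the `GET /admin/liveness` endpoint from the table.

```bash
# Hypothetical helper: translate the HTTP status returned by an
# auth-gated route into a verdict, following the contract above.
classify_auth_status() {
  case "$1" in
    401) echo "rejected: missing or invalid bearer" ;;
    503) echo "unavailable: auth datastore unreachable, no access granted" ;;
    2??) echo "authorized" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical probe: call GET /admin/liveness and print only the HTTP
# status code. The second argument (bearer token) is optional, so the
# no-bearer case can be checked as well.
probe_liveness() {
  base_url="$1"
  token="${2:-}"
  if [ -n "$token" ]; then
    curl --silent --output /dev/null --write-out '%{http_code}' \
      --header "Authorization: Bearer ${token}" \
      "${base_url}/admin/liveness"
  else
    curl --silent --output /dev/null --write-out '%{http_code}' \
      "${base_url}/admin/liveness"
  fi
}
```

A check such as `classify_auth_status "$(probe_liveness "$PLATFORM_URL")"` should report a rejection in every environment, while passing `"$ADMIN_TOKEN"` as the second argument should report `authorized`. A 503 verdict points at the datastore rather than at the credential, so retrying with a different token will not help.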