harden(security): remove dev-mode fail-open auth — fail-closed everywhere + dev-token + regression gate #2291
Reference in New Issue
Block a user
Delete Branch "harden/no-fail-open-auth"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
CTO directive: "nothing should be fail-open." This removes the dev-mode fail-open auth hatch so
AdminAuth/WorkspaceAuth(and the discovery caller) always require a real credential — fail-CLOSED in every environment, dev included — fixes local dev to stay authenticated (not open), and adds a regression gate so fail-open can't return.Fail-open call-sites removed (
workspace-server)internal/middleware/wsauth_middleware.go·WorkspaceAuth/workspaces/:id/*request passed whenMOLECULE_ENV=dev+ADMIN_TOKENunset (isDevModeFailOpen())internal/middleware/wsauth_middleware.go·AdminAuth(Tier-1)ADMIN_TOKEN) — the C4/org/importpre-empt holeinternal/middleware/wsauth_middleware.go·AdminAuth(Tier-1b)MOLECULE_ENV=dev+ADMIN_TOKENunset (isDevModeFailOpen())internal/handlers/discovery.go·validateDiscoveryCallerMOLECULE_ENV=dev+ADMIN_TOKENunset (IsDevModeFailOpen())The
isDevModeFailOpen()/IsDevModeFailOpen()helper is deleted. Its two legitimately non-auth uses (rate-limit relaxation inratelimit.go, loopback bind default incmd/server) now key on a new NON-securityisLocalDevEnv()predicate (MOLECULE_ENVonly, decoupled fromADMIN_TOKEN).CanvasOrBearer's cosmetic-only behaviour (PUT /canvas/viewport) is unchanged and verified to be the only route it guards.Result: no code path lets an admin/workspace-protected endpoint through without a real credential.
HasAnyLiveTokenGlobalis still probed inAdminAuthso a datastore outage returns a structured 503 (not a silent pass).Dev path — authenticated, not open
scripts/dev-start.shnow provisions a deterministicADMIN_TOKENinto.envand exports the matchingNEXT_PUBLIC_ADMIN_TOKEN, so the dev Canvas sends a real bearer (canvas/src/lib/api.tsalready attaches it;next.config.tsmatched-pair guard). Docs updated:.env.example,docs/quickstart.md,docs/architecture/overview.md.Regression gate
internal/middleware/no_fail_open_test.go:TestAdminAuth_NoFailOpen_UnderOldHatchConditions+TestWorkspaceAuth_NoFailOpen_UnderOldHatchConditions— assert 401 under the EXACT old-hatch conditions (ADMIN_TOKENunset +MOLECULE_ENV=dev/development× hasLive 0/1).TestNoFailOpenAuthHelperReexists— source-guard forbidding theisDevModeFailOpen(-style helper from re-appearing.RED-against-old proof: temporarily restored the hatch branches → both gate tests + all converted fail-closed tests flipped to FAIL; reverted → all GREEN. Converted the stale fail-open assertions in
wsauth_middleware_test.go,discovery_test.go,security_regression_685_686_687_688_test.go,devmode_test.go,bind_test.go.Other fail-open patterns audited
CanvasOrBearer(no-token lazy-bootstrap + fail-open-on-DB-error) — retained: cosmetic route only (PUT /canvas/viewport), zero data/security impact. Flagged for follow-up.validateDiscoveryCallerfail-open-on-DB-error — retained: documented availability tradeoff on a low-sensitivity endpoint already gated byCanCommunicate. Flagged for follow-up.Verify
go build ./...✓go vet ./internal/middleware/ ./cmd/server/ ./internal/handlers/✓go test ./...→ 46 ok / 0 FAILScope
Untouched per the #2286 boundary:
.gitea/workflows/e2e-api.yml,tests/e2e/*.sh. NOTE: the advisory-onlytests/e2e/test_dev_mode.shstill asserts the old fail-open behaviour — it lives under the #2286-ownedtests/e2e/*.shpath and needs a follow-up there.Do NOT merge — for review.
🤖 Generated with Claude Code
1de9686b78to6f56b1fa30Reviewed: delivers the absolute nothing-fail-open directive. AdminAuth/WorkspaceAuth fail-closed (dev-hatch removed, prior pass); CanvasOrBearer DB-error→503 + lazy-bootstrap removed + forgeable-Origin removed (kept non-forgeable same-origin Referer/Host); validateDiscoveryCaller DB-error→503; bootstrap via ADMIN_TOKEN (dev-start.sh parity, docs fixed); test_dev_mode→fail-closed; regression gate extended with watch-it-fail proof on all 3 branches; go test ./... 46 ok. ONE residual flagged to CTO: discovery validateDiscoveryCaller keeps if-!hasLive→allow (count=0 legacy/pre-upgrade grandfather) — arguably still fail-open but removal risks pre-upgrade workspaces; CTO decision pending, separate follow-up. Approve.
5-axis review: APPROVED.
Correctness: This removes the remaining fail-open auth paths called out in the PR: AdminAuth lazy bootstrap/dev hatch, WorkspaceAuth dev hatch, CanvasOrBearer lazy-bootstrap/DB-error/forgeable-Origin paths, and discovery auth-probe DB-error allowance. The replacement behavior is fail-closed with explicit 401/503 outcomes, while local dev is kept functional by provisioning ADMIN_TOKEN and mirroring NEXT_PUBLIC_ADMIN_TOKEN.
Robustness: The regression tests cover the exact old hatch conditions, DB-error fail-closed behavior, source-guard against reintroducing the old helper, discovery fail-closed paths, and updated local-dev expectations. Existing discovery tests are adjusted by explicitly seeding legitimate grandfather paths instead of relying on auth-probe errors being swallowed. Security: this is a net hardening change; it removes bearer-less access through dev-mode, fresh-install bootstrap, spoofable Origin, and datastore-error paths. Performance: no hot-path algorithmic expansion beyond auth token-state probes already used for gate decisions. Readability: comments and runbook updates clearly separate non-security local-dev detection from auth decisions.
Required-context review: head
6f56b1fa30is mergeable; CI/all-required, E2E API Smoke, Handlers PG, and Platform Go are green. Combined red is ceremony-context noise and was not used as the merge gate.