fix(workspace-server): default-bind to 127.0.0.1 in dev mode + close S-8 LAN fail-open #7

Closed
opened 2026-05-07 05:25:45 +00:00 by Ghost · 0 comments

Summary

The workspace-server fails open on *:8080 for any caller on the same network as the founder's Mac during local development. A same-LAN peer can create, list, or delete workspaces with no token. Verified via curl -X POST http://<mac-LAN-ip>:8080/workspaces -d '{"name":"unauth-probe"}' returning 201 Created.

This is finding S-8 from the security posture audit RFC v1 (molecule-ai/internal#28, merged 2026-05-07).

Root cause — both the bind AND the auth chain are permissive

The fail-open requires BOTH conditions, but on a typical local-dev Mac BOTH hold simultaneously:

  1. Bind on *: cmd/server/main.go:341-344 does Addr: fmt.Sprintf(":%s", port). Go's net.Listen("tcp", ":8080") binds every interface, not just loopback. Same-LAN peers can reach the listener.

  2. AdminAuth fail-open: internal/middleware/wsauth_middleware.go:155-186. Two branches let unauthenticated requests through:

    • Tier 1 (line 165-176): ADMIN_TOKEN="" AND no live workspace tokens in DB → c.Next(). Path documented as "fresh install before first token mint."
    • Tier 1b (line 183-186): isDevModeFailOpen() → ADMIN_TOKEN="" AND MOLECULE_ENV=dev|development → c.Next(). Documented in internal/middleware/devmode.go as "local-dev escape hatch so canvas at :3000 keeps working without ADMIN_TOKEN friction."
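
The two branches above can be modeled as a small decision function. This is a simplified sketch and not the actual AdminAuth code: the authInputs type and the failOpenTier name are hypothetical, and exact, case-sensitive matching of the MOLECULE_ENV values is an assumption.

```go
package main

import "fmt"

// authInputs is a hypothetical, simplified view of what AdminAuth consults.
type authInputs struct {
	AdminToken     string // value of ADMIN_TOKEN ("" when unset)
	MoleculeEnv    string // value of MOLECULE_ENV
	LiveTokenCount int    // live workspace tokens in the DB
}

// isDevModeFailOpen follows the predicate as described in this issue:
// ADMIN_TOKEN is empty AND MOLECULE_ENV is "dev" or "development".
// Exact string matching is an assumption of this sketch.
func isDevModeFailOpen(in authInputs) bool {
	return in.AdminToken == "" &&
		(in.MoleculeEnv == "dev" || in.MoleculeEnv == "development")
}

// failOpenTier reports which fail-open branch (if any) would let an
// unauthenticated request through, or "" when a token is required.
func failOpenTier(in authInputs) string {
	switch {
	case in.AdminToken == "" && in.LiveTokenCount == 0:
		return "tier1" // fresh install before first token mint
	case isDevModeFailOpen(in):
		return "tier1b" // local-dev escape hatch
	default:
		return ""
	}
}

func main() {
	cases := []authInputs{
		{AdminToken: "", MoleculeEnv: "dev", LiveTokenCount: 0},
		{AdminToken: "", MoleculeEnv: "development", LiveTokenCount: 3},
		{AdminToken: "s3cret", MoleculeEnv: "dev", LiveTokenCount: 0},
		{AdminToken: "", MoleculeEnv: "production", LiveTokenCount: 3},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> fail-open tier %q\n", c, failOpenTier(c))
	}
}
```

In this model a typical local-dev setup (no ADMIN_TOKEN, MOLECULE_ENV=dev) always lands in one of the two fail-open branches, whether or not workspace tokens exist yet.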

In production SaaS (per saved memory reference_saas_workspace_server_auth_chain):

  • ADMIN_TOKEN is always set
  • MOLECULE_ENV=production
  • Bound on * is fine because the only network reachability is from the tenant's own EC2 + the controlplane's reverse proxy — no random LAN peers.

Local dev hits both fail-open conditions:

  • Operators don't set ADMIN_TOKEN (the canvas dev workflow doesn't require it)
  • Many set MOLECULE_ENV=dev for the dev-mode escape hatch
  • Bound on * exposes to coffee shop / conference / home guest-wifi peers
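
To make the bind half concrete, the sketch below classifies an http.Server Addr string by the interfaces it would listen on, using only string parsing (no sockets are opened). The bindScope helper and its labels are hypothetical and exist only for illustration; net.SplitHostPort and net.ParseIP are standard library calls.

```go
package main

import (
	"fmt"
	"net"
)

// bindScope classifies a "host:port" listen address by reach.
// An empty host (":8080") and the unspecified addresses (0.0.0.0, ::)
// both mean every interface, which is what lsof shows as *:8080.
func bindScope(addr string) (string, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	if host == "" {
		return "all interfaces", nil
	}
	ip := net.ParseIP(host)
	switch {
	case ip == nil:
		// Not an IP literal; the name is resolved when the listener starts.
		return "hostname", nil
	case ip.IsUnspecified():
		return "all interfaces", nil
	case ip.IsLoopback():
		return "loopback only", nil
	default:
		return "single interface", nil
	}
}

func main() {
	for _, addr := range []string{":8080", "0.0.0.0:8080", "127.0.0.1:8080", "[::1]:8080"} {
		scope, err := bindScope(addr)
		if err != nil {
			fmt.Println(addr, "->", err)
			continue
		}
		fmt.Println(addr, "->", scope)
	}
}
```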

Affected surfaces

| Surface | Notes |
|---|---|
| workspace-server/cmd/server/main.go:341-344 | Where :8080 becomes *:8080 |
| workspace-server/internal/middleware/wsauth_middleware.go:155-186 | AdminAuth fail-open (Tier 1, Tier 1b) |
| workspace-server/internal/middleware/devmode.go:49-56 | isDevModeFailOpen() predicate (also called by WorkspaceAuth) |
| Documentation: ~/.molecule-ai/handbook.md | Currently has no section on local-dev exposure model |

Prior art reviewed

  • reference_saas_workspace_server_auth_chain (saved memory): documents the production auth chain — tenant subdomain + per-tenant ADMIN_TOKEN + X-Molecule-Org-Id. None of this applies for local dev → no auth on the wire.
  • feedback_local_must_mimic_production (saved memory): "when local-dev skips a prod-only path, bugs there are structurally invisible until they hit a paying tenant." A bind change should NOT break the local-dev mimic; binding to 127.0.0.1 is more restrictive than prod, not less, so it doesn't hide a prod bug — it just shrinks the local attack surface.
  • PR #166 (referenced in router.go:106 + 134-136): wrapped /workspaces POST/DELETE in AdminAuth; closed C1+C20 against attackers WITH valid network reach. Did not change the bind.
  • PR #623 (referenced in wsauth_middleware.go:150-154): closed an Origin-header bypass on CanvasOrBearer. Same authorial intent: don't trust forgeable headers.
  • PR #684 (referenced in devmode.go:43): added the ADMIN_TOKEN opt-in. Half of the eventual closure.

External prior art:

  • Jupyter notebook server: solves the same "local dev needs zero-friction auth" problem by auto-minting a per-process random token on startup, logging it once to stderr, and surfacing it via the URL query string (http://localhost:8888/?token=...). Adopting: shows the pattern is well-trodden. Rejecting for THIS PR: complicates the canvas integration (canvas would need to read the token from the log or the URL query string); larger change than needed today.
  • Postgres listen_addresses + pg_hba.conf: separate the network-binding decision from the auth-method decision. Adopting: same shape — bind narrowness AND auth strength are independent levers; this fix touches only the bind axis.
  • Docker daemon: binds to a Unix socket by default; TCP exposure is opt-in via -H tcp://.... Adopting: localhost-by-default is the safe shape.
  • Redis protected-mode yes: when binding to all interfaces AND no auth set, Redis refuses external connections by default. Adopting: same coupling — auth-or-bind, never neither. Rejecting at full strength: "refuse all external" is too aggressive for our self-hosted shape.

Proposed approach

Smallest correct fix: introduce BIND_ADDR env var. Default the bind to 127.0.0.1 ONLY when isDevModeFailOpen() returns true (= ADMIN_TOKEN="" AND MOLECULE_ENV is a dev value). Production and non-dev self-hosted keep binding on * (existing behavior).

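// cmd/server/main.go, replacing the current Addr line (also needs the "net" import)
bindAddr := os.Getenv("BIND_ADDR")
if bindAddr == "" && middleware.IsDevModeFailOpen() {
    // Dev-mode fail-open means no auth on the wire, so default to loopback only.
    bindAddr = "127.0.0.1"
}
// net.JoinHostPort brackets IPv6 literals; an empty host still yields ":port" (all interfaces).
addr := net.JoinHostPort(bindAddr, port)
srv := &http.Server{
    Addr:    addr,
    Handler: r,
}
log.Printf("Platform starting on %s (dev-mode fail-open=%v)", addr, middleware.IsDevModeFailOpen())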

This couples the bind-narrowness to the SAME signal the auth fail-open uses. Two safety levers move together; no new state.

Alternatives considered

  1. Always require ADMIN_TOKEN (remove dev-mode fail-open entirely). REJECTED: contradicts the documented design goal (devmode.go:8-21) of zero-friction local smoke tests. Would force every Mac dev to set + propagate ADMIN_TOKEN to canvas via NEXT_PUBLIC_ADMIN_TOKEN, then rebuild canvas bundle. Friction the platform team already chose to avoid.

  2. Auto-mint per-process random token + log to stderr (Jupyter pattern). REJECTED for this PR: bigger change, would require canvas to read either the log file or a startup file to pick up the token. Solving a local-dev UX problem at the cost of integration complexity. Worth doing as a follow-up if the bind narrowing alone proves insufficient.

  3. Bind always * regardless of mode + tighten auth instead. REJECTED: doesn't help operators on shared wifi. Network surface narrowing is a strictly defensive layer in addition to auth, not a replacement. Defense-in-depth.
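
For reference, the token-minting half of alternative 2 is small; the integration cost sits on the canvas side. A minimal sketch, assuming a 32-byte token and a hypothetical mintToken helper (neither is specified in this issue):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// mintToken returns a hex-encoded token built from nBytes of
// cryptographically secure random data.
func mintToken(nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	tok, err := mintToken(32)
	if err != nil {
		fmt.Fprintln(os.Stderr, "mint token:", err)
		os.Exit(1)
	}
	// Logged once to stderr at startup, following the Jupyter pattern.
	fmt.Fprintf(os.Stderr, "dev token (valid for this process only): %s\n", tok)
}
```

The open question the sketch does not answer is how canvas would pick the token up, which is the reason this stays a follow-up.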

SSOT decision

The bind address is set in exactly one place: cmd/server/main.go:341-344. This PR keeps that as SSOT — the dispatch logic for "should I narrow to loopback?" lives inline at that one site, calling middleware.IsDevModeFailOpen() (already exposed as a public helper at devmode.go:63). No new file, no new abstraction.

Security-aware design check (the 4 questions, per SOP)

  • Untrusted input? No. The change parses one env var (BIND_ADDR) at startup. The HTTP request flow is unchanged.
  • Auth / sessions / permissions? Indirectly tightening — the fix narrows the network surface that reaches the auth chain. Auth code itself is unchanged. No session model change.
  • Data collection / logging / transmission expansion? No. Adds one log line at startup announcing the bind. No new data emitted on requests. No telemetry.
  • Access change? YES — same-LAN peers can no longer reach the dev-mode service. That IS the goal. Operators who want LAN exposure set BIND_ADDR=0.0.0.0 explicitly (and pair it with ADMIN_TOKEN if they want safety).

Versioning + backwards compatibility

  • BIND_ADDR is a NEW env var; absence preserves the prior behavior except in dev-mode (where the default narrows to 127.0.0.1). This is a behavior change for operators running with MOLECULE_ENV=dev|development and ADMIN_TOKEN unset.
  • Affected operators: primarily the founder, on a local Mac. Self-hosted operators reading the canvas at http://<machine-LAN-ip>:8080 from a different machine on their LAN will need to either set BIND_ADDR=0.0.0.0 (back to today's shape) OR set ADMIN_TOKEN (which makes isDevModeFailOpen() false and restores the * default).
  • Migration: documented in the operational docs update (handbook addition).
  • Deprecation: none — the fail-open dev-mode path itself is preserved. Only the network reach narrows.
  • No semver bump (internal service). No schema, no API version. The env-var addition is additive.

Acceptance criteria for this issue

  • PR opens, links this issue
  • cmd/server/main.go bind logic uses the conditional
  • 3 unit tests in a new bind_test.go (BIND_ADDR explicit, BIND_ADDR unset + dev → 127.0.0.1, BIND_ADDR unset + non-dev → empty)
  • Mutation test: removing the if middleware.IsDevModeFailOpen() line makes the dev-mode test fail
  • Manual verification on the founder's Mac: MOLECULE_ENV=dev ADMIN_TOKEN= go run ./cmd/server → lsof -iTCP:8080 -sTCP:LISTEN shows 127.0.0.1:8080 not *:8080. curl http://<mac-LAN-ip>:8080/health from a same-LAN device returns connection refused.
  • Handbook updated with the local-dev exposure section
  • No production regression (CI green, manual production-shape sanity)
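
The three unit-test cases listed above could take roughly the following shape, assuming the conditional is made testable as a small pure function. The resolveBindAddr name and signature are hypothetical; this issue does not say how bind_test.go will reach the logic in main.go.

```go
package main

import "fmt"

// resolveBindAddr is a hypothetical pure version of the bind decision:
// an explicit BIND_ADDR always wins; otherwise dev-mode fail-open
// narrows the bind to loopback, and everything else keeps "" (all interfaces).
func resolveBindAddr(bindEnv string, devFailOpen bool) string {
	if bindEnv == "" && devFailOpen {
		return "127.0.0.1"
	}
	return bindEnv
}

// bindCase is one row of the table; the same rows could drive a
// table-driven test in bind_test.go.
type bindCase struct {
	name        string
	bindEnv     string
	devFailOpen bool
	want        string
}

func main() {
	cases := []bindCase{
		{"BIND_ADDR explicit", "0.0.0.0", true, "0.0.0.0"},
		{"BIND_ADDR unset + dev", "", true, "127.0.0.1"},
		{"BIND_ADDR unset + non-dev", "", false, ""},
	}
	for _, c := range cases {
		got := resolveBindAddr(c.bindEnv, c.devFailOpen)
		status := "ok"
		if got != c.want {
			status = "FAIL"
		}
		fmt.Printf("%-26s got=%q want=%q %s\n", c.name, got, c.want, status)
	}
}
```

Dropping the devFailOpen condition from the if statement makes the third row return 127.0.0.1 instead of empty, and deleting the branch entirely makes the second row return empty, so the table also catches the mutation named in the criteria.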

Out of scope (parked as follow-ups)

  • S-9 (canvas on *:3000): Next.js dev server. Same fix shape (next dev -H 127.0.0.1). Separate PR.
  • S-10 (Postgres on *:5432): Docker Desktop config. Operator one-line fix. Separate PR.
  • Auto-mint per-process token (Jupyter pattern): revisit if bind narrowing proves insufficient. Tracked as v2 follow-up.
Reference: molecule-ai/molecule-core#7