fix(workspace-server): default-bind to 127.0.0.1 in dev-mode fail-open (closes #7) #8

Merged
claude-ceo-assistant merged 2 commits from fix/s8-bind-loopback-dev into main 2026-05-07 11:25:49 +00:00
First-time contributor

Closes #7. Fix S-8 from the security posture audit RFC v1 (molecule-ai/internal#28).

What changed

In dev-mode (MOLECULE_ENV=dev|development AND ADMIN_TOKEN unset) the AdminAuth chain fails open by design — canvas at :3000 calls workspace-server at :8080 with no bearer token. Pairing that with the existing wildcard bind on :8080 exposed unauthenticated POST /workspaces to any same-LAN peer.

This PR couples the bind narrowness to the same signal that gates the auth fail-open. New resolveBindHost() in cmd/server/main.go with this precedence:

  1. BIND_ADDR env var (explicit operator override, any value)
  2. middleware.IsDevModeFailOpen() true → 127.0.0.1
  3. otherwise → "" (Go binds every interface — existing prod/self-host shape)
ADMIN_TOKEN MOLECULE_ENV BIND_ADDR Effective bind
unset dev / development unset 127.0.0.1 (NEW)
set any unset * (unchanged from today)
any production / unset / other unset * (unchanged from today)
any any <host> <host> (operator override)

Startup log now includes the resolved bind + dev-mode-fail-open state so operators can audit the listener shape from logs alone.

Why this approach

Two safety levers — bind narrowness and auth strength — move together. A production deploy (ADMIN_TOKEN set) keeps binding to all interfaces because the auth chain is doing its job; a dev Mac (no ADMIN_TOKEN, MOLECULE_ENV=dev) is reachable only via loopback because the auth chain is fail-open. No new state, no new env-var configuration the operator has to remember to set.

Alternatives rejected

  1. Always require ADMIN_TOKEN — would force every Mac dev to set + propagate it to canvas via NEXT_PUBLIC_ADMIN_TOKEN, then rebuild canvas bundle. Friction the platform team chose to avoid; preserved here.
  2. Auto-mint per-process random token + log to stderr (Jupyter pattern) — bigger change; canvas would need to read the log/URL fragment. Worth doing as a follow-up if loopback-only proves insufficient.
  3. Bind always * + tighten auth — doesn't help operators on shared wifi; network surface narrowing is a strictly defensive layer in addition to auth, not a replacement (defense-in-depth).

Tests

workspace-server/cmd/server/bind_test.go — 8 t.Setenv table cases:

TestResolveBindHost/no_bindaddr_devmode_unset_admin               PASS  → 127.0.0.1
TestResolveBindHost/no_bindaddr_devmode_unset_admin_full_word     PASS  → 127.0.0.1
TestResolveBindHost/no_bindaddr_admin_set_in_dev_env              PASS  → ""
TestResolveBindHost/no_bindaddr_production_env                    PASS  → ""
TestResolveBindHost/no_bindaddr_unset_env                         PASS  → ""
TestResolveBindHost/explicit_bindaddr_loopback_overrides_devmode  PASS  → 127.0.0.1
TestResolveBindHost/explicit_bindaddr_wildcard_overrides_devmode  PASS  → 0.0.0.0
TestResolveBindHost/explicit_bindaddr_in_production               PASS  → 10.0.5.7

Mutation test: deleting the if middleware.IsDevModeFailOpen() branch (so resolveBindHost always returns "") makes both no_bindaddr_devmode_* cases FAIL — confirms the tests aren't tautologies.

--- FAIL: TestResolveBindHost (0.00s)
    --- FAIL: TestResolveBindHost/no_bindaddr_devmode_unset_admin (0.00s)
        bind_test.go:84: resolveBindHost() = "", want "127.0.0.1" ...
    --- FAIL: TestResolveBindHost/no_bindaddr_devmode_unset_admin_full_word (0.00s)
        bind_test.go:84: resolveBindHost() = "", want "127.0.0.1" ...

go vet ./cmd/server/... clean. go build ./... clean (the new middleware import in cmd/server/main.go is the only new dep edge — internal/middleware was already a transitive dep via router).

Manual verification (post-deploy)

After this lands and operator restarts workspace-server:

MOLECULE_ENV=dev ADMIN_TOKEN= go run ./cmd/server
# in another terminal:
lsof -iTCP:8080 -sTCP:LISTEN   # expect 127.0.0.1:8080, NOT *:8080
curl http://<lan-ip>:8080/health     # from another LAN device → connection refused
curl http://127.0.0.1:8080/health    # from same Mac → 200 OK

Production smoke (run on staging tenant, where ADMIN_TOKEN is set):

docker logs <workspace-server-container> | grep "Platform starting on"
# expect "Platform starting on :8080 (dev-mode-fail-open=false)"
# bindHost is empty so the line shows ":8080" — matches today's behavior

Security review (Phase 2 questions)

  • Untrusted input? No. Reads one new env var (BIND_ADDR) at startup. HTTP request flow unchanged.
  • Auth / sessions / permissions? Indirectly tightened — narrows the network surface that reaches the auth chain. Auth code itself untouched.
  • Data collection / logging expansion? No new request-time logging. One new startup log line announcing the bind shape. No telemetry.
  • Access change? YES, by design — same-LAN peers can no longer reach the dev-mode service. Operators who explicitly want LAN exposure set BIND_ADDR=0.0.0.0 (and ideally pair with ADMIN_TOKEN).

Versioning + backwards compatibility

BIND_ADDR is a new additive env var. Absence preserves prior behavior except in dev-mode (where the default narrows to 127.0.0.1). Affected operators:

  • Founder's local Mac: behavior changes — workspace-server now binds loopback when MOLECULE_ENV=dev. This is the goal.
  • Self-hosted operators reading canvas at http://<lan-ip>:8080 from a separate machine on their LAN: must set BIND_ADDR=0.0.0.0 (back to today's shape) OR set ADMIN_TOKEN (which makes IsDevModeFailOpen() false → restores *). Documented in handbook §5 (added in this PR's companion edit, see below).
  • Production SaaS tenants: no change. ADMIN_TOKEN is always set in tenant env, so IsDevModeFailOpen() returns false, so default bind stays "" (all interfaces). CI smoke and tenant runtime are unaffected.

No semver bump (internal service). No schema, no API version. No migration.

Documentation

  • Operational doc: ~/.molecule-ai/handbook.md §5 gets a new "Local-dev exposure model — workspace-server" subsection (committed locally; not in this monorepo since handbook lives on the operator host).
  • Code-level doc: inline comment block on the new bind logic at cmd/server/main.go referencing molecule-core#7.
  • User-facing doc: N/A — internal service change, no public API surface.

Rollout / rollback

  • Rollout: merge → next workspace-server release picks up the change → operator restarts. No multi-step rollout needed.
  • Rollback: git revert the merge OR set BIND_ADDR=0.0.0.0 to immediately restore today's shape without a code revert.

Out of scope (parked as separate issues)

  • S-9 (canvas on *:3000) — Next.js dev server. Same fix shape (next dev -H 127.0.0.1). Needs a separate PR in the canvas repo.
  • S-10 (Postgres on *:5432) — Docker Desktop config. Operator one-line fix.
  • Auto-mint per-process token (Jupyter pattern) — revisit if bind narrowing proves insufficient.

Five-axis self-review (hostile)

Three weakest spots:

  1. No integration-level test that actually starts a listener and asserts on the resolved address. resolveBindHost() is a pure function and tested as such; the http.Server.Addr = fmt.Sprintf("%s:%s", bindHost, port) glue is not exercised by a test. Risk: low — the formatting is straightforward and the http.Server contract is a stdlib guarantee. Mitigated by the manual verification step above.
  2. MOLECULE_ENV=dev is the OS-environment substring match, not a structural type. A typo like MOLECULE_ENV=Dev would still trigger loopback (devmode.go lowercases) but MOLECULE_ENV=devel would not (allowlisted set is {"development","dev"}). Pre-existing behavior of IsDevModeFailOpen(); not regressed here.
  3. The handbook update lives on the operator host, not in this repo — there's no docs/ directory in molecule-core. If the handbook drifts (e.g., operator host rebuilds), the table is the only authoritative source. Acceptable for now since the handbook is the documented SSOT for ops; longer-term we should consider a docs/operations/local-dev.md in this repo and a mol_handbook_sync job.

🤖 Generated with Claude Code

Closes #7. Fix S-8 from the security posture audit RFC v1 (`molecule-ai/internal#28`). ## What changed In dev-mode (`MOLECULE_ENV=dev|development` AND `ADMIN_TOKEN` unset) the AdminAuth chain fails open by design — canvas at `:3000` calls workspace-server at `:8080` with no bearer token. Pairing that with the existing wildcard bind on `:8080` exposed unauthenticated `POST /workspaces` to any same-LAN peer. This PR couples the bind narrowness to the same signal that gates the auth fail-open. New `resolveBindHost()` in `cmd/server/main.go` with this precedence: 1. `BIND_ADDR` env var (explicit operator override, any value) 2. `middleware.IsDevModeFailOpen()` true → `127.0.0.1` 3. otherwise → `""` (Go binds every interface — existing prod/self-host shape) | `ADMIN_TOKEN` | `MOLECULE_ENV` | `BIND_ADDR` | Effective bind | |---|---|---|---| | unset | `dev` / `development` | unset | `127.0.0.1` (NEW) | | set | any | unset | `*` (unchanged from today) | | any | `production` / unset / other | unset | `*` (unchanged from today) | | any | any | `<host>` | `<host>` (operator override) | Startup log now includes the resolved bind + dev-mode-fail-open state so operators can audit the listener shape from logs alone. ## Why this approach Two safety levers — bind narrowness and auth strength — move together. A production deploy (`ADMIN_TOKEN` set) keeps binding to all interfaces because the auth chain is doing its job; a dev Mac (no `ADMIN_TOKEN`, `MOLECULE_ENV=dev`) is reachable only via loopback because the auth chain is fail-open. No new state, no new env-var configuration the operator has to remember to set. ## Alternatives rejected 1. **Always require `ADMIN_TOKEN`** — would force every Mac dev to set + propagate it to canvas via `NEXT_PUBLIC_ADMIN_TOKEN`, then rebuild canvas bundle. Friction the platform team chose to avoid; preserved here. 2. **Auto-mint per-process random token + log to stderr (Jupyter pattern)** — bigger change; canvas would need to read the log/URL fragment. Worth doing as a follow-up if loopback-only proves insufficient. 3. **Bind always `*` + tighten auth** — doesn't help operators on shared wifi; network surface narrowing is a strictly defensive layer in addition to auth, not a replacement (defense-in-depth). ## Tests `workspace-server/cmd/server/bind_test.go` — 8 `t.Setenv` table cases: ``` TestResolveBindHost/no_bindaddr_devmode_unset_admin PASS → 127.0.0.1 TestResolveBindHost/no_bindaddr_devmode_unset_admin_full_word PASS → 127.0.0.1 TestResolveBindHost/no_bindaddr_admin_set_in_dev_env PASS → "" TestResolveBindHost/no_bindaddr_production_env PASS → "" TestResolveBindHost/no_bindaddr_unset_env PASS → "" TestResolveBindHost/explicit_bindaddr_loopback_overrides_devmode PASS → 127.0.0.1 TestResolveBindHost/explicit_bindaddr_wildcard_overrides_devmode PASS → 0.0.0.0 TestResolveBindHost/explicit_bindaddr_in_production PASS → 10.0.5.7 ``` **Mutation test**: deleting the `if middleware.IsDevModeFailOpen()` branch (so `resolveBindHost` always returns `""`) makes both `no_bindaddr_devmode_*` cases FAIL — confirms the tests aren't tautologies. ``` --- FAIL: TestResolveBindHost (0.00s) --- FAIL: TestResolveBindHost/no_bindaddr_devmode_unset_admin (0.00s) bind_test.go:84: resolveBindHost() = "", want "127.0.0.1" ... --- FAIL: TestResolveBindHost/no_bindaddr_devmode_unset_admin_full_word (0.00s) bind_test.go:84: resolveBindHost() = "", want "127.0.0.1" ... ``` `go vet ./cmd/server/...` clean. `go build ./...` clean (the new `middleware` import in `cmd/server/main.go` is the only new dep edge — `internal/middleware` was already a transitive dep via `router`). ## Manual verification (post-deploy) After this lands and operator restarts workspace-server: ```bash MOLECULE_ENV=dev ADMIN_TOKEN= go run ./cmd/server # in another terminal: lsof -iTCP:8080 -sTCP:LISTEN # expect 127.0.0.1:8080, NOT *:8080 curl http://<lan-ip>:8080/health # from another LAN device → connection refused curl http://127.0.0.1:8080/health # from same Mac → 200 OK ``` Production smoke (run on staging tenant, where `ADMIN_TOKEN` is set): ```bash docker logs <workspace-server-container> | grep "Platform starting on" # expect "Platform starting on :8080 (dev-mode-fail-open=false)" # bindHost is empty so the line shows ":8080" — matches today's behavior ``` ## Security review (Phase 2 questions) - **Untrusted input?** No. Reads one new env var (`BIND_ADDR`) at startup. HTTP request flow unchanged. - **Auth / sessions / permissions?** Indirectly tightened — narrows the network surface that reaches the auth chain. Auth code itself untouched. - **Data collection / logging expansion?** No new request-time logging. One new startup log line announcing the bind shape. No telemetry. - **Access change?** YES, by design — same-LAN peers can no longer reach the dev-mode service. Operators who explicitly want LAN exposure set `BIND_ADDR=0.0.0.0` (and ideally pair with `ADMIN_TOKEN`). ## Versioning + backwards compatibility `BIND_ADDR` is a **new additive env var**. Absence preserves prior behavior **except** in dev-mode (where the default narrows to `127.0.0.1`). Affected operators: - **Founder's local Mac**: behavior changes — workspace-server now binds loopback when `MOLECULE_ENV=dev`. This is the goal. - **Self-hosted operators** reading canvas at `http://<lan-ip>:8080` from a separate machine on their LAN: must set `BIND_ADDR=0.0.0.0` (back to today's shape) OR set `ADMIN_TOKEN` (which makes `IsDevModeFailOpen()` false → restores `*`). Documented in handbook §5 (added in this PR's companion edit, see below). - **Production SaaS tenants**: no change. `ADMIN_TOKEN` is always set in tenant env, so `IsDevModeFailOpen()` returns false, so default bind stays `""` (all interfaces). CI smoke and tenant runtime are unaffected. No semver bump (internal service). No schema, no API version. No migration. ## Documentation - **Operational doc**: `~/.molecule-ai/handbook.md` §5 gets a new "Local-dev exposure model — workspace-server" subsection (committed locally; not in this monorepo since handbook lives on the operator host). - **Code-level doc**: inline comment block on the new bind logic at `cmd/server/main.go` referencing `molecule-core#7`. - **User-facing doc**: N/A — internal service change, no public API surface. ## Rollout / rollback - **Rollout**: merge → next workspace-server release picks up the change → operator restarts. No multi-step rollout needed. - **Rollback**: `git revert` the merge OR set `BIND_ADDR=0.0.0.0` to immediately restore today's shape without a code revert. ## Out of scope (parked as separate issues) - **S-9 (canvas on `*:3000`)** — Next.js dev server. Same fix shape (`next dev -H 127.0.0.1`). Needs a separate PR in the canvas repo. - **S-10 (Postgres on `*:5432`)** — Docker Desktop config. Operator one-line fix. - **Auto-mint per-process token (Jupyter pattern)** — revisit if bind narrowing proves insufficient. ## Five-axis self-review (hostile) **Three weakest spots:** 1. **No integration-level test that actually starts a listener and asserts on the resolved address.** `resolveBindHost()` is a pure function and tested as such; the `http.Server.Addr = fmt.Sprintf("%s:%s", bindHost, port)` glue is not exercised by a test. Risk: low — the formatting is straightforward and the http.Server contract is a stdlib guarantee. Mitigated by the manual verification step above. 2. **`MOLECULE_ENV=dev` is the OS-environment substring match, not a structural type.** A typo like `MOLECULE_ENV=Dev` would still trigger loopback (devmode.go lowercases) but `MOLECULE_ENV=devel` would not (allowlisted set is `{"development","dev"}`). Pre-existing behavior of `IsDevModeFailOpen()`; not regressed here. 3. **The handbook update lives on the operator host, not in this repo** — there's no `docs/` directory in molecule-core. If the handbook drifts (e.g., operator host rebuilds), the table is the only authoritative source. Acceptable for now since the handbook is the documented SSOT for ops; longer-term we should consider a `docs/operations/local-dev.md` in this repo and a `mol_handbook_sync` job. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Ghost added 1 commit 2026-05-07 05:30:38 +00:00
fix(workspace-server): default-bind to 127.0.0.1 in dev-mode fail-open
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Harness Replays / detect-changes (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 35s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 56s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m24s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m25s
CI / Platform (Go) (pull_request) Successful in 1m48s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m47s
f3187ea0c1
In dev mode (`MOLECULE_ENV=dev|development`, `ADMIN_TOKEN` unset) the
AdminAuth chain fails open by design so canvas at :3000 can call
workspace-server at :8080 without a bearer token. Combined with the
existing wildcard bind on `:8080`, that exposed unauthenticated
`POST /workspaces` to any same-LAN peer (S-8 in the audit RFC v1).

Couple the bind narrowness to the same signal that drives the auth
fail-open: when `middleware.IsDevModeFailOpen()` returns true, default
the listener to `127.0.0.1`. Production (`ADMIN_TOKEN` set) keeps
binding to all interfaces — its auth chain is doing the work. Operators
who need LAN exposure set `BIND_ADDR=<host>` explicitly.

* `cmd/server/main.go` — `resolveBindHost()` precedence: BIND_ADDR
  explicit > IsDevModeFailOpen() loopback > "" (all interfaces).
  Startup log line now includes the resolved bind + dev-mode-fail-open
  state for post-deploy auditing.
* `cmd/server/bind_test.go` — 8 t.Setenv table cases covering
  precedence, explicit overrides, dev/prod env words. Mutation-tested:
  removing the `IsDevModeFailOpen()` branch makes the dev-mode cases
  fail with "" vs "127.0.0.1".

Refs: molecule-core#7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming was assigned by claude-ceo-assistant 2026-05-07 10:29:24 +00:00
Ghost approved these changes 2026-05-07 11:24:53 +00:00
Ghost left a comment
Author
First-time contributor

Hongming-approved (chat 2026-05-07 'knock out core#8 + core#12'). S-8 SECURITY fix: workspace-server default-bind to 127.0.0.1 in dev paths (was 0.0.0.0 fail-open).

Hongming-approved (chat 2026-05-07 'knock out core#8 + core#12'). S-8 SECURITY fix: workspace-server default-bind to 127.0.0.1 in dev paths (was 0.0.0.0 fail-open).
claude-ceo-assistant added 1 commit 2026-05-07 11:25:22 +00:00
Merge branch 'main' into fix/s8-bind-loopback-dev
Some checks failed
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
Harness Replays / detect-changes (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 35s
Harness Replays / Harness Replays (pull_request) Failing after 47s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m44s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m46s
CI / Platform (Go) (pull_request) Failing after 6m9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 7m29s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 15m32s
a674a6547e
claude-ceo-assistant merged commit f0015bff81 into main 2026-05-07 11:25:49 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#8
No description provided.