molecule-core/scripts/dev-start.sh
core-devops 3747fe2f49 harden(security): remove dev-mode fail-open auth — fail-closed everywhere + dev-token + regression gate
CTO directive: "nothing should be fail-open." Remove the dev-mode fail-open
auth hatch so AdminAuth/WorkspaceAuth (and the discovery caller) ALWAYS
require a real credential — fail-CLOSED in every environment, dev included —
fix local dev to stay AUTHENTICATED (not open), and add a regression gate so
fail-open cannot return.

Removed fail-open call-sites (workspace-server):
- internal/middleware/wsauth_middleware.go WorkspaceAuth — deleted the
  isDevModeFailOpen() short-circuit that let a bearer-less /workspaces/:id/*
  request through when MOLECULE_ENV=dev + ADMIN_TOKEN unset.
- internal/middleware/wsauth_middleware.go AdminAuth — deleted BOTH fail-open
  branches: the Tier-1 lazy-bootstrap (no live tokens + no ADMIN_TOKEN ⇒ pass,
  the C4 /org/import pre-empt hole) and the Tier-1b isDevModeFailOpen() dev
  hatch. HasAnyLiveTokenGlobal is still probed for the 503-on-outage semantics
  but opens no path.
- internal/handlers/discovery.go validateDiscoveryCaller — deleted the
  IsDevModeFailOpen() allow branch; discovery now requires a verified CP
  session or valid bearer in every env.
- Removed the isDevModeFailOpen()/IsDevModeFailOpen() helper entirely. The two
  legitimately non-auth uses (rate-limit relaxation in ratelimit.go, loopback
  bind default in cmd/server) now key on a new NON-security isLocalDevEnv()
  predicate (MOLECULE_ENV only, decoupled from ADMIN_TOKEN). CanvasOrBearer's
  cosmetic-only behaviour (PUT /canvas/viewport) is unchanged.

Dev path stays authenticated, not open:
- scripts/dev-start.sh provisions a deterministic ADMIN_TOKEN into .env and
  exports the matching NEXT_PUBLIC_ADMIN_TOKEN so the dev Canvas sends a real
  bearer (canvas/src/lib/api.ts already attaches it; next.config.ts pair-guard).
- Docs updated: .env.example, docs/quickstart.md, docs/architecture/overview.md.
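
A minimal sketch of how the fail-closed posture could be smoke-checked
against a running local stack. The helper names http_status and auth_posture
are hypothetical (they are not in the repository), and the route to probe is
an assumption: pick any endpoint guarded by AdminAuth or WorkspaceAuth.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the repository.

# Print the HTTP status of a GET. A bearer is sent only when a token is given.
# curl prints 000 when the connection itself fails.
http_status() {
  url=$1
  token=${2:-}
  if [ -n "$token" ]; then
    curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $token" "$url" || true
  else
    curl -s -o /dev/null -w '%{http_code}' "$url" || true
  fi
}

# Classify the pair "status without bearer" / "status with bearer".
auth_posture() {
  case "$1:$2" in
    401:2??) echo "fail-closed" ;;      # anonymous rejected, token accepted
    401:*)   echo "token-rejected" ;;   # closed, but the token does not match
    000:*)   echo "unreachable" ;;      # platform not listening
    *)       echo "not-fail-closed" ;;  # anonymous request was not a 401
  esac
}
```

With .env sourced and URL set to a guarded route (a placeholder here),
auth_posture "$(http_status "$URL")" "$(http_status "$URL" "$ADMIN_TOKEN")"
is expected to print fail-closed on a healthy dev stack.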

Regression gate:
- internal/middleware/no_fail_open_test.go — asserts AdminAuth + WorkspaceAuth
  fail CLOSED (401) under the EXACT old-hatch conditions (ADMIN_TOKEN unset +
  MOLECULE_ENV=dev/development × hasLive 0/1). Proven RED against a temporarily
  restored hatch, GREEN after. Plus a source-guard test forbidding the
  isDevModeFailOpen()-style helper from reappearing.
- Converted the stale fail-open assertions in wsauth_middleware_test.go,
  discovery_test.go, security_regression_685_686_687_688_test.go and the
  devmode/bind tests to pin the fail-closed contract.
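
The real gate is the Go test named above. Purely as an illustration of the
source-guard idea, a shell sketch might look like the following. The function
name is hypothetical, and skipping _test.go files is an assumption about how
such a guard avoids matching its own pattern.

```shell
#!/bin/sh
# Sketch only: not the repository's guard, which is a Go test.

# Return 0 when no non-test Go file under DIR mentions the removed helper,
# 1 otherwise. The match is case-insensitive so that both
# isDevModeFailOpen( and IsDevModeFailOpen( are caught.
no_fail_open_helper() {
  dir=$1
  # grep exits non-zero when nothing matches; `|| true` keeps `set -e` happy.
  hits=$(find "$dir" -type f -name '*.go' ! -name '*_test.go' \
    -exec grep -liF 'isDevModeFailOpen(' {} +) || true
  if [ -n "$hits" ]; then
    echo "fail-open helper found in:" >&2
    echo "$hits" >&2
    return 1
  fi
  return 0
}
```

Run as no_fail_open_helper workspace-server from the repository root; a
non-zero exit means the helper name has come back.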

Audit (other fail-open patterns on the auth surface): CanvasOrBearer and
validateDiscoveryCaller retain a fail-open-on-DB-error (and CanvasOrBearer a
no-token lazy-bootstrap) — both are documented availability tradeoffs on
cosmetic / low-sensitivity routes, left as-is and flagged for follow-up.

Verify: go build ./... ok; go vet middleware/cmd/handlers clean; full module
go test ./... = 46 ok / 0 fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 01:02:48 -07:00

244 lines
11 KiB
Bash
Executable File

#!/bin/sh
# dev-start.sh — one-command local development environment.
#
# What it does (in order):
#   1. Provisions MOLECULE_ENV + a dev ADMIN_TOKEN in .env if missing (closes #684 fail-open)
#   2. Runs infra/scripts/setup.sh (postgres + redis + langfuse + clickhouse
#      + temporal + populates template/plugin registry from manifest.json)
#   3. Starts the platform (Go :8080), waits for /health
#   4. Starts the canvas (Next.js :3000), waits for HTTP 200
#   5. Prints a readiness banner with API-key add instructions
#   6. On Ctrl-C, kills both background processes and tears down infra
#
# Prerequisites:
#   - Docker + Docker Compose v2 (for postgres/redis/langfuse/etc)
#   - Go 1.25+                   (for the platform binary)
#   - Node.js 20+                (for the canvas)
#   - jq                         (for setup.sh's manifest clone; optional:
#                                 without it, the template palette will be
#                                 empty until you run clone-manifest.sh
#                                 manually)
#
# Usage:
#   ./scripts/dev-start.sh
#   # Open http://localhost:3000, add your model API key in
#   # Config → Secrets & API Keys, then create your first workspace.
#
# Idempotent: re-running picks up where the last run left off (existing
# .env is preserved, npm install skipped if node_modules present, etc).
set -e
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
ENV_FILE="$ROOT/.env"
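cleanup() {
  echo ""
  echo "==> Shutting down..."
  # Either PID may be empty (early exit, or the docker fallback path for
  # the platform), so a failing kill is tolerated.
  kill $PLATFORM_PID $CANVAS_PID 2>/dev/null || true
  # Use setup.sh's compose file (full infra) since that's what we
  # brought up. `down` keeps named volumes by default; call with
  # --volumes here only if you want a clean slate (we don't, since
  # idempotent re-runs are the usual case).
  docker compose -f "$ROOT/docker-compose.infra.yml" down 2>/dev/null || true
  echo " Done."
}
# Run cleanup exactly once. INT/TERM only exit; the EXIT trap does the
# teardown. Trapping cleanup directly on INT/TERM would run it twice, and
# because a signal handler returns to the script, a Ctrl-C during a wait
# loop could leave the loop polling after infra is already down.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM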
# ─────────────────────────────────────────────── 1. dev-mode auth posture
#
# SECURITY (harden/no-fail-open-auth): the workspace-server auth chain is
# now fail-CLOSED in EVERY environment, dev included. There is NO dev-mode
# fail-open escape hatch anymore — AdminAuth / WorkspaceAuth / discovery all
# require a real credential. So local dev must AUTHENTICATE, not run open.
#
# The clean way to keep the canvas working locally is to provision a
# deterministic ADMIN_TOKEN and hand the matching NEXT_PUBLIC_ADMIN_TOKEN to
# the canvas bundle. The canvas already attaches `Authorization: Bearer
# $NEXT_PUBLIC_ADMIN_TOKEN` on every platform call (canvas/src/lib/api.ts),
# and next.config.ts warns if the pair is half-set. We set BOTH here.
#
#   MOLECULE_ENV=development   dev conveniences (loopback bind, relaxed
#                              rate limit). NOT an auth lever.
#   ADMIN_TOKEN=<dev value>    server-side bearer AdminAuth/WorkspaceAuth
#                              enforce (Tier-2b). Real credential.
#   NEXT_PUBLIC_ADMIN_TOKEN    same value, baked into the canvas bundle so
#                              the browser sends the matching bearer.
#
# For SaaS the platform is provisioned with a random ADMIN_TOKEN + the
# canvas image baked with the matching NEXT_PUBLIC_ADMIN_TOKEN, plus
# MOLECULE_ENV=production. Same shape, stronger secret.
if [ -f "$ENV_FILE" ] && grep -q '^MOLECULE_ENV=' "$ENV_FILE"; then
  echo "==> Reusing MOLECULE_ENV from existing .env"
else
  echo "==> Setting MOLECULE_ENV=development in .env"
  {
    if [ -f "$ENV_FILE" ]; then
      cat "$ENV_FILE"
      echo ""
    fi
    echo "# Generated by scripts/dev-start.sh on $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "# Local-dev conveniences (loopback bind, relaxed rate limit)."
    echo "# Auth is fail-closed even in dev; see ADMIN_TOKEN below."
    echo "MOLECULE_ENV=development"
  } > "$ENV_FILE.tmp"
  mv "$ENV_FILE.tmp" "$ENV_FILE"
  echo " Saved to $ENV_FILE"
fi
# Provision a deterministic dev ADMIN_TOKEN (idempotent — preserved across
# re-runs). This is the credential the canvas authenticates with locally; it
# is NOT a secret (it only guards your own localhost stack), so a fixed,
# well-known value is fine and keeps re-runs reproducible.
DEV_ADMIN_TOKEN="dev-local-admin-token"
if [ -f "$ENV_FILE" ] && grep -q '^ADMIN_TOKEN=' "$ENV_FILE"; then
  echo "==> Reusing ADMIN_TOKEN from existing .env"
else
  echo "==> Provisioning dev ADMIN_TOKEN in .env (fail-closed auth, authenticated canvas)"
  {
    # .env is guaranteed to exist here: the MOLECULE_ENV step above either
    # found it or created it.
    cat "$ENV_FILE"
    echo ""
    echo "# Dev ADMIN_TOKEN: the canvas authenticates with this locally."
    echo "# Auth is fail-closed; without a matching bearer the canvas 401s."
    echo "# Fixed value is fine: it only guards your localhost stack."
    echo "ADMIN_TOKEN=$DEV_ADMIN_TOKEN"
  } > "$ENV_FILE.tmp"
  mv "$ENV_FILE.tmp" "$ENV_FILE"
  echo " Saved to $ENV_FILE"
fi
# Source .env so the platform inherits ADMIN_TOKEN (and anything else
# the user has added — e.g. ANTHROPIC_API_KEY for skipping the canvas
# Secrets UI). `set -a` exports every assignment in the sourced file
# without us having to know the var names.
set -a
# shellcheck disable=SC1090
. "$ENV_FILE"
set +a
# The canvas reads NEXT_PUBLIC_ADMIN_TOKEN at build/dev time and attaches it
# as the bearer on every platform call. Mirror the server-side ADMIN_TOKEN
# into it so the matched-pair guard in canvas/next.config.ts is satisfied and
# the browser authenticates. Exported for the `npm run dev` child below.
export NEXT_PUBLIC_ADMIN_TOKEN="$ADMIN_TOKEN"
# ─────────────────────────────────────────────── 2. infra + templates
# Use setup.sh (not raw docker-compose) so the template registry gets
# populated from manifest.json. Without that, the canvas template
# palette is empty and the user has to manually clone repos — exactly
# the friction this script exists to eliminate.
echo "==> Running infra/scripts/setup.sh (infra + template registry)"
"$ROOT/infra/scripts/setup.sh"
# ─────────────────────────────────────────────── 3. platform
#
# Two paths:
#   (a) `go` is on PATH → run the platform directly via `go run`.
#       Fast iteration; output goes to /tmp/molecule-platform.log.
#   (b) `go` is NOT on PATH → fall back to the published platform
#       container image. Slower first run (image pull) but the script
#       still works on a fresh dev box without forcing the dev to
#       install Go just to read logs.
#
# The earlier version of this script silently called `go run` and died
# with `go: not found` on dev boxes where Go wasn't installed; the
# script's own prerequisite list (line 13-21) said "Go 1.25+" but the
# user had no signpost between "open the doc" and "command not found
# at line 111." This branch makes the failure path either succeed
# (fallback) or fail loud with explicit install guidance.
if command -v go >/dev/null 2>&1; then
  echo "==> Starting Platform (Go :8080)"
  cd "$ROOT/workspace-server"
  go run ./cmd/server > /tmp/molecule-platform.log 2>&1 &
  PLATFORM_PID=$!
else
  echo "==> Go not found on PATH; falling back to docker-compose platform service"
  echo " (Install Go 1.25+ for faster iteration: https://go.dev/dl/)"
  cd "$ROOT"
  # Bring up just the platform service from docker-compose.yml. infra/setup.sh
  # already brought up postgres+redis+etc on docker-compose.infra.yml; this
  # adds the platform container on top, mapped to :8080 so the rest of this
  # script's wait-for-/health loop works unchanged.
  docker compose up -d --build platform > /tmp/molecule-platform.log 2>&1 || {
    echo " ✗ docker compose up platform failed; see /tmp/molecule-platform.log"
    echo " Either install Go 1.25+ (https://go.dev/dl/) and rerun, or fix the docker fallback."
    exit 1
  }
  # PLATFORM_PID is empty on this path (no background process to kill);
  # cleanup() tolerates that with `kill ... 2>/dev/null || true`.
  PLATFORM_PID=
fi
echo " Waiting for Platform /health..."
PLATFORM_READY=0
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \
         21 22 23 24 25 26 27 28 29 30; do
  if curl -sf http://localhost:8080/health >/dev/null 2>&1; then
    echo " Platform ready (t+${i}s)"
    PLATFORM_READY=1
    break
  fi
  sleep 1
done
if [ "$PLATFORM_READY" -ne 1 ]; then
  echo " ✗ Platform did not respond in 30s; check /tmp/molecule-platform.log"
  exit 1
fi
# ─────────────────────────────────────────────── 4. canvas
echo "==> Starting Canvas (Next.js :3000)"
cd "$ROOT/canvas"
if [ ! -d node_modules ]; then
  echo " First-run: npm install (~30-60s)"
  npm install
fi
npm run dev > /tmp/molecule-canvas.log 2>&1 &
CANVAS_PID=$!
echo " Waiting for Canvas HTTP 200..."
CANVAS_READY=0
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \
         21 22 23 24 25 26 27 28 29 30; do
  # No -f here: we want the status code itself. curl prints 000 when the
  # connection fails, and `|| true` keeps `set -e` from aborting on that.
  # (With -f plus `|| echo "0"`, an HTTP error yielded e.g. "4040".)
  code=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/ 2>/dev/null || true)
  if [ "$code" = "200" ]; then
    echo " Canvas ready (t+${i}s)"
    CANVAS_READY=1
    break
  fi
  sleep 1
done
if [ "$CANVAS_READY" -ne 1 ]; then
  echo " ✗ Canvas did not respond in 30s; check /tmp/molecule-canvas.log"
  exit 1
fi
# ─────────────────────────────────────────────── 5. readiness banner
cat <<EOF
═══════════════════════════════════════════════════════════
Molecule AI dev environment ready
Canvas: http://localhost:3000
Platform: http://localhost:8080 (bound to loopback in dev)
Auth: fail-closed — canvas authenticates with the dev ADMIN_TOKEN
(ADMIN_TOKEN + NEXT_PUBLIC_ADMIN_TOKEN, see .env)
Logs: /tmp/molecule-platform.log
/tmp/molecule-canvas.log
Next steps:
1. Open http://localhost:3000 in a browser.
2. Add your model API key in
Config → Secrets & API Keys → Global
(skip if ANTHROPIC_API_KEY / OPENAI_API_KEY is already
set in .env — the platform inherits it.)
3. Click a template card or "+ Create blank workspace".
Press Ctrl-C to stop all services.
═══════════════════════════════════════════════════════════
EOF
wait