The previous commit on this branch added a dev-mode fail-open branch to
AdminAuth so the Canvas dashboard could enumerate workspaces after the
first token lands in the DB. Verification via Chrome (clicking a
workspace to open its side panel) surfaced the same class of bug on a
different middleware — `WorkspaceAuth` — triggering:
API GET /workspaces/<id>/activity?type=a2a_receive&source=canvas&limit=50:
401 {"error":"missing workspace auth token"}
Root cause is identical to AdminAuth's: in local dev the Canvas (at
localhost:3000) calls the platform (at localhost:8080) cross-port, so
`isSameOriginCanvas`'s Host==Referer check fails. Without a bearer
token, every per-workspace read (/activity, /delegations, /memories,
/events/stream, /schedules, etc.) 401s and the side panel is unusable.
### Fix
Symmetric extension in `WorkspaceAuth` (workspace-server/internal/middleware/wsauth_middleware.go):
after the existing `isSameOriginCanvas` fallback, add a narrow escape
hatch that stays fail-open only when BOTH
- `ADMIN_TOKEN` is unset (operator has not opted in to the #684
closure), AND
- `MOLECULE_ENV` is explicitly a dev mode (`development` / `dev`).
SaaS tenants never hit this branch because hosted provisioning sets
both `ADMIN_TOKEN` and `MOLECULE_ENV=production`. The comment in the
code also links back to AdminAuth's Tier-1b for consistency.
### Tests
Three new table-driven tests in wsauth_middleware_test.go mirror the
AdminAuth tier-1b suite, exercising the positive path and both
negative cases:
- `TestWorkspaceAuth_DevModeEscapeHatch_NoBearer_FailsOpen` — the
happy path (dev mode, no admin token → 200)
- `TestWorkspaceAuth_DevModeEscapeHatch_IgnoredInProduction` — the
SaaS-safety guarantee (production + no admin token → 401)
- `TestWorkspaceAuth_DevModeEscapeHatch_IgnoredWhenAdminTokenSet` —
explicit `ADMIN_TOKEN` wins; dev mode does not silently override
the opt-in
### Comprehensive audit of adjacent middlewares
Re-scanned every file under workspace-server/internal/middleware/ and
every handler that invokes `AbortWithStatusJSON(Unauthorized)` directly,
to check for other surfaces where local dev might silently 401.
Findings, already OK:
- `CanvasOrBearer` — cosmetic routes already accept localhost:3000
via `canvasOriginAllowed` (Origin header check); no change needed.
- `tenant_guard.go` — no-op when `MOLECULE_ORG_ID` is unset (self-
hosted / dev); no change needed.
- `session_auth.go` — verifies against `CP_UPSTREAM_URL`; returns
(false, false) in local dev so callers fall through to bearer; no
change needed.
- `socket.go` `HandleConnect` — Canvas browser clients don't send
`X-Workspace-ID` so skip the bearer check; agent clients do and
validate as today. No change needed.
- Handlers in handlers/{discovery,registry,secrets,plugins_install,
a2a_proxy_helpers,schedules}.go — all workspace-scoped routes
called by the workspace runtime, not the Canvas browser. Unaffected.
- `handlers/admin_test_token.go` — already `MOLECULE_ENV`-aware (the
convention this hatch mirrors).
### End-to-end verification
1. Fresh-nuked DB, platform + canvas restarted with `MOLECULE_ENV=development`
2. `POST /workspaces` → token lands in DB (Tier-1 would close here)
3. Probed every Canvas-hit endpoint with no bearer, with Canvas-like
`Origin: http://localhost:3000`:
200 /workspaces
200 /workspaces/<id>/activity
200 /workspaces/<id>/delegations
200 /workspaces/<id>/memories
200 /approvals/pending
200 /events
4. Chrome browser test: opened http://localhost:3000, clicked a
workspace tile — the side panel rendered with the full 13-tab
structure (Chat, Activity, Details, Skills, Terminal, Config,
Schedule, Channels, Files, Memory, Traces, Events, Audit) and no
`Failed to load chat history` error. "No messages yet" placeholder
shows instead of the 401 retry screen.
5. `go test -race ./internal/middleware/` — clean
6. `bash tests/e2e/test_api.sh` — 61/61 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
402 lines
16 KiB
Go
402 lines
16 KiB
Go
package middleware
|
|
|
|
import (
|
|
"crypto/subtle"
|
|
"database/sql"
|
|
"errors"
|
|
"log"
|
|
"net/http"
|
|
"os"
|
|
"strings"
|
|
|
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/orgtoken"
|
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/wsauth"
|
|
"github.com/gin-gonic/gin"
|
|
)
|
|
|
|
// WorkspaceAuth returns a Gin middleware that enforces per-workspace bearer-token
|
|
// authentication on /workspaces/:id/* sub-routes.
|
|
//
|
|
// Strict: every request MUST carry Authorization: Bearer <token> matching a
|
|
// live (non-revoked) token for the workspace. No grace period, no fail-open.
|
|
//
|
|
// History: originally this middleware had a lazy-bootstrap grace period for
|
|
// pre-Phase-30.1 workspaces without a live token, so rolling upgrades didn't
|
|
// brick in-flight agents. #318 tightened the fake-UUID leak (non-existent
|
|
// workspace IDs were falling through). #351 then showed the remaining hole:
|
|
// test-artifact workspaces from prior DAST runs still exist in the DB with
|
|
// empty configs and no tokens, so they pass WorkspaceExists + fall through
|
|
// the grace period — leaking global-secret key names to any unauth caller on
|
|
// the Docker network. Phase 30.1 shipped months ago; every live workspace has
|
|
// since gone through multiple boot cycles and acquired a token. The grace
|
|
// period no longer serves legitimate traffic. Removing it entirely closes
|
|
// #351 without affecting registration (which is on /registry/register,
|
|
// outside this middleware's scope).
|
|
//
|
|
// Intended for route groups that cover all /workspaces/:id/* paths.
|
|
// The /workspaces/:id/a2a route must be registered on the root router (outside
|
|
// this group) because it already authenticates callers via CanCommunicate.
|
|
func WorkspaceAuth(database *sql.DB) gin.HandlerFunc {
|
|
return func(c *gin.Context) {
|
|
workspaceID := c.Param("id")
|
|
if workspaceID == "" {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "missing workspace ID"})
|
|
return
|
|
}
|
|
ctx := c.Request.Context()
|
|
|
|
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
|
|
if tok != "" {
|
|
// Admin token fallback — lets the canvas dashboard read workspace
|
|
// activity, traces, delegations with a single admin credential.
|
|
adminSecret := os.Getenv("ADMIN_TOKEN")
|
|
if adminSecret != "" && subtle.ConstantTimeCompare([]byte(tok), []byte(adminSecret)) == 1 {
|
|
c.Next()
|
|
return
|
|
}
|
|
// Org-scoped API token — user-minted from canvas UI. Grants
|
|
// access to EVERY workspace in the org (that's the explicit
|
|
// product spec: one org key can touch each workspace). Same
|
|
// power surface as ADMIN_TOKEN but named, revocable, audited.
|
|
// Check before per-workspace token so an org-key presenter
|
|
// doesn't hit the narrower ValidateToken failure path.
|
|
if id, prefix, orgID, err := orgtoken.Validate(ctx, database, tok); err == nil {
|
|
c.Set("org_token_id", id)
|
|
c.Set("org_token_prefix", prefix)
|
|
// org_id may be "" for pre-migration tokens (NULL column).
|
|
// Don't set the context key in that case so downstream callers
|
|
// can distinguish "unanchored token" (exists==false) from
|
|
// "anchored to this org" (exists==true, value non-empty).
|
|
if orgID != "" {
|
|
c.Set("org_id", orgID)
|
|
}
|
|
c.Next()
|
|
return
|
|
} else if !errors.Is(err, orgtoken.ErrInvalidToken) {
|
|
log.Printf("wsauth: WorkspaceAuth: orgtoken.Validate: %v", err)
|
|
c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": "auth check failed"})
|
|
return
|
|
}
|
|
// Per-workspace token — narrowest scope, bound to this :id.
|
|
if err := wsauth.ValidateToken(ctx, database, workspaceID, tok); err != nil {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid workspace auth token"})
|
|
return
|
|
}
|
|
c.Next()
|
|
return
|
|
}
|
|
// Same-origin canvas on tenant image — Referer matches Host.
|
|
if isSameOriginCanvas(c) {
|
|
c.Next()
|
|
return
|
|
}
|
|
// Local-dev escape hatch. Mirrors the Tier-1b branch in AdminAuth:
|
|
// on `go run ./cmd/server` + `npm run dev` the Canvas (at
|
|
// localhost:3000) calls the platform (at localhost:8080) cross-port,
|
|
// so isSameOriginCanvas's Host==Referer check fails. Without a
|
|
// bearer, every GET /workspaces/:id/activity / /delegations call
|
|
// 401s and the Canvas can't show chat history or agent comms.
|
|
// Gated on MOLECULE_ENV=development + ADMIN_TOKEN unset so SaaS
|
|
// (always MOLECULE_ENV=production + ADMIN_TOKEN set) never hits it.
|
|
if os.Getenv("ADMIN_TOKEN") == "" {
|
|
env := strings.ToLower(strings.TrimSpace(os.Getenv("MOLECULE_ENV")))
|
|
if env == "development" || env == "dev" {
|
|
c.Next()
|
|
return
|
|
}
|
|
}
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "missing workspace auth token"})
|
|
return
|
|
}
|
|
}
|
|
|
|
// AdminAuth returns a Gin middleware for global/admin routes (e.g.
|
|
// /settings/secrets, /admin/secrets) that have no per-workspace scope.
|
|
//
|
|
// # Credential tier (evaluated in order)
|
|
//
|
|
// 1. Lazy-bootstrap fail-open: if no live workspace token exists anywhere on
|
|
// the platform (fresh install / pre-Phase-30 upgrade), every request passes
|
|
// through so existing deployments keep working.
|
|
//
|
|
// 2. ADMIN_TOKEN env var (recommended, closes #684): when set, the bearer
|
|
// MUST equal this value exactly (constant-time comparison). Workspace
|
|
// bearer tokens are intentionally rejected even if valid — a compromised
|
|
// workspace agent must not be able to read global secrets, steal GitHub App
|
|
// installation tokens, or enumerate pending approvals across the platform.
|
|
// Set ADMIN_TOKEN to a strong random secret (e.g. openssl rand -base64 32).
|
|
//
|
|
// 3. Fallback — workspace token (deprecated, backward-compat): when
|
|
// ADMIN_TOKEN is not set and workspace tokens do exist globally, any valid
|
|
// workspace bearer token is still accepted. This preserves existing
|
|
// behaviour for deployments that have not yet configured ADMIN_TOKEN, but
|
|
// it leaves the blast-radius isolation gap described in #684 open. Set
|
|
// ADMIN_TOKEN to eliminate this fallback.
|
|
//
|
|
// NOTE: canvasOriginAllowed / isSameOriginCanvas are intentionally NOT called
|
|
// here. The Origin header is trivially forgeable by any container on the
|
|
// Docker network; using it as an auth bypass would let an attacker reach
|
|
// /settings/secrets, /bundles/import, /events, etc. without a bearer token.
|
|
// Those short-circuits belong ONLY in CanvasOrBearer (cosmetic routes). (#623)
|
|
func AdminAuth(database *sql.DB) gin.HandlerFunc {
|
|
return func(c *gin.Context) {
|
|
ctx := c.Request.Context()
|
|
adminSecret := os.Getenv("ADMIN_TOKEN")
|
|
|
|
hasLive, err := wsauth.HasAnyLiveTokenGlobal(ctx, database)
|
|
if err != nil {
|
|
log.Printf("wsauth: AdminAuth: HasAnyLiveTokenGlobal failed: %v", err)
|
|
c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": "auth check failed"})
|
|
return
|
|
}
|
|
if !hasLive {
|
|
// Tier 1: fail-open is ONLY safe when ADMIN_TOKEN is unset
|
|
// (self-hosted dev, pre-Phase-30 upgrade). Hosted SaaS always
|
|
// sets ADMIN_TOKEN at provision time, and C4 (SaaS-launch
|
|
// blocker) showed that without this guard an attacker can
|
|
// pre-empt the first user by POSTing /org/import before any
|
|
// token gets minted. When ADMIN_TOKEN is set we fall through
|
|
// into the same bearer-check path Tier-2 uses below.
|
|
if adminSecret == "" {
|
|
c.Next()
|
|
return
|
|
}
|
|
}
|
|
|
|
// Tier 1b: Local-dev escape hatch. On `go run ./cmd/server` the
|
|
// Canvas has no bearer token (there's no WorkOS session, no
|
|
// baked NEXT_PUBLIC_ADMIN_TOKEN), so the moment the first
|
|
// workspace token lands in the DB Tier 1 closes and Canvas → 401
|
|
// on every GET /workspaces. This reopens fail-open *only* when
|
|
// - ADMIN_TOKEN is empty (i.e. the operator has not opted in
|
|
// to the Phase-30 closure), AND
|
|
// - MOLECULE_ENV is explicitly a dev mode.
|
|
// SaaS never hits this branch because tenant provisioning sets
|
|
// both ADMIN_TOKEN and MOLECULE_ENV=production. Matches the
|
|
// existing convention in handlers/admin_test_token.go which
|
|
// gates the test-token endpoint on MOLECULE_ENV != "production".
|
|
if adminSecret == "" {
|
|
env := strings.ToLower(strings.TrimSpace(os.Getenv("MOLECULE_ENV")))
|
|
if env == "development" || env == "dev" {
|
|
c.Next()
|
|
return
|
|
}
|
|
}
|
|
|
|
// SaaS-canvas path: when the request carries a WorkOS session
|
|
// cookie AND the CP confirms it's valid, accept without a
|
|
// bearer. This is how the tenant's Next.js canvas UI
|
|
// authenticates — the browser has a session cookie scoped
|
|
// to .moleculesai.app, and we verify it upstream against
|
|
// /cp/auth/me (short-cached; see verifiedCPSession).
|
|
//
|
|
// Only runs when CP_UPSTREAM_URL is set (prod SaaS); self-
|
|
// hosted / dev deploys without a CP fall through to the
|
|
// bearer-only path unchanged.
|
|
if cookieHeader := c.GetHeader("Cookie"); cookieHeader != "" {
|
|
if ok, _ := verifiedCPSession(cookieHeader); ok {
|
|
c.Next()
|
|
return
|
|
}
|
|
// Cookie presented but invalid: fall through to the
|
|
// bearer-check path, which will 401. We do NOT abort
|
|
// here so molecli / CLI users with both a cookie and
|
|
// a stale cookie + valid bearer still pass.
|
|
}
|
|
|
|
// Bearer token is the ONLY accepted credential for admin routes.
|
|
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
|
|
if tok == "" {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "admin auth required"})
|
|
return
|
|
}
|
|
|
|
// Tier 2a: org-scoped API tokens (user-minted via canvas UI).
|
|
// Precedes the ADMIN_TOKEN check because these are the
|
|
// tokens users actually manage — named, revocable, audited.
|
|
// ADMIN_TOKEN is the bootstrap/break-glass credential that
|
|
// still works but is NOT visible through the UI. Both grant
|
|
// the same access surface (full org admin); the tier split
|
|
// is about provenance + rotation, not privilege.
|
|
//
|
|
// Validate() runs ONE indexed lookup (token_hash partial
|
|
// index with revoked_at IS NULL) + an async last_used_at
|
|
// bump. Cost per request: one SELECT + one UPDATE, both
|
|
// hitting the same narrow partial index.
|
|
if id, prefix, orgID, err := orgtoken.Validate(ctx, database, tok); err == nil {
|
|
c.Set("org_token_id", id)
|
|
c.Set("org_token_prefix", prefix)
|
|
// Conditional set — see WorkspaceAuth branch above for rationale.
|
|
if orgID != "" {
|
|
c.Set("org_id", orgID)
|
|
}
|
|
c.Next()
|
|
return
|
|
} else if !errors.Is(err, orgtoken.ErrInvalidToken) {
|
|
// DB error — fail closed and log. Don't expose DB text.
|
|
log.Printf("wsauth: AdminAuth: orgtoken.Validate: %v", err)
|
|
c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": "auth check failed"})
|
|
return
|
|
}
|
|
|
|
// Tier 2b (#684 fix): dedicated ADMIN_TOKEN — workspace bearer tokens
|
|
// must not grant access to admin routes.
|
|
if adminSecret != "" {
|
|
if subtle.ConstantTimeCompare([]byte(tok), []byte(adminSecret)) != 1 {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid admin auth token"})
|
|
return
|
|
}
|
|
c.Next()
|
|
return
|
|
}
|
|
|
|
// Tier 3 (deprecated): ADMIN_TOKEN not configured — fall back to any
|
|
// valid workspace token. Operators should set ADMIN_TOKEN to close #684.
|
|
if err := wsauth.ValidateAnyToken(ctx, database, tok); err != nil {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid admin auth token"})
|
|
return
|
|
}
|
|
c.Next()
|
|
}
|
|
}
|
|
|
|
// CanvasOrBearer is a softer admin-auth variant used ONLY for cosmetic
|
|
// canvas routes where forging the request has zero security impact (PUT
|
|
// /canvas/viewport: worst case an attacker resets the shared viewport
|
|
// position, user refreshes the page, problem solved).
|
|
//
|
|
// Accepts either:
|
|
//
|
|
// 1. A valid bearer token (same contract as AdminAuth) — covers molecli,
|
|
// agent-to-platform calls, and anyone using the API directly.
|
|
// 2. A browser Origin header that matches CORS_ORIGINS (canvas itself).
|
|
// This is NOT a strict auth boundary — curl can forge Origin — but for
|
|
// cosmetic-only routes the trade-off is acceptable. Non-cosmetic routes
|
|
// MUST NOT use this middleware (see #194 review on why it would re-open
|
|
// #164 CRITICAL if applied to /bundles/import).
|
|
//
|
|
// Lazy-bootstrap fail-open preserved: zero-token installs pass everything
|
|
// through so fresh self-hosted / dev sessions aren't bricked.
|
|
func CanvasOrBearer(database *sql.DB) gin.HandlerFunc {
|
|
return func(c *gin.Context) {
|
|
ctx := c.Request.Context()
|
|
|
|
hasLive, err := wsauth.HasAnyLiveTokenGlobal(ctx, database)
|
|
if err != nil {
|
|
log.Printf("wsauth: CanvasOrBearer HasAnyLiveTokenGlobal failed: %v — allowing request", err)
|
|
c.Next()
|
|
return
|
|
}
|
|
if !hasLive {
|
|
c.Next()
|
|
return
|
|
}
|
|
|
|
// Path 1: bearer present → bearer MUST validate. Do not fall through
|
|
// to Origin on an invalid bearer — an attacker with a revoked /
|
|
// expired token + a matching Origin would otherwise bypass auth.
|
|
// Empty bearer → skip to Origin path (canvas never sends one).
|
|
if tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization")); tok != "" {
|
|
// Admin token accepted for canvas dashboard
|
|
adminSecret := os.Getenv("ADMIN_TOKEN")
|
|
if adminSecret != "" && subtle.ConstantTimeCompare([]byte(tok), []byte(adminSecret)) == 1 {
|
|
c.Next()
|
|
return
|
|
}
|
|
if err := wsauth.ValidateAnyToken(ctx, database, tok); err != nil {
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid admin auth token"})
|
|
return
|
|
}
|
|
c.Next()
|
|
return
|
|
}
|
|
|
|
// Path 2: canvas origin match (cross-origin canvas).
|
|
if canvasOriginAllowed(c.GetHeader("Origin")) {
|
|
c.Next()
|
|
return
|
|
}
|
|
|
|
// Path 3: same-origin canvas (tenant image).
|
|
if isSameOriginCanvas(c) {
|
|
c.Next()
|
|
return
|
|
}
|
|
|
|
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "admin auth required"})
|
|
}
|
|
}
|
|
|
|
// canvasOriginAllowed returns true if origin matches any entry in the
|
|
// CORS_ORIGINS env var (comma-separated) or the localhost defaults.
|
|
// Exact-match only; no prefix or wildcard logic — that's handled by the
|
|
// real CORS middleware upstream. The intent here is "did this request come
|
|
// from the canvas page the user is already logged into?" — a binary check.
|
|
func canvasOriginAllowed(origin string) bool {
|
|
if origin == "" {
|
|
return false
|
|
}
|
|
allowed := []string{"http://localhost:3000", "http://localhost:3001"}
|
|
if v := os.Getenv("CORS_ORIGINS"); v != "" {
|
|
for _, o := range strings.Split(v, ",") {
|
|
if o = strings.TrimSpace(o); o != "" {
|
|
allowed = append(allowed, o)
|
|
}
|
|
}
|
|
}
|
|
for _, a := range allowed {
|
|
if a == origin {
|
|
return true
|
|
}
|
|
}
|
|
return false
|
|
}
|
|
|
|
// isSameOriginCanvas returns true when the request appears to come from the
|
|
// canvas UI served by the same Go process (tenant image). In this topology,
|
|
// the browser sends same-origin requests with an empty Origin header but a
|
|
// Referer matching the request Host. We accept these requests because the
|
|
// canvas is the trusted frontend — same as if Origin matched CORS_ORIGINS.
|
|
//
|
|
// This only fires when CANVAS_PROXY_URL is set (i.e. the combined tenant
|
|
// image is active), so self-hosted / dev setups with separate canvas and
|
|
// platform origins are unaffected.
|
|
// canvasProxyActive is true when the platform runs as a combined tenant
|
|
// image (CANVAS_PROXY_URL set at boot). Cached once to avoid os.Getenv
|
|
// on every request.
|
|
var canvasProxyActive = os.Getenv("CANVAS_PROXY_URL") != ""
|
|
|
|
// IsSameOriginCanvas is the exported version for use outside the middleware
|
|
// package (e.g. workspace.go field-level auth). Same logic as the internal
|
|
// callers in AdminAuth/WorkspaceAuth/CanvasOrBearer.
|
|
func IsSameOriginCanvas(c *gin.Context) bool {
|
|
return isSameOriginCanvas(c)
|
|
}
|
|
|
|
func isSameOriginCanvas(c *gin.Context) bool {
|
|
if !canvasProxyActive {
|
|
return false
|
|
}
|
|
host := c.Request.Host
|
|
if host == "" {
|
|
return false
|
|
}
|
|
// Check Referer first (standard browser requests).
|
|
referer := c.GetHeader("Referer")
|
|
if referer != "" {
|
|
// Referer must start with https://<host>/ or http://<host>/ (trailing
|
|
// slash required to prevent hongming-wang.moleculesai.app.evil.com from
|
|
// matching hongming-wang.moleculesai.app).
|
|
if strings.HasPrefix(referer, "https://"+host+"/") ||
|
|
strings.HasPrefix(referer, "http://"+host+"/") ||
|
|
referer == "https://"+host ||
|
|
referer == "http://"+host {
|
|
return true
|
|
}
|
|
}
|
|
// Fallback: check Origin header (WebSocket upgrade requests may not have
|
|
// Referer but always send Origin).
|
|
origin := c.GetHeader("Origin")
|
|
return origin == "https://"+host || origin == "http://"+host
|
|
}
|