molecule-core/workspace-server/internal/router/cp_proxy.go
Hongming Wang d03f2d47e0 fix: close cross-tenant authz + cp_proxy admin-traversal gaps
Addresses three Critical findings from today's code review of the
SaaS-canvas routing stack.

## Critical-1: session verification scoped to the current tenant

session_auth.go previously verified via GET /cp/auth/me, which
only answers "is someone logged in" — NOT "is this user in the
org they're targeting." Every WorkOS-authed user (including folks
who only signed up via app.moleculesai.app with no tenant
relationship) could call /workspaces, /approvals/pending,
/bundles/import, /org/import etc. on ANY tenant they could reach.
Cross-tenant read: user at acme.moleculesai.app could hit
bob.moleculesai.app/workspaces with their cookie and get Bob's
workspaces.

Fix:
  - CP gains GET /cp/auth/tenant-member?slug=<slug> which joins
    org_members × organizations and only returns member:true when
    the authenticated user is actually in that org.
  - Tenant sets MOLECULE_ORG_SLUG at boot via user-data.
  - session_auth now calls tenant-member (not /me), passing its
    own slug. Cache key includes slug so one tenant's cached
    positive never satisfies another's check.

## Critical-2: cp_proxy path allowlist (lateral-movement fix)

cp_proxy.go forwarded any /cp/* path upstream with the cookie
and bearer attached. Since /cp/admin/* accepts sessions as one
of its auth tiers, a tenant-authed user could curl
/cp/admin/tenants/other-slug/diagnostics through their tenant
and the CP would honor it — turning any tenant into a lateral
hop into admin surface.

Fix: explicit allowlist of paths the canvas browser bundle
actually needs (/cp/auth, /cp/orgs, /cp/billing, /cp/templates,
/cp/legal). Everything else 404s at the tenant before cookies
leave. Fail-closed: future UI paths require explicit entries.

## Important-1,2: bounded session cache + split positive/negative TTL

Previous sync.Map cache grew unbounded (one entry per unique
Cookie header for process lifetime) and cached failures for 30s,
meaning a 3s CP blip locked users out for the full window.

Fix:
  - Bounded map with batch random eviction at cap (10k entries ×
    ~100 bytes = 1 MB ceiling). Random eviction is O(1)
    expected; we don't need precise LRU.
  - Periodic sweeper goroutine (2 min) reclaims expired entries
    even when they're not re-hit.
  - Positive TTL 30s, negative TTL 5s — short negative so CP
    flakes self-heal fast.
  - Transport errors NOT cached (would otherwise trap every
    user during a multi-second upstream outage).
  - Cache key = sha256(slug + cookie) so raw session tokens
    don't sit in process memory, and cross-tenant isolation is
    structural not policy.

## Important-3: TenantGuard /cp/* bypass documented

Added a security note to the bypass explaining why it's safe
only under the current setup (cp_proxy allowlist + tunnel-only
ingress), and what would require revisiting (SG opens :8080
inbound to the VPC).

## Tests

  - session_auth_test.go: 12 new tests — empty cookie, missing
    slug, no CP, member:true happy path with cache hit, member:
    false, 401 upstream, malformed JSON, transport error not
    cached, cross-tenant isolation (same cookie different
    tenants hit upstream separately), bounded eviction, expired
    entries, cache key collision resistance.
  - cp_proxy_test.go: new — isCPProxyAllowedPath covers 17
    allow/block cases, forwarding preserves Cookie+Auth, Host
    rewritten, blocked paths 404 without calling upstream.

All platform tests pass. CP provisioner tests pass after
threading cfg.OrgSlug into the container env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:45:57 -07:00

135 lines
5.3 KiB
Go

package router
import (
"log"
"net/http"
"net/http/httputil"
"net/url"
"strings"
"github.com/gin-gonic/gin"
)
// cpProxyAllowedPrefixes is the explicit list of /cp/* paths the
// tenant will forward to the CP. Anything else 404s BEFORE the cookie
// and Authorization headers leave the tenant.
//
// Why an allowlist, not a denylist: /cp/admin/* endpoints accept a
// WorkOS session cookie (scoped to .moleculesai.app) as one of their
// auth tiers. A tenant-authed user visiting <tenant>.moleculesai.app
// and crafting a request to /cp/admin/tenants/other-slug/diagnostics
// would have the tenant happily forward their cookie upstream. The CP
// would then see a legitimate admin session and honor the request —
// effectively turning any tenant into an admin-access lateral-
// movement hop. (Observed as a theoretical risk in today's review.)
//
// Only paths that are legitimately used by the canvas browser bundle
// go in this list. If a new UI fetch needs a new /cp/ prefix, add it
// here — fail-closed is the default.
var cpProxyAllowedPrefixes = []string{
"/cp/auth/", // me, tenant-member, login/signup/callback for return flows
"/cp/orgs", // list / get / provision-status / export
"/cp/billing/", // checkout + portal
"/cp/templates", // template registry reads
"/cp/legal/", // terms document (served on CP)
}
// isCPProxyAllowedPath enforces the allowlist. Prefix match with an
// optional trailing slash tolerance (/cp/orgs matches /cp/orgs AND
// /cp/orgs/acme). Rejects any path that doesn't start with /cp/ so
// the handler isn't inadvertently mounted on other prefixes.
func isCPProxyAllowedPath(p string) bool {
if !strings.HasPrefix(p, "/cp/") {
return false
}
for _, prefix := range cpProxyAllowedPrefixes {
if p == prefix || strings.HasPrefix(p, prefix+"/") || strings.HasPrefix(p, prefix) && prefixMatches(p, prefix) {
return true
}
}
return false
}
// prefixMatches handles the case where the allowlist entry itself ends
// in a slash (e.g. /cp/auth/): that means "anything under /cp/auth/".
// Entries without a trailing slash (/cp/orgs) match both the exact path
// and any subpath. Separate function so the intent is readable.
func prefixMatches(path, prefix string) bool {
if strings.HasSuffix(prefix, "/") {
return strings.HasPrefix(path, prefix)
}
return path == prefix || strings.HasPrefix(path, prefix+"/")
}
// newCPProxy returns a Gin handler that reverse-proxies /cp/* requests
// to the control plane. Lives beside newCanvasProxy because they solve
// the same problem — tenant browser fetches targeted at a single
// same-origin base — for the mirror-image endpoint set.
//
// Why this exists: canvas's browser bundle calls both CP endpoints
// (/cp/auth/me, /cp/orgs, /cp/billing/checkout) AND tenant-platform
// endpoints (/canvas/viewport, /approvals/pending). They share ONE
// build-time base URL (NEXT_PUBLIC_PLATFORM_URL). Baking the CP
// origin breaks tenant calls; baking the tenant origin breaks CP
// calls. The only sane fix is same-origin fetches + let the server
// split the traffic. This handler is the /cp/* leg of that split;
// newCanvasProxy is the UI leg.
//
// Security:
// - We do NOT strip Cookie/Authorization here: those carry the
// WorkOS session cookie and must reach the CP to resolve the
// user. That's the whole point of this proxy.
// - We DO rewrite the Host header to the CP upstream so CORS and
// cookie-domain logic upstream see themselves, not the tenant.
// - We do NOT strip X-Forwarded-For — upstream may want it for
// audit and rate-limit keying.
// - The proxy ONLY forwards /cp/* paths. The upstream URL is
// env-configured and its scheme is enforced https in prod via
// url.Parse (the caller passes the URL; we reject anything
// that isn't http/https at construction time).
//
// Rate / timeout note: we do NOT set a custom Transport with
// aggressive timeouts because CP endpoints are fast and any hang
// is already bounded by the caller's browser-level timeout. If a
// future slow endpoint warrants a bound, add here not at the
// gateway.
func newCPProxy(targetURL string) gin.HandlerFunc {
target, err := url.Parse(targetURL)
if err != nil {
log.Fatalf("cp_proxy: invalid CP_UPSTREAM_URL %q: %v", targetURL, err)
}
if target.Scheme != "http" && target.Scheme != "https" {
log.Fatalf("cp_proxy: CP_UPSTREAM_URL scheme must be http(s), got %q", target.Scheme)
}
if target.Host == "" {
log.Fatalf("cp_proxy: CP_UPSTREAM_URL missing host: %q", targetURL)
}
proxy := &httputil.ReverseProxy{
Director: func(req *http.Request) {
req.URL.Scheme = target.Scheme
req.URL.Host = target.Host
// Host header rewrite: CP middleware (CORS, cookie-domain)
// keys off Host; rewriting avoids "origin not allowed" on
// upstream OPTIONS preflight.
req.Host = target.Host
},
ErrorHandler: func(w http.ResponseWriter, _ *http.Request, err error) {
log.Printf("cp_proxy: %v", err)
w.WriteHeader(http.StatusBadGateway)
_, _ = w.Write([]byte("control plane unavailable"))
},
}
return func(c *gin.Context) {
// Allowlist enforcement: block anything outside the browser-
// canvas-facing /cp/* surface. Returns 404 (not 403) to avoid
// leaking which paths exist on the CP side.
if !isCPProxyAllowedPath(c.Request.URL.Path) {
c.AbortWithStatus(http.StatusNotFound)
return
}
proxy.ServeHTTP(c.Writer, c.Request)
}
}