molecule-core/workspace-server/internal/handlers/ssrf_test.go
Hongming Wang 1125a029b8 fix(platform): unblock SaaS workspace registration end-to-end
Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 and
   IPv6 ULA are now gated behind saasMode(); IPv4 link-local
   (169.254/16, which covers cloud metadata), loopback (127/8, ::1),
   IPv6 link-local (fe80::/10), and TEST-NET stay blocked in both
   modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. issueAndInjectToken
   is now a no-op in SaaS mode; the register handler already mints on
   first successful register and returns the plaintext in the response
   body for the runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.
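
The saasMode() resolution order from item 1 can be sketched as a pure function. This is a minimal sketch, not the code in registry.go: `resolveSaaSMode` is a hypothetical name, it takes the two environment values as arguments instead of reading them itself, and the rule that an unrecognised MOLECULE_DEPLOY_MODE value falls through to the legacy signal is an assumption. The accepted spellings are taken from the TestSaasMode table.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSaaSMode is a hypothetical stand-in for saasMode().
// deployMode mirrors MOLECULE_DEPLOY_MODE, orgID mirrors MOLECULE_ORG_ID.
func resolveSaaSMode(deployMode, orgID string) bool {
	// Step 1: an explicit operator flag wins, compared case-insensitively
	// after trimming surrounding whitespace.
	switch strings.ToLower(strings.TrimSpace(deployMode)) {
	case "saas":
		return true
	case "self-hosted", "selfhosted", "standalone":
		return false
	}
	// Step 2: no recognised flag, so fall back to the legacy implicit
	// signal. Assumption: any other value is treated like an unset flag.
	return strings.TrimSpace(orgID) != ""
}

func main() {
	fmt.Println(resolveSaaSMode("", ""))                    // false
	fmt.Println(resolveSaaSMode("SaaS", ""))                // true
	fmt.Println(resolveSaaSMode("self-hosted", "some-org")) // false
	fmt.Println(resolveSaaSMode(" ", "some-org"))           // true
}
```

Keeping the ladder free of os.Getenv makes each rung testable without touching the process environment; the real saasMode() is exercised through t.Setenv instead.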

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

A follow-up issue for CP-sourced workspace_id attestation will be filed
separately; it is meant to close the residual intra-VPC SSRF and
token-race windows that the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 03:06:46 -07:00


package handlers

import (
	"net"
	"testing"
)

// isSafeURL is defined in mcp.go.
// isPrivateOrMetadataIP is defined in mcp.go.
// saasMode is defined in registry.go.

// TestSaasMode covers the env-resolution ladder so a self-hosted
// operator can't accidentally flip into SaaS mode by leaving a stale
// MOLECULE_ORG_ID around, and an explicit MOLECULE_DEPLOY_MODE wins
// over the legacy implicit signal.
func TestSaasMode(t *testing.T) {
	cases := []struct {
		name       string
		deployMode string
		orgID      string
		want       bool
	}{
		{"both unset", "", "", false},
		{"legacy org id only", "", "7b2179dc-8cc6-4581-a3c6-c8bff4481086", true},
		{"explicit saas", "saas", "", true},
		{"explicit saas is case-insensitive", "SaaS", "", true},
		{"explicit self-hosted wins over legacy org id", "self-hosted", "some-org", false},
		{"explicit selfhosted wins over legacy org id", "selfhosted", "some-org", false},
		{"explicit standalone wins over legacy org id", "standalone", "some-org", false},
		{"whitespace-only deploy mode falls through to legacy", " ", "some-org", true},
		{"whitespace-only org id falls through to false", "", " ", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			t.Setenv("MOLECULE_DEPLOY_MODE", tc.deployMode)
			t.Setenv("MOLECULE_ORG_ID", tc.orgID)
			if got := saasMode(); got != tc.want {
				t.Errorf("saasMode() = %v, want %v (MOLECULE_DEPLOY_MODE=%q MOLECULE_ORG_ID=%q)",
					got, tc.want, tc.deployMode, tc.orgID)
			}
		})
	}
}
// TestIsPrivateOrMetadataIP_SaaSMode covers the SaaS-mode relaxation:
// RFC-1918 and ULA ranges are allowed, but metadata / loopback / TEST-NET
// classes stay blocked in every mode. Regression guard for the core
// SaaS provisioning fix (issue: workspaces register with their VPC
// private IP, which is 172.31.x.x on AWS default VPCs).
func TestIsPrivateOrMetadataIP_SaaSMode(t *testing.T) {
	t.Setenv("MOLECULE_DEPLOY_MODE", "saas")
	t.Setenv("MOLECULE_ORG_ID", "")
	cases := []struct {
		name  string
		ipStr string
		want  bool
	}{
		// RFC-1918 must be ALLOWED in SaaS mode.
		{"172.31 allowed in saas", "172.31.44.78", false},
		{"10/8 allowed in saas", "10.0.0.5", false},
		{"192.168 allowed in saas", "192.168.1.1", false},
		// IPv6 ULA must be ALLOWED in SaaS mode (AWS IPv6 VPC analogue).
		{"fd00 ULA allowed in saas", "fd12:3456:789a::1", false},
		// Metadata / loopback stay BLOCKED even in SaaS mode.
		{"169.254 still blocked", "169.254.169.254", true},
		{"127/8 still blocked", "127.0.0.1", true},
		{"::1 still blocked", "::1", true},
		{"fe80 still blocked", "fe80::1", true},
		// TEST-NET stays blocked.
		{"192.0.2.x still blocked", "192.0.2.5", true},
		{"198.51.100.x still blocked", "198.51.100.5", true},
		{"203.0.113.x still blocked", "203.0.113.5", true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			ip := net.ParseIP(tc.ipStr)
			if ip == nil {
				t.Fatalf("ParseIP(%q) returned nil", tc.ipStr)
			}
			if got := isPrivateOrMetadataIP(ip); got != tc.want {
				t.Errorf("isPrivateOrMetadataIP(%s) = %v, want %v", tc.ipStr, got, tc.want)
			}
		})
	}
}
// TestIsPrivateOrMetadataIP_IPv6 covers the IPv6 gap the previous
// implementation had: it returned false for every IPv6 literal
// unconditionally, which would let a registered [::1] or [fe80::…]
// URL bypass the SSRF check entirely.
func TestIsPrivateOrMetadataIP_IPv6(t *testing.T) {
	t.Setenv("MOLECULE_DEPLOY_MODE", "")
	t.Setenv("MOLECULE_ORG_ID", "")
	cases := []struct {
		name  string
		ipStr string
		want  bool
	}{
		{"::1 loopback blocked", "::1", true},
		{"fe80 link-local blocked", "fe80::1", true},
		{"fe80 link-local with mac blocked", "fe80::a00:27ff:fe00:1", true},
		{"fc00 ULA blocked (non-saas)", "fc00::1", true},
		{"fd00 ULA blocked (non-saas)", "fd12::1", true},
		{"public v6 allowed", "2606:4700:4700::1111", false}, // 1.1.1.1 v6
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			ip := net.ParseIP(tc.ipStr)
			if ip == nil {
				t.Fatalf("ParseIP(%q) returned nil", tc.ipStr)
			}
			if got := isPrivateOrMetadataIP(ip); got != tc.want {
				t.Errorf("isPrivateOrMetadataIP(%s) = %v, want %v", tc.ipStr, got, tc.want)
			}
		})
	}
}
// TestIsPrivateOrMetadataIP covers the default (self-hosted) IPv4 rules.
func TestIsPrivateOrMetadataIP(t *testing.T) {
	// Pin self-hosted mode: the RFC-1918 expectations below only hold
	// outside SaaS mode, so they must not depend on the ambient env.
	t.Setenv("MOLECULE_DEPLOY_MODE", "")
	t.Setenv("MOLECULE_ORG_ID", "")
	cases := []struct {
		name  string
		ipStr string
		want  bool
	}{
		// Must be blocked: RFC-1918 private
		{"10.0.0.1", "10.0.0.1", true},
		{"10.255.255.254", "10.255.255.254", true},
		{"172.16.0.0", "172.16.0.0", true},
		{"172.31.255.255", "172.31.255.255", true},
		{"192.168.0.1", "192.168.0.1", true},
		{"192.168.255.255", "192.168.255.255", true},
		// Must be blocked: cloud metadata link-local
		{"169.254.169.254", "169.254.169.254", true},
		{"169.254.0.1", "169.254.0.1", true},
		// Must be blocked: carrier-grade NAT
		{"100.64.0.1", "100.64.0.1", true},
		{"100.127.255.254", "100.127.255.254", true},
		// Must be blocked: documentation ranges
		{"192.0.2.1", "192.0.2.1", true},
		{"198.51.100.1", "198.51.100.1", true},
		{"203.0.113.1", "203.0.113.1", true},
		// 203.0.113.254 is still inside TEST-NET-3: 203.0.113.0/24 spans
		// 203.0.113.0 through 203.0.113.255, so it must be blocked too.
		{"203.0.113.254", "203.0.113.254", true},
		// Must be allowed: public IP addresses
		{"8.8.8.8", "8.8.8.8", false},
		{"1.1.1.1", "1.1.1.1", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			ip := net.ParseIP(tc.ipStr)
			if ip == nil {
				t.Fatalf("ParseIP(%q) returned nil", tc.ipStr)
			}
			got := isPrivateOrMetadataIP(ip)
			if got != tc.want {
				t.Errorf("isPrivateOrMetadataIP(%s) = %v, want %v", tc.ipStr, got, tc.want)
			}
		})
	}
}
func TestIsSafeURL(t *testing.T) {
	// Pin self-hosted mode so the RFC-1918 cases do not depend on the
	// ambient environment (assumes isSafeURL shares the saasMode()-gated
	// range check).
	t.Setenv("MOLECULE_DEPLOY_MODE", "")
	t.Setenv("MOLECULE_ORG_ID", "")
	cases := []struct {
		name    string
		rawURL  string
		wantErr bool
	}{
		// Valid: public HTTPS
		{"public https", "https://agent.example.com:8080/a2a", false},
		{"public http", "http://agent.example.com/a2a", false},
		// Loopback is blocked by isSafeURL even in dev: the orchestrator
		// controls access via WorkspaceAuth + CanCommunicate, not via this URL check.
		// Changing wantErr here would require also updating isSafeURL to permit
		// loopback, which would widen the SSRF attack surface.
		{"loopback literal blocked", "http://127.0.0.1:8000", true},
		{"loopback literal with path blocked", "http://127.0.0.1:9000/a2a", true},
		// Forbidden: non-HTTP(S) scheme
		{"file scheme blocked", "file:///etc/passwd", true},
		{"ftp scheme blocked", "ftp://internal/", true},
		{"mailto scheme blocked", "mailto://user@example.com", true},
		{"data scheme blocked", "data:text/html,<script>alert(1)</script>", true},
		// Forbidden: IP literals, cloud metadata and link-local
		{"AWS IMDS blocked", "http://169.254.169.254/latest/meta-data/", true},
		{"link-local 169.254.0.1 blocked", "http://169.254.0.1/", true},
		// Forbidden: IP literals, loopback
		{"loopback 127.0.0.1 blocked", "http://127.0.0.1:8080", true},
		{"loopback 127.255.255.255 blocked", "http://127.255.255.255:9000", true},
		// Forbidden: IP literals, RFC-1918 private
		{"10.x private blocked", "http://10.0.0.1:8080", true},
		{"172.x private blocked", "http://172.16.0.5:8000", true},
		{"192.x private blocked", "http://192.168.1.1:8000", true},
		// Forbidden: IP literals, link-local multicast (224.0.0.0/24)
		{"link-local multicast 224.0.0.1 blocked", "http://224.0.0.1/", true},
		{"link-local multicast 224.0.0.251 blocked", "http://224.0.0.251:8080", true},
		// Forbidden: empty hostname
		{"empty hostname rejected", "http://:8080/a2a", true},
		// Forbidden: IP literals, unspecified
		{"0.0.0.0 blocked", "http://0.0.0.0:8080", true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			err := isSafeURL(tc.rawURL)
			if tc.wantErr && err == nil {
				t.Errorf("isSafeURL(%q): expected error, got nil", tc.rawURL)
			}
			if !tc.wantErr && err != nil {
				t.Errorf("isSafeURL(%q): expected nil, got %v", tc.rawURL, err)
			}
		})
	}
}