fix(registry): allow pending-DNS platform tunnel URL at register (#36 register half) #2425
Reference in New Issue
Block a user
Delete Branch "fix/validate-agent-url-pending-tunnel"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The REGISTER half of #36 (provision half = Hetzner location-failover, cp#619). Cross-cloud workspaces register advertising their per-workspace Cloudflare tunnel hostname (ws-.); the DNS record is eventually-consistent and a FAST-booting box (Hetzner ~1s) registers before it propagates -> validateAgentURL net.LookupIP fails -> 400 -> the runtime does NOT retry a 4xx -> agent_card never lands. AWS/GCP boot slow enough to miss the race (only the fast cloud broke). Diagnosed live: faithful Hetzner repro boxes 400 against a WARM tenant with 'hostname ... cannot be resolved (DNS error)'. Fix: on DNS failure, allow the hostname in SaaS mode IFF it is a platform-tunnel hostname (ws- prefix under the platform domain, MOLECULE_APP_DOMAIN default moleculesai.app) -- not an SSRF vector (only the platform controls that domain; metadata/loopback blocks still apply once it resolves). Self-hosted keeps the strict block. SECURITY-sensitive (SSRF validator) -- see the scoped rationale + tests. Generated with Claude Code
Cross-cloud workspaces (e.g. Hetzner under a GCP tenant) register advertising their per-workspace Cloudflare tunnel hostname ws-<id>.<appDomain>. That DNS record is eventually-consistent, and a FAST-booting box (a Hetzner cpx reports 'workspace ready after ~1s') registers BEFORE it propagates → validateAgentURL's net.LookupIP fails → the handler returns 400 → and the runtime does NOT retry a 4xx → so agent_card never lands and the agent never comes online. AWS/GCP boot slowly enough to miss the race, which is why ONLY the fast cloud broke. Diagnosed live: faithful Hetzner repro boxes register against a warm tenant and still 400 with {"error":"hostname \"ws-...\" cannot be resolved (DNS error)..."} Fix: when DNS resolution fails, allow the hostname through in SaaS mode iff it is a platform-tunnel hostname (ws-<id> under the platform's own domain, MOLECULE_APP_DOMAIN default moleculesai.app). Such a hostname is NOT an SSRF vector — only the platform controls DNS there, so an attacker cannot point it at 169.254/127/private space, and the unconditional metadata/ loopback blocks still apply once it resolves. Restores the pre-#1130 'let an unresolvable platform URL through' behaviour, scoped to the trusted tunnel domain. Self-hosted keeps the strict block. This is the register half of #36; the provision half (Hetzner location capacity failover) shipped in cp#619. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>/qa-recheck
/security-recheck