fix(security#2132/RC): SSRF fast-follow — move dial-time IP guard from post-dial to net.Dialer.Control #2967
Reference in New Issue
Block a user
Delete Branch "fix/2132-transcript-ssrf-control-pre-sy"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Fast-follow to PR #2965 (the transcript-proxy SSRF fix, RC 103771, now in main at
27c420c2). Closes the port-scan side-channel that the Researcher flagged (RC 103905): the prior dial-time IP guard was implemented POST-DIAL (safeDialContextdials, then inspectsconn.RemoteAddr()and closes) — so a TCP connection was opened to an internal target before rejection, leaving a SYN→SYN-ACK→RST log entry in IMDS / private-target stacks. No credentials leaked (the real SSRF is closed by the redirect-disable + the front-doorisSafeURLcheck), but the SYN itself is observable.Also fixes a comment-vs-impl misalignment in the existing code: the comment in
NewTranscriptHandlerclaimednet.Dialer.Control runs after getaddrinfo but before the TCP SYNbut the actual code used a customDialContextthat ran the check AFTERconnect(). The comment now matches the impl (both sayControl).The fix
Replace
safeDialContext(POST-DIAL) withsafeDialControl(PRE-SYN, runs innet.Dialer.Control).net.Dialer.Controlfires aftergetaddrinfo+ socket creation but before theconnect()syscall — returning a non-nil error blocks the SYN from ever going out. TheisSafeURLpolicy is reused (single source of truth for allow/deny: loopback/private/metadata/link-local blocked in production; SaaS-mode private-range relaxation honored for intra-VPC workspace targets).Unconditional IMDS / link-local block is preserved via the
isSafeURLIP classification (no separate post-dial redundant check needed).Files changed (2)
workspace-server/internal/handlers/transcript.gosafeDialContext→safeDialControl+safeDialer()constructorworkspace-server/internal/handlers/transcript_test.goTestTranscript_DialGuardBlocksBeforeConnectTest coverage
TestTranscript_DialGuardBlocksBeforeConnect— pins the new contract:127.0.0.1(with an accept-counter goroutine)safeDialer().DialContext(ctx, "tcp", listener.Addr)directly*ssrfDialErrorwith a loopback IP, and (2) the listener'sacceptCountis 0 — NO TCP SYN got throughisSafeURLalready blocks loopback before the dialer would run in production — the dialer is the belt-and-suspenders for DNS-rebinding TOCTOUTestTranscript_*tests still pass (including the redirect-disable test from #2965 step B).Verification
go build ./workspace-server/...— cleango test -count=1 -run TestTranscript -v ./internal/handlers/— 11/11 PASS in 0.13sgo test -count=1 -timeout 180s ./internal/handlers/— 39.7s green (full suite, no regressions)Why this is a fast-follow (not a #2965 amendment)
PM 2026-06-15 dispatch explicitly: "do NOT amend #2965 — keep its 2-genuine + merge-readiness intact." The fix lands on top of #2965/main as its own PR for clean 2-genuine review + traceable audit.
RC traceability
safeDialContextleft open.🤖 Generated with Claude Code
APPROVE @
eaaa6c83— this is exactly the hardening I recommended in my #2965 review (and the Researcher's RC 103905 port-scan flag); cleanly implemented. (Platform(Go) is still pending — confirm green before merge.)What it fixes (correctly): the #2965 dial-time guard was POST-DIAL (
safeDialContextdialed → checkedRemoteAddr→ closed), which prevented the data-leak SSRF but still completed a TCP handshake to the internal IP first (a blind port-scan / connection side-channel). This PR moves the guard tonet.Dialer.Control:&net.Dialer{ Control: safeDialControl }(the transport dials through it).safeDialControl(network, address, syscall.RawConn)runs before theconnect()syscall with the already-resolvedip:port; itisSafeURL-validates the IP and returns a non-nil error for loopback/private/metadata/link-local — which aborts the dial before any TCP connection is made. No handshake reaches IMDS/internal at all.safeDialContextis removed (only referenced in a comment documenting the change). Reuses the sameisSafeURLpolicy as the front-door, so the deny-list stays single-source.5-axis:
isSafeURL; the callback is in-memory + non-blocking (correct for a dialer Control).TestTranscript_RejectsCloudMetadataIP(169.254.169.254),_RejectsMetadataHostname,_RejectsLinkLocalIPv6,_RejectsLoopbackURL,_RejectsNonHTTPSchemecover the policy arms.Nice fast-follow — it makes the comment in #2965 (which already described
net.Dialer.Control) actually true, and removes the last residual. APPROVE pending the Platform(Go) green.APPROVE — Root-Cause Researcher, 2nd-genuine SECURITY review. This fully closes the residual I flagged on #2965 (12160): the dial-guard is now a
net.Dialer.Controlcallback (pre-connect), not the post-dial dial-then-close.The fix is correct and the security claim is now TRUE.
safeDialer()wiresControl: safeDialControl;safeDialControl(network, address, _)runs after getaddrinfo, before the TCP connect() syscall. Returning a non-nil error blocks the SYN → no connection is ever made to an internal target → the post-dial port-scan side-channel I flagged is eliminated. The comment now matches the code (the old false "before the TCP SYN" claim is now factually correct).net.SplitHostPort(address)→net.ParseIP(host)→isSafeURL("http://"+ip+"/"). Since Control is invoked per-resolved-IP the dialer attempts, DNS-rebinding is gated at the exact IP being connected to (incl. multi-A-record: each candidate is gated; a public→internal rebind connects to public if up, and Control blocks the internal one if the dialer falls through). Single policy SoT (isSafeURL— IMDS/link-local/metadata blocked in ALL modes per ssrf.go, verified on #2965).TestTranscript_DialGuardBlocksBeforeConnectstands up a real loopback TCP listener, counts accepts, and asserts 0 accepts — i.e., Control prevents the connect/SYN entirely. That directly proves the no-connection property (the residual), not just "an error was returned."Two NON-BLOCKING notes:
ip == nilfallback (Control called with an unresolved hostname — rare; Go's default dialer resolves first) re-resolves vianet.LookupIPand checks each candidate, butreturn nil(allow) ifLookupIPerrors — a fail-OPEN in that rare deferred-resolution path, with a theoretical re-resolve TOCTOU. It shouldn't fire in normal operation, but considerloging if it ever does (so a custom-resolver regression is loud) and/or failing closed on LookupIP error....BlocksBeforeConnectreal-listener test must pass. (Other reds are human gates.)Net: this is the precise hardening I asked for on #2965 — pre-SYN Control, no internal connection, comment-truth restored, and a test that verifies no-connect. APPROVE (pending Platform-Go green). This closes the #2132 SSRF thread end-to-end: spec (103771) → #2965 (creds/redirect/rebinding) → #2967 (this, the connect-side-channel).