TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour
test) was at risk of inheriting stale INSERT calls from a previous
test's coalesceRestart goroutine that survived its t.Cleanup boundary.
The production callsite shape is `go h.RestartByID(...)` from
a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's
runRestartCycle panics, coalesceRestart's deferred recover swallows it
to keep the platform process alive — but in tests, nothing waits for
the goroutine to fully exit. If it's still draining LogActivity-shaped
work after the test returns, those INSERTs land in the next test's
sqlmock connection as kind=DELEGATION_FAILED /
kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected".
Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that
spawns coalesceRestart on a goroutine (matching production) and
registers a t.Cleanup with sync.WaitGroup.Wait so the test can't
declare itself done while a goroutine is still alive.
Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper
(previously it called coalesceRestart synchronously, which never
exercised the production goroutine-survival contract).
Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the
regression guard: cycle blocks 150ms then panics; the test asserts
t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the
deferred close ran (proving the panic-recovery defer chain executed)
AND state.running was cleared. Verified the assertion is real by
mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL
deterministically with elapsed <300µs.
Per saved memory feedback_assert_exact_not_substring: the regression
test asserts an exact-shape contract (elapsed >= blockFor) rather than
a substring-in-output, so it discriminates between "drain works" and
"drain skipped".
Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_*
tests. Full ./internal/handlers/... suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>