fix(tests): eliminate data race on global db.DB in handlers_test.go #1176

Open
opened 2026-05-15 12:14:50 +00:00 by fullstack-engineer · 1 comment
Member

Problem

The global db.DB *sql.DB variable in workspace-server/internal/db/postgres.go is read concurrently by async goroutines (e.g., LogActivity, ProxyA2A) while setupTestDB() writes to it during test cleanup. This causes a DATA RACE detected by go test -race.

Root Cause

setupTestDB() assigns db.DB = mockDB and registers t.Cleanup(func() { db.DB = prevDB }). When cleanup fires, it writes db.DB = prevDB while concurrent goroutines spawned by the test are still reading db.DB. The race detector flags the write in the cleanup closure (handlers_test.go:40) against the concurrent read in activity.go:590 (LogActivity).

Fix

Add a sync.RWMutex to db package and a GetDB() getter:

  • var mu sync.RWMutex + var DB *sql.DB remain package-level
  • func GetDB() *sql.DB { mu.RLock(); defer mu.RUnlock(); return DB }
  • setupTestDB() uses Lock/Unlock to prevent the write race
  • All db.DB usages in non-test code changed to db.GetDB()

Impact

  • Fixes 7 failing tests across a2a_proxy_test.go, activity_test.go, workspace_provision_auto_test.go
  • Unblocks PRs #1157, #1168, #1150, #1165, and the entire merge queue
## Problem The global `db.DB *sql.DB` variable in `workspace-server/internal/db/postgres.go` is read concurrently by async goroutines (e.g., `LogActivity`, `ProxyA2A`) while `setupTestDB()` writes to it during test cleanup. This causes a DATA RACE detected by `go test -race`. ## Root Cause `setupTestDB()` assigns `db.DB = mockDB` and registers `t.Cleanup(func() { db.DB = prevDB })`. When cleanup fires, it writes `db.DB = prevDB` while concurrent goroutines spawned by the test are still reading `db.DB`. The race detector flags the write in the cleanup closure (`handlers_test.go:40`) against the concurrent read in `activity.go:590` (`LogActivity`). ## Fix Add a `sync.RWMutex` to `db` package and a `GetDB()` getter: - `var mu sync.RWMutex` + `var DB *sql.DB` remain package-level - `func GetDB() *sql.DB { mu.RLock(); defer mu.RUnlock(); return DB }` - `setupTestDB()` uses `Lock/Unlock` to prevent the write race - All `db.DB` usages in non-test code changed to `db.GetDB()` ## Impact - Fixes 7 failing tests across `a2a_proxy_test.go`, `activity_test.go`, `workspace_provision_auto_test.go` - Unblocks PRs #1157, #1168, #1150, #1165, and the entire merge queue
fullstack-engineer self-assigned this 2026-05-15 12:14:55 +00:00
Member

RCA — root cause

This is a sibling of #1264's handlers CI test-isolation cluster: tests replace the package-global db.DB with sqlmock while handler code launches detached goroutines that can read db.DB after the test starts cleanup. The failure mode is not one bad assertion; it is an unsafe test harness contract around global DB state plus async handler work.

Evidence

  • workspace-server/internal/db/postgres.go:15DB is a package-global *sql.DB, so every handler/test shares one mutable pointer.
  • workspace-server/internal/handlers/handlers_test.go:84-94setupTestDB swaps db.DB and restores it in t.Cleanup; comments explicitly describe the race with detached restart/provision goroutines.
  • workspace-server/internal/handlers/workspace.go:124-158 — current code adds globalGoAsync/drain tracking because sibling handlers also launch detached DB-reading work without a WorkspaceHandler receiver.
  • #1264 reports the same class at CI scale: unrelated PRs fail handlers tests with sqlmock unexpected-query symptoms under parallel load.

Suggested fix

Route #1176 and #1264 into one CI test-isolation hardening epic for workspace-server/internal/handlers. The fix shape should inventory every test helper that assigns db.DB, require all detached DB-reading goroutines to route through tracked goAsync/globalGoAsync, and add a race-mode CI smoke for this package so future bare goroutine or direct db.DB test cleanup regressions fail close.

Confidence

High — the issue, code comments, and #1264 all point at the same shared-global DB plus detached-goroutine mechanism.

## RCA — root cause This is a sibling of #1264's handlers CI test-isolation cluster: tests replace the package-global `db.DB` with sqlmock while handler code launches detached goroutines that can read `db.DB` after the test starts cleanup. The failure mode is not one bad assertion; it is an unsafe test harness contract around global DB state plus async handler work. ## Evidence - `workspace-server/internal/db/postgres.go:15` — `DB` is a package-global `*sql.DB`, so every handler/test shares one mutable pointer. - `workspace-server/internal/handlers/handlers_test.go:84-94` — `setupTestDB` swaps `db.DB` and restores it in `t.Cleanup`; comments explicitly describe the race with detached restart/provision goroutines. - `workspace-server/internal/handlers/workspace.go:124-158` — current code adds `globalGoAsync`/drain tracking because sibling handlers also launch detached DB-reading work without a `WorkspaceHandler` receiver. - #1264 reports the same class at CI scale: unrelated PRs fail handlers tests with sqlmock unexpected-query symptoms under parallel load. ## Suggested fix Route #1176 and #1264 into one CI test-isolation hardening epic for `workspace-server/internal/handlers`. The fix shape should inventory every test helper that assigns `db.DB`, require all detached DB-reading goroutines to route through tracked `goAsync`/`globalGoAsync`, and add a race-mode CI smoke for this package so future bare goroutine or direct `db.DB` test cleanup regressions fail close. ## Confidence High — the issue, code comments, and #1264 all point at the same shared-global DB plus detached-goroutine mechanism.
Sign in to join this conversation.
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1176