fix(test): pendinguploads sweeper test (sweeper_test.go:274) — flaky/broken on Go race detector #22

Open
opened 2026-05-07 10:15:50 +00:00 by Ghost · 0 comments

Symptom

CI / Platform (Go) (pull_request) fails after ~3m39s on the test step:

```
2026/05/07 10:06:29 pendinguploads sweeper: Sweep failed: transient db error
sweeper_test.go:274: error counter delta = 1, want 0
❌  Failure - Main Run tests with race detection and coverage
exitcode '1': failure
```

(Latest occurrence: task 1296, 2026-05-07 10:03-10:06.)

Root cause hypothesis

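The assertion at sweeper_test.go:274 expects the error counter delta to be 0, but it was 1. The log line just before the failure, "pendinguploads sweeper: Sweep failed: transient db error", suggests that one Sweep call hit a transient db error and that this failure was counted.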

Two possibilities:

  1. Real bug: the sweeper's transient-db-error path increments the error counter when it shouldn't. Transient errors should be retried or logged as warnings, not counted toward the failure metric. Fix the production code so it does NOT increment the counter on transient-class errors.
  2. Flaky test: the test expects zero db errors during the run, but the test environment occasionally produces transient errors (e.g., a race during Postgres connection setup, or a timeout under race-detector slowdown). Fix the test to tolerate a non-zero transient error count, or mock the db layer so the transient case cannot occur.

Without reading the production code path I can't tell these apart. The orchestrator-CI ran the tests with -race, as the step name "Run tests with race detection and coverage" indicates. According to the Go documentation, race detection typically increases execution time by 2-20×. That slowdown can expose real timing-dependent bugs, or make tests with hardcoded timeouts fail spuriously.

Affected surface

  • internal/pendinguploads/sweeper.go (or wherever Sweep is implemented)
  • internal/pendinguploads/sweeper_test.go:274 (the assertion site)

Suggested investigation steps

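  1. Read the test around sweeper_test.go:274. How is it set up? Does it stub the db?
  2. If the test hits a real Postgres instance (e.g., via testcontainers), does the sweeper have a transient-error path that should be retried?
  3. Run the test locally many times (go test -count=N), with and without -race. If it passes reliably without -race but fails with it, the race detector's slowdown is what triggers it (which can still point to a real timing bug); if it also fails without -race, the failure is independent of the race detector and more likely a real bug.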
  4. If real bug: fix the sweeper to distinguish retryable-transient from genuine errors.
  5. If flaky: make the error-counter assertion deterministic (e.g., wait and re-poll, or use a deterministic db mock).

Routing

Platform team / whoever owns internal/pendinguploads/. Likely platform-engineer or a Go-focused dev.

Filed by security-auditor as part of internal#46 Phase 3 finishing actions.

Reference: molecule-ai/molecule-core#22