[bug] [test-infra] Canvas vitest timeouts on CI: 3 first-tests flap on 5000ms default cold-start #96

Closed
opened 2026-05-08 01:17:13 +00:00 by claude-ceo-assistant · 0 comments

Symptom (4 PRs reproducing)

Canvas (Next.js) job fails on PR #82, #81, #54, #53 with 3 timed-out vitest tests, identical signature on each PR:

FAIL src/components/__tests__/ActivityTab.test.tsx:105
  ActivityTab — filter bar > renders all 7 filter options
  Test timed out in 5000ms.  (actual: 5230ms)

FAIL src/components/__tests__/CreateWorkspaceDialog.test.tsx:51
  CreateWorkspaceDialog > opens the dialog when New Workspace button is clicked
  Test timed out in 5000ms.  (actual: 6453ms)

FAIL src/components/tabs/__tests__/ConfigTab.provider.test.tsx:163
  ConfigTab — Provider override (Option B PR-5) > PUTs the new provider on Save
  Test timed out in 5000ms.  (actual: 5605ms)

The 4 PRs have nothing in common in canvas/ — PR #82 is Go vanity imports (canvas package.json + 1 page), #81 is workflow YAML (no canvas/), #54 is one canvas/lib helper, #53 is canvas/next.config.ts. Same 3 tests fail. Other tests in those same files PASS (ActivityTab.test.tsx has 33 OTHER passing tests; ConfigTab.provider.test.tsx has 12 passing).

Phase 1 root cause

All three failing tests are the first test in their file's first describe block. They pay the full per-file cold-start cost on the runner:

  • v8 coverage instrumentation init (vitest config has coverage.provider: 'v8', CI invokes npx vitest run --coverage)
  • JSDOM environment bootstrap (each // @vitest-environment jsdom file pays for window/document setup once)
  • Module-graph import for @/components/... and @/lib/... transitive closure (first React render in the file pulls these)
  • React tree mount + Testing Library re-render bookkeeping

Evidence — CI run 2321/jobs/2 total times reported by vitest's own footer:

Duration  199.99s
  transform 48.83s, import 197.56s, tests 350.34s, environment 589.87s

For comparison, a local run on the SAME branch with the identical --coverage invocation takes 9-15s total, all 1407 tests green. The CI runner is ~13x slower to import/transform under v8 coverage. (The per-phase figures in the footer sum across parallel worker threads, which is why they exceed the 199.99s wall-clock duration.)

The cliff-edge is real:

  • ContextMenu.keyboard.test.tsx > renders with role='menu' — 4463ms (PASS, 537ms margin)
  • ExternalConnectionSection.test.tsx whole file — 4955ms total (PASS, 45ms margin)
  • 3 failing tests — 5230 / 5605 / 6453ms (FAIL, just over the cliff)

Next CI run, any of those 4-second tests is one Docker host-CPU-load hiccup away from also flapping.
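The cliff-edge margins above, worked through (illustrative arithmetic only — the durations are the ones reported in the CI log):

```javascript
// Distance to the 5000ms cliff for each observed duration (ms).
const TIMEOUT_MS = 5000;
const observed = [
  ['ContextMenu.keyboard first test', 4463],
  ['ExternalConnectionSection whole file', 4955],
  ['ActivityTab first test', 5230],
  ['ConfigTab.provider first test', 5605],
  ['CreateWorkspaceDialog first test', 6453],
];
for (const [name, ms] of observed) {
  const margin = TIMEOUT_MS - ms;
  console.log(
    margin >= 0
      ? `${name}: passes with ${margin}ms to spare`
      : `${name}: fails, ${-margin}ms over`
  );
}
```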

Why this is environmental, not a code bug

  1. Component code is correct. ActivityTab's FILTERS const has 7 entries (All, A2A In, A2A Out, Tasks, Skill Promo, Logs, Errors) — matches the test's expected array exactly.
  2. The failing tests are mostly synchronous (it(name, () => { ... })) — there's no awaitable race in them. The full 5000ms is consumed by import/transform/JSDOM-init that has nothing to do with the test body.
  3. Local repro produces 0 failures across multiple runs.
  4. The same 3 tests fail on 4 PRs whose canvas/ deltas don't intersect.

Phase 2 design

Picked: bump testTimeout in canvas/vitest.config.ts to 30000ms when process.env.CI is truthy, keep 5000ms locally.

This is the documented vitest pattern (https://vitest.dev/config/testtimeout), and slow CI under coverage instrumentation is called out specifically in the coverage guide (https://vitest.dev/guide/coverage#profiling-test-performance). Industry guides for React 19 + Testing Library + vitest under coverage explicitly recommend 15-30s in CI (e.g. johal.in 2026 guide).
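The CI-conditional expression in miniature, under plain Node semantics (a sketch — resolveTimeout is an illustrative name, not part of the config API):

```javascript
// process.env values are strings or undefined; CI runners (GitHub and Gitea
// Actions alike) export CI=true, a non-empty string, hence truthy. An unset
// or empty CI falls through to the local default.
function resolveTimeout(env) {
  return env.CI ? 30000 : 5000;
}

console.log(resolveTimeout({}));             // local dev: CI unset → 5000
console.log(resolveTimeout({ CI: 'true' })); // CI runner → 30000
```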

Alternatives rejected

  1. Per-test it(name, opts, fn) timeout bump — would require touching all 3 test files, leaves the cliff-edge tests (4463ms ContextMenu, 4955ms ExternalConnectionSection) unprotected, and the moment another test file becomes the first-test-cold-start victim, we file another bug. Whack-a-mole.
  2. Disable --coverage on PR runs, only run on main — coverage observability was the explicit point of #1815 (commented in vitest.config.ts). Removing coverage on PR is regression of that work.
  3. Refactor first-tests to be no-ops or move heavy imports — the heavy imports are the real component-under-test. Can't move them. And first-test status is a per-file accident (any reorder breaks it).
  4. Bump only the 3 failing tests — same shortcoming as (1): a point fix that doesn't address the class.

Why root-cause-vs-symptom is unambiguous here

The SOP says prefer root-cause fix. Here the 'root cause' IS the env: vitest default of 5000ms was authored before v8-coverage-instrumented React 19 tests on Gitea Actions self-hosted Docker were a thing. The 5000ms default is the symptom of the wrong assumption that 5s is enough for a cold-start render under coverage. A CI-conditional 30000ms is fixing the wrong-assumption-default, not papering over a logic race. (Verified by line-by-line read of all 3 failing tests — no logic race exists; the tests pass synchronously once import is done.)

Phase 3 plan

  • One PR (devops-engineer persona) editing canvas/vitest.config.ts:
    • Add testTimeout: process.env.CI ? 30000 : 5000
    • Add comment explaining the cold-start + v8-coverage rationale + link to this issue
  • No production-code changes.
  • No doc changes (vitest.config.ts comment is the doc).
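Sketch of the resulting canvas/vitest.config.ts (hedged: the surrounding options — environment, coverage settings — are assumed from this issue's text, not copied from the repo):

```typescript
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    environment: 'jsdom',
    // CI cold-start (v8 coverage init + JSDOM bootstrap + module-graph
    // import) routinely pushes a file's first test past 5s; see issue #96.
    // The local default stays 5000ms so genuinely slow tests still fail fast.
    testTimeout: process.env.CI ? 30_000 : 5_000,
    coverage: {
      provider: 'v8', // coverage-on-PR is intentional (#1815)
    },
  },
})
```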

Phase 4 verify

  • Run npx vitest run --coverage locally 5x → confirm no slowdown locally (process.env.CI not set).
  • Push PR → wait for full Canvas (Next.js) job → confirm green.
  • Hostile self-review: 3 weakest spots.

Hostile self-review (3 weakest spots)

  1. 30000ms is a guess, not a measurement. Mitigation: chose 6x the 5000ms default, which leaves ~4.6x headroom over the worst observed test (6453ms). The risk: if a future component bug introduces a real waitFor-never-resolves race, CI would silently sit at 30s per test before failing. Counter: vitest still emits per-test durations in the log; a test that starts taking 25s+ will show up as a CI duration regression and a human will notice. We can dial the value down once Gitea-Actions-Docker runner perf improves.
  2. Doesn't fix the root-root-cause (Docker runner overhead). True. But the root-root-cause fix (faster runner, switch off v8-coverage-on-PR, drop JSDOM for happy-dom, etc.) is a multi-week project with tradeoffs the team hasn't agreed to. The right trade today is unblock 4 PRs from this one-class flap and file a follow-up to revisit runner perf.
  3. Doesn't catch a real test that should take 8s but races to 6s on local. Mitigation: keeping 5000ms local-default ensures local-dev stays sensitive to genuinely-slow tests. Only the CI path bumps. Genuine logic-races still fail locally.

Cross-references

  • Hongming SOP: feedback_long_term_robust_automated, feedback_fix_root_not_symptom, feedback_real_subprocess_test_for_boot_path
  • Failing run: https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/2321/jobs/2 (PR #82)
  • vitest docs: https://vitest.dev/config/testtimeout, https://vitest.dev/guide/coverage#profiling-test-performance
  • Affected PRs: #82, #81, #54, #53
Reference: molecule-ai/molecule-core#96