[bug] [test-infra] Canvas vitest timeouts on CI: 3 first-tests flap on 5000ms default cold-start #96
Symptom (4 PRs reproducing)
The Canvas (Next.js) job fails on PRs #82, #81, #54, and #53 with the same 3 timed-out vitest tests and an identical signature on each PR.
The 4 PRs have nothing in common in canvas/: PR #82 is Go vanity imports (canvas package.json plus 1 page), #81 is workflow YAML (no canvas/ changes), #54 touches one canvas/lib helper, and #53 touches canvas/next.config.ts. Yet the same 3 tests fail on all of them, while other tests in those same files PASS (ActivityTab.test.tsx has 33 OTHER passing tests; ConfigTab.provider.test.tsx has 12).
Phase 1 root cause
All three failing tests are the first test in their file's first describe block. They pay the full per-file cold-start cost on the runner:
- v8 coverage instrumentation (coverage.provider: 'v8'; CI invokes npx vitest run --coverage)
- JSDOM setup (a // @vitest-environment jsdom file pays for window/document setup once)
- Importing the @/components/... and @/lib/... transitive closure (the first React render in the file pulls these in)
Evidence: CI run
2321/jobs/2 (total times as reported by vitest's own footer).
For comparison, a local run on the SAME branch with identical --coverage takes 9-15s in total, with all 1407 tests green. The CI runner is ~13x slower to import and transform under v8 coverage.
The cliff edge is real; two tests currently pass on CI with very little headroom:
- ContextMenu.keyboard.test.tsx > renders with role='menu': 4463ms (PASS, 537ms margin)
- ExternalConnectionSection.test.tsx, whole file: 4955ms total (PASS, 45ms margin)
On the next CI run, either of these 4-second tests is one Docker host CPU-load hiccup away from also flapping.
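The margins above are simply the 5000ms default minus the elapsed time. This sketch (marginMs and slackFactor are hypothetical names) reproduces them from the two timings quoted in this issue and shows the headroom the proposed 30000ms CI timeout would give. slackFactor is how many times slower a run could get before timing out. Because vitest's testTimeout applies per test, the whole-file ExternalConnectionSection figure is a conservative stand-in for any single test in that file.

```typescript
// Timings quoted in this issue for the two tests that currently pass on CI.
const cliffEdge: ReadonlyArray<{ label: string; elapsedMs: number }> = [
  { label: "ContextMenu.keyboard.test.tsx > renders with role='menu'", elapsedMs: 4463 },
  { label: 'ExternalConnectionSection.test.tsx (whole file)', elapsedMs: 4955 },
];

// Remaining headroom before the timeout fires; a negative value means a timeout.
export function marginMs(elapsedMs: number, timeoutMs: number): number {
  return timeoutMs - elapsedMs;
}

// How many times slower the same work could run before hitting the timeout.
export function slackFactor(elapsedMs: number, timeoutMs: number): number {
  return timeoutMs / elapsedMs;
}

for (const { label, elapsedMs } of cliffEdge) {
  for (const timeoutMs of [5_000, 30_000]) {
    console.log(
      `${label} @ ${timeoutMs}ms: margin ${marginMs(elapsedMs, timeoutMs)}ms, ` +
        `slack ${slackFactor(elapsedMs, timeoutMs).toFixed(2)}x`,
    );
  }
}
```

At 5000ms both tests tolerate roughly a 1.0x to 1.1x slowdown; at 30000ms they tolerate about 6x, which is the kind of buffer a ~13x CI-vs-local gap on cold start calls for.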
Why this is environmental, not a code bug
- The FILTERS const has 7 entries (All, A2A In, A2A Out, Tasks, Skill Promo, Logs, Errors), matching the test's expected array exactly.
- The test bodies are synchronous (it(name, () => { ... })), so there is no awaitable race in them. The full 5000ms is consumed by import, transform, and JSDOM init, none of which has anything to do with the test body.
Phase 2 design
Picked: bump testTimeout in canvas/vitest.config.ts to 30000ms when process.env.CI is truthy, and keep 5000ms locally.
This is the documented vitest pattern (https://vitest.dev/config/testtimeout) and is called out specifically for v8-coverage-instrumented CI runs (https://vitest.dev/guide/coverage#profiling-test-performance). Industry guides for React 19 + Testing Library + vitest under coverage explicitly recommend 15-30s in CI (e.g. the johal.in 2026 guide).
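A minimal sketch of the CI-conditional timeout, assuming it is factored into a small pure helper so it can be checked without running vitest. resolveTestTimeout is a hypothetical name, not something that exists in the repo; the commented lines show where it would plug into canvas/vitest.config.ts alongside the existing coverage settings.

```typescript
// Hypothetical helper mirroring the proposed `process.env.CI ? 30000 : 5000`.
// Accepts any env-shaped object so it can be tested without touching process.env.
export function resolveTestTimeout(env: Record<string, string | undefined>): number {
  // Env values are strings: any non-empty CI value selects the CI timeout.
  return env.CI ? 30_000 : 5_000;
}

// Wiring in canvas/vitest.config.ts (vitest is not imported in this sketch):
//   import { defineConfig } from 'vitest/config';
//   export default defineConfig({
//     test: {
//       testTimeout: resolveTestTimeout(process.env),
//       // ...existing coverage config (provider: 'v8') unchanged
//     },
//   });
```

One consequence of the "truthy" check: because environment variables are strings, CI=false or CI=0 still selects 30000ms. Only an unset or empty CI falls back to the 5000ms local default. If that distinction ever matters, the check would need to compare against an explicit value instead.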
Alternatives rejected
- Per-test timeout bump via it(name, opts, fn): would require touching all 3 test files, leaves the cliff-edge tests (4463ms ContextMenu, 4955ms ExternalConnectionSection) unprotected, and the moment another test file becomes the first-test cold-start victim, we file another bug. Whack-a-mole.
- Dropping --coverage on PR runs and running it only on main: coverage observability was the explicit point of #1815 (commented in vitest.config.ts). Removing coverage from PR runs would regress that work.
Why root-cause-vs-symptom is unambiguous here
The SOP says to prefer a root-cause fix. Here the root cause IS the environment: vitest's 5000ms default was set before v8-coverage-instrumented React 19 tests on self-hosted Gitea Actions Docker runners were a thing. The timeouts are a symptom of the assumption baked into that default, namely that 5s is enough for a cold-start render under coverage. A CI-conditional 30000ms corrects that assumption; it does not paper over a logic race. (Verified by a line-by-line read of all 3 failing tests: no logic race exists, and the tests pass synchronously once import is done.)
Phase 3 plan
- Edit canvas/vitest.config.ts (devops-engineer persona): testTimeout: process.env.CI ? 30000 : 5000
Phase 4 verify
- Run npx vitest run --coverage locally 5x → confirm no local slowdown (process.env.CI not set).
Hostile self-review (3 weakest spots)
Cross-references