molecule-core/workspace-server/internal/scheduler
Hongming Wang c778b62202 feat(metrics): add molecule_phantom_busy_resets_total counter (#2865)
Closes #2865 (split-B of the #2669 root-cause stack).

The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.

Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.

Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.

Tests:
  - TestTrackPhantomBusyReset_IncrementsCounter
  - TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
    concurrent writes; tests atomic invariant)
  - TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
    + value lines in Prometheus text format)
  - TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
    contract — prevents a future refactor from accidentally dropping
    the metric from /metrics)

Race-detector clean. Vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:45:24 -07:00
..
native_scheduler_test.go feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
scheduler_test.go fix(scheduler): prevent wedge on invalid UTF-8 + unbounded DB ops (#2026) 2026-04-24 11:00:47 -07:00
scheduler.go feat(metrics): add molecule_phantom_busy_resets_total counter (#2865) 2026-05-05 04:45:24 -07:00