forked from molecule-ai/molecule-core
Closes #2865 (split-B of the #2669 root-cause stack). The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go already logs each row reset, but no aggregate metric surfaces "how often is this firing." A regression that causes high reset rates (e.g. controlplane#481's missing env vars, or future drift in the workspace runtime's task-lifecycle accounting) only surfaces when users complain. Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total, incremented from sweepPhantomBusy after each row whose active_tasks was reset. Same shape as existing molecule_websocket_connections_active. Operator-side dashboard: alert when daily phantom-busy reset count > 0.5% of active workspaces. Today's steady-state is near-zero; any increase is a regression signal. Tests: - TestTrackPhantomBusyReset_IncrementsCounter - TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200 concurrent writes; tests atomic invariant) - TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE + value lines in Prometheus text format) - TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0 contract — prevents a future refactor from accidentally dropping the metric from /metrics) Race-detector clean. Vet clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| native_scheduler_test.go | ||
| scheduler_test.go | ||
| scheduler.go | ||