forked from molecule-ai/molecule-core
Workspaces stuck in status='provisioning' previously surfaced in three
bad ways:
1. **Details tab crashed** with `Cannot read properties of undefined
(reading 'toLocaleString')`. `BudgetSection` + `WorkspaceUsage`
assumed full response shapes but a provisioning-stuck workspace
returns partial `{}`. Guard each deep field with `?? 0` and cover
the partial-response case with regression tests.
2. **Missing required env vars failed silently** 15+ minutes later as
a cosmetic "Provisioning Timeout" banner. The in-container preflight
catches them but by then the container has already crashed without
calling /registry/register, so the workspace sat in 'provisioning'
forever. Mirror the preflight server-side: parse config.yaml's
`runtime_config.required_env` before launch, fail fast with a
WORKSPACE_PROVISION_FAILED event naming the missing vars.
3. **No backend timeout** ever flipped a stuck workspace to 'failed'.
Add a registry sweeper (10m default, env-overridable) that detects
workspaces stuck past the window, flips them to 'failed', and emits
WORKSPACE_PROVISION_TIMEOUT. Race-safe: the UPDATE re-checks the
status + age predicate so a concurrent register/restart wins.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| artifacts | ||
| bundle | ||
| channels | ||
| crypto | ||
| db | ||
| envx | ||
| events | ||
| handlers | ||
| metrics | ||
| middleware | ||
| models | ||
| orgtoken | ||
| plugins | ||
| provisioner | ||
| registry | ||
| router | ||
| scheduler | ||
| supervised | ||
| ws | ||
| wsauth | ||