forked from molecule-ai/molecule-core
Independent review of #2362 caught a Critical gap: the previous commit fixed the Stop dispatch in runRestartCycle but left the provisionWorkspace dispatch unconditionally Docker-only. So on SaaS the auto-restart cycle would Stop the EC2 successfully (good), then NPE inside provisionWorkspace's `h.provisioner.VolumeHasFile` call. coalesceRestart's recover()-without- re-raise (a deliberate platform-stability safeguard) silently swallowed the panic, leaving the workspace permanently stuck in status='provisioning' because the UPDATE on workspace_restart.go:450 had already run. Net pre-amendment effect on SaaS: dead agent → structured 503 (good) → workspace flipped to 'offline' (good) → cpProv.Stop succeeded (good) → provisionWorkspace NPE swallowed (bad) → workspace permanently 'provisioning' until manual canvas restart. The headline claim of #2362 ("SaaS auto-restart now works") was false on the path it shipped. Fix: dispatch the reprovision call the same way every other call site in the package does (workspace.go:431-433, workspace_restart.go:197+596) — branch on `h.cpProv != nil` and call provisionWorkspaceCP for SaaS, provisionWorkspace for Docker. Tests: - New TestRunRestartCycle_SaaSPath_DispatchesViaCPProv asserts cpProv.Stop is called when the SaaS path runs (would have caught the NPE if provisionWorkspace had been called instead). - fakeCPProv updated: methods record calls and return nil/empty by default rather than panicking. The previous "panic on unexpected call" pattern was unsafe — the panic fires on the async restart goroutine spawned by maybeMarkContainerDead AFTER the test assertions ran, so the test passed by accident even though the production path was broken (which is exactly how the Critical bug landed). - Existing tests still pass (full handlers + provisioner suites green). Branch-count audit refresh: runRestartCycle dispatch decisions: 1. h.provisioner != nil → provisioner.Stop + provisionWorkspace ✓ (existing tests) 2. h.cpProv != nil → cpProv.Stop + provisionWorkspaceCP ✓ (NEW test) 3. both nil → coalesceRestart never called (RestartByID gate) ✓ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| cmd/server | ||
| internal | ||
| migrations | ||
| pkg/provisionhook | ||
| .ci-force | ||
| .gitignore | ||
| .golangci.yaml | ||
| Dockerfile | ||
| Dockerfile.tenant | ||
| entrypoint-tenant.sh | ||
| go.mod | ||
| go.sum | ||