molecule-core/workspace-server/internal
Hongming Wang bdfa45572e fix(restart): clear running flag on panic in cycle()
Self-review caught a regression I introduced in #2266: if cycle() panics
(e.g. a future provisionWorkspace nil-deref or any runtime error from
the DB / Docker / encryption stacks it touches), the loop never reaches
`state.running = false`. The flag stays true forever, the early-return
guard at the top of coalesceRestart fires for every subsequent call,
and that workspace is permanently locked out of restarts until the
platform process restarts.

The pre-fix code had similar exposure (panic killed the goroutine
before defer wsMu.Unlock() ran in some Go versions), but my pending-
flag version made it worse: the guard is sticky, not ephemeral.

Fix: defer the state-clear so it always runs on exit, including panic.
Recover (and DON'T re-raise) so the panic doesn't propagate to the
goroutine boundary and crash the whole platform process — RestartByID
is always called via `go h.RestartByID(...)` from HTTP handlers, and
an unrecovered goroutine panic in Go terminates the program. Crashing
the platform for every tenant because one workspace's cycle panicked
is the wrong availability tradeoff. The panic message + full stack
trace via runtime/debug.Stack() are still logged for debuggability.
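
A minimal sketch of the recover-and-clear shape described above, not the actual handler code. The type `restartState`, its fields, and the standalone signature of `coalesceRestart` are assumptions made for illustration. One deliberate difference from the description: here the deferred function clears state only on the panic path, while the normal path clears `running` under the same lock that checks `pending`, so a concurrent request cannot be lost between the check and the clear.

```go
package main

import (
	"log"
	"runtime/debug"
	"sync"
)

// restartState is a hypothetical stand-in for the per-workspace restart
// state; the real type is not shown in this commit message.
type restartState struct {
	mu      sync.Mutex
	running bool // a cycle is in flight
	pending bool // another restart was requested while running
}

// coalesceRestart runs cycle unless one is already in flight, in which
// case it only records that another pass is wanted and returns.
func coalesceRestart(state *restartState, cycle func()) {
	state.mu.Lock()
	if state.running {
		state.pending = true
		state.mu.Unlock()
		return
	}
	state.running = true
	state.mu.Unlock()

	defer func() {
		r := recover()
		if r == nil {
			return // normal exit: the loop below already cleared running
		}
		// Panic path: clear the flags so later calls are not locked out.
		// Dropping pending here is an assumption of this sketch.
		state.mu.Lock()
		state.running = false
		state.pending = false
		state.mu.Unlock()
		// Log with the stack and do not re-panic, so the process survives.
		log.Printf("restart cycle panicked: %v\n%s", r, debug.Stack())
	}()

	for {
		cycle()
		state.mu.Lock()
		if !state.pending {
			state.running = false
			state.mu.Unlock()
			return
		}
		state.pending = false
		state.mu.Unlock()
	}
}
```

With this shape, a call such as `go coalesceRestart(st, cycle)` cannot take the process down when `cycle` panics, and the next call for the same state starts a fresh cycle.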

Regression test in TestCoalesceRestart_PanicInCycleClearsState:

  1. First call's cycle panics. coalesceRestart's defer must swallow
     the panic — assert no panic propagates out (would crash the
     platform process from a goroutine in production).
  2. Second call must run a fresh cycle (proves running was cleared).
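
The two assertions above could be checked roughly as follows. This is a sketch against a stripped-down, hypothetical `guard` type, not the real TestCoalesceRestart_PanicInCycleClearsState; the names `guard`, `run`, `escaped`, and `checkPanicClearsState` are all invented here. In a real `_test.go` file the returned errors would be `t.Fatalf` calls.

```go
package main

import (
	"errors"
	"log"
	"sync"
)

// guard is a hypothetical, minimal stand-in for the restart state:
// a single running flag protected by a mutex.
type guard struct {
	mu      sync.Mutex
	running bool
}

// run executes cycle unless one is already in flight and reports whether
// it started one. The deferred func clears running and swallows any panic.
func (g *guard) run(cycle func()) (started bool) {
	g.mu.Lock()
	if g.running {
		g.mu.Unlock()
		return false
	}
	g.running = true
	g.mu.Unlock()
	started = true

	defer func() {
		g.mu.Lock()
		g.running = false
		g.mu.Unlock()
		if r := recover(); r != nil {
			log.Printf("cycle panicked: %v", r)
		}
	}()
	cycle()
	return
}

// escaped reports whether f let a panic propagate to its caller.
func escaped(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

// checkPanicClearsState mirrors the two steps of the regression test.
func checkPanicClearsState() error {
	g := &guard{}
	// Step 1: the first cycle panics; nothing may escape run.
	if escaped(func() { g.run(func() { panic("boom") }) }) {
		return errors.New("panic propagated out of run")
	}
	// Step 2: a second call must run a fresh cycle.
	ran := false
	g.run(func() { ran = true })
	if !ran {
		return errors.New("second call did not run: running flag stuck")
	}
	return nil
}
```

If the deferred clear were removed from `run`, step 1 would report the escaped panic; if only the `recover` were kept, step 2 would fail because `running` would stay true.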

All 7 tests pass with -race -count=10.

Surfaced via /code-review-and-quality self-review of #2266; the
re-raise-after-recover anti-pattern (originally argued as "don't
mask bugs") came up in the comprehensive review and was corrected
to log-with-stack-and-suppress for availability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:00:12 -07:00
artifacts chore: sync staging to main — 1188 commits, 5 conflicts resolved (#1743) 2026-04-23 18:30:18 +00:00
bundle fix(platform): unblock SaaS workspace registration end-to-end 2026-04-21 03:06:46 -07:00
channels feat(channels): first-class Lark/Feishu support via schema-driven config 2026-04-24 11:51:15 -07:00
crypto chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
db test: schema_migrations tracking — 4 cases (first boot, re-boot, mixed, down.sql filter) 2026-04-18 11:52:27 -07:00
envx chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
events test(handlers): introduce events.EventEmitter interface (#1814 partial) 2026-04-26 09:05:52 -07:00
handlers fix(restart): clear running flag on panic in cycle() 2026-04-29 00:00:12 -07:00
imagewatch feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114) 2026-04-26 13:36:26 -07:00
metrics chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
middleware merge: resolve staging conflicts (a2a_proxy + workspace_crud) 2026-04-26 10:43:22 -07:00
models feat(runtime): adapter-declared idle_timeout_override end-to-end 2026-04-26 22:38:01 -07:00
orgtoken fix: F1085 rm scope concat + GH#756 ValidateToken terminal guard + CI test fixes 2026-04-24 07:16:54 +00:00
plugins chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
provisioner fix(provisioner): treat "removal already in progress" as no-op success 2026-04-27 13:25:32 -07:00
registry fix(orphan-sweeper): close TOCTOU race with issueAndInjectToken on restart 2026-04-27 17:28:50 -07:00
router merge: resolve staging conflicts (a2a_proxy + workspace_crud) 2026-04-26 10:43:22 -07:00
scheduler feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
supervised chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
ws chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
wsauth chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00