molecule-core

Hongming Wang f088090b27 fix(restart): coalesce concurrent restart requests via pending flag

The naive mutex-with-TryLock pattern in RestartByID was silently dropping the second of two close-together restart requests. SetSecret and SetModel both fire `go restartFunc(...)` from their HTTP handlers, and both DB writes commit before either restart goroutine reaches loadWorkspaceSecrets. If the second goroutine arrives while the first holds the per-workspace mutex, TryLock returns false and the second is logged-and-dropped:

    Auto-restart: skipping <id> — restart already in progress

The first goroutine's loadWorkspaceSecrets ran before the second write committed, so the new container boots without that env var. Surfaced during the RFC #2251 V1.0 measurement as hermes returning "No LLM provider configured" when MODEL_PROVIDER landed after the API-key write and lost its restart to the mutex (HERMES_DEFAULT_MODEL absent → start.sh fell back to nousresearch/hermes-4-70b → derived provider=openrouter → no OPENROUTER_API_KEY → request-time error). The same race hits any back-to-back secret/model save flow, including the canvas's "set MiniMax key + pick model" UX.

Fix: pending-flag / coalescing pattern. Any restart request that arrives while one is in flight sets `pending=true` and returns. The in-flight runner, on completion, checks the flag and runs another cycle. This collapses N concurrent requests into at most 2 sequential cycles (the current one + one more that picks up everyone who arrived during it), while guaranteeing the final container always sees the latest secrets.

Concrete contract:
- 1 request, no concurrency: 1 cycle
- N concurrent requests during 1 in-flight cycle: 2 cycles total
- N sequential requests (no overlap): N cycles
- Per-workspace state: different workspaces never serialize

Coalescing is extracted into `coalesceRestart(workspaceID, cycle func())` so the gate logic is testable without the full WorkspaceHandler / DB / provisioner stack. RestartByID now wraps that with the production cycle function.

runRestartCycle calls provisionWorkspace SYNCHRONOUSLY (drops the historical `go`) so the loop's pending-flag check happens AFTER the new container is up. Without that, the next cycle's Stop call would race the previous cycle's still-spawning provision goroutine. sendRestartContext stays async; it's a one-way notification.

Tests in workspace_restart_coalesce_test.go cover all five contract points + race-detector clean over 10 iterations:
- Single call → 1 cycle
- 5 concurrent during in-flight → exactly 2 cycles total
- 3 sequential → 3 cycles
- Pending-during-cycle picked up (targeted bug repro)
- State cleared after drain (running flag reset)
- Per-workspace isolation (no cross-workspace serialization)

Refs: molecule-core#2256 (V1.0 gate measurement); root cause for the "No LLM provider configured" symptom seen during hermes/MiniMax repro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 23:31:56 -07:00
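The pending-flag gate described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code in `internal`: only the name and argument shape of `coalesceRestart(workspaceID, cycle func())` come from the commit message, while the `coalescer` and `restartState` types, the `newCoalescer` helper, and the choice to hang the function off a struct are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// restartState is hypothetical per-workspace bookkeeping (not from the repo).
type restartState struct {
	running bool // a cycle is currently in flight
	pending bool // at least one request arrived during that cycle
}

// coalescer is a hypothetical holder for the per-workspace states.
type coalescer struct {
	mu     sync.Mutex
	states map[string]*restartState
}

func newCoalescer() *coalescer {
	return &coalescer{states: make(map[string]*restartState)}
}

// coalesceRestart runs cycle unless a cycle for workspaceID is already in
// flight; in that case it only marks the workspace as pending and returns.
// The in-flight caller re-runs its own cycle once per drained pending flag.
func (c *coalescer) coalesceRestart(workspaceID string, cycle func()) {
	c.mu.Lock()
	st, ok := c.states[workspaceID]
	if !ok {
		st = &restartState{}
		c.states[workspaceID] = st
	}
	if st.running {
		st.pending = true // the runner will pick this up after its cycle
		c.mu.Unlock()
		return
	}
	st.running = true
	c.mu.Unlock()

	for {
		cycle() // must be synchronous, so the check below sees a finished cycle

		c.mu.Lock()
		if !st.pending {
			// Nothing arrived during the cycle: clear state and stop.
			st.running = false
			delete(c.states, workspaceID)
			c.mu.Unlock()
			return
		}
		st.pending = false // consume the flag and run one more cycle
		c.mu.Unlock()
	}
}

func main() {
	c := newCoalescer()
	cycles := 0
	cycle := func() {
		cycles++
		if cycles > 1 {
			return
		}
		// Simulate five requests arriving while the first cycle is in flight.
		var wg sync.WaitGroup
		for i := 0; i < 5; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				c.coalesceRestart("ws-1", func() {})
			}()
		}
		wg.Wait()
	}
	c.coalesceRestart("ws-1", cycle)
	fmt.Println("cycles:", cycles) // prints "cycles: 2"
}
```

The mutex is held only while reading or flipping the two flags, never across `cycle()`, so late requests return immediately instead of blocking. The sketch does not handle a panicking `cycle`, which would leave `running` set; a production version would presumably need a deferred reset.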
cmd/server	feat(runtime): native_scheduler skip — primitive #3 of 6	2026-04-26 22:47:00 -07:00
internal	fix(restart): coalesce concurrent restart requests via pending flag	2026-04-28 23:31:56 -07:00
migrations	chore: second-pass review polish — symmetry + clearer test fixtures	2026-04-25 08:48:30 -07:00
pkg/provisionhook	feat(#1957): wire gh-identity plugin into workspace-server	2026-04-24 15:01:41 +00:00
.ci-force	chore: force Platform(Go) CI run on main — validate go vet clean	2026-04-21 15:43:19 +00:00
.gitignore	feat(ws-server): pull env from CP on startup	2026-04-19 02:41:15 -07:00
.golangci.yaml	chore(workspace-server): add golangci.yaml disabling errcheck	2026-04-24 07:16:54 +00:00
Dockerfile	chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint	2026-04-22 20:00:16 -07:00
Dockerfile.tenant	feat(terminal): remote path via aws ec2-instance-connect + pty	2026-04-21 18:13:29 -07:00
entrypoint-tenant.sh	fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155)	2026-04-20 23:51:33 +00:00
go.mod	chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave	2026-04-28 16:25:46 -07:00
go.sum	chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave	2026-04-28 16:25:46 -07:00