molecule-core/workspace-server/internal/registry
Hongming Wang cb12601414 fix(platform): make Provisioner.Stop return real errors so cleanup gates fire
Review caught a critical issue with 12c49183: the headline "skip
RemoveVolume when Stop fails" guarantee was dead code. `Provisioner.Stop`
unconditionally `return nil`'d after logging the underlying
ContainerRemove error, so the new `if err := h.provisioner.Stop(...);
err != nil { skip volume }` guard in workspace_crud.go AND the same
guard in the orphan sweeper could never fire. RemoveVolume always
ran, predictably failing with "volume in use" when Stop hadn't
actually killed the container — which is the exact production bug
the commit claimed to fix.

Now Stop:
  - returns nil on successful remove (no change)
  - returns nil when the container is already gone (uses the existing
    isContainerNotFound helper — that's the cleanup post-condition,
    not a failure)
  - returns the wrapped Docker error otherwise (daemon timeout, ctx
    cancellation, socket EOF — anything that means the container
    might still be alive)

Audited every Provisioner.Stop caller in the tree (team.go,
workspace_restart.go ×4, workspace.go) — all of them already
discard the return value, so the widened error surface is purely
opt-in for the new cleanup paths and breaks no existing behaviour.

Other review-driven fixes in this commit:

- workspace_crud.go: detached `broadcaster.RecordAndBroadcast` from
  the request ctx too. RecordAndBroadcast does INSERT INTO
  structure_events + Redis Publish; if the canvas hangs up, a
  request-ctx-bound INSERT can be cancelled mid-write and the
  WORKSPACE_REMOVED event never lands, leaving other WS clients
  ignorant of the cascade.

- orphan_sweeper.go: added isLikelyWorkspaceID guard before turning
  Docker container prefixes into SQL LIKE patterns. The Docker
  name filter is a SUBSTRING match (not prefix), so non-workspace
  containers like `my-ws-tool` slip through; the in-loop HasPrefix
  in provisioner trims most, but the in-sweeper alphabet check
  (hex + dashes only) is the second line of defence and also
  blocks SQL LIKE wildcards (`_`, `%`) from reaching the query.
  Two new tests pin this — TestSweepOnce_FiltersNonWorkspacePrefixes
  and TestIsLikelyWorkspaceID with 10 alphabet cases.

- provisioner.go: comment added to ListWorkspaceContainerIDPrefixes
  flagging the substring/HasPrefix relationship as load-bearing.

Verified: full Go test suite passes; all 8 sweeper tests pass
(2 new for the LIKE-pattern guard); existing dispatch / delete /
provisioner tests unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:32:48 -07:00
..
access_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
access.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
healthsweep_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
healthsweep.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
hibernation_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
hibernation.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
liveness_test.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
liveness.go chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
orphan_sweeper_test.go fix(platform): make Provisioner.Stop return real errors so cleanup gates fire 2026-04-25 23:32:48 -07:00
orphan_sweeper.go fix(platform): make Provisioner.Stop return real errors so cleanup gates fire 2026-04-25 23:32:48 -07:00
provisiontimeout_test.go fix(sweeper): emit WORKSPACE_PROVISION_FAILED so canvas updates UI 2026-04-20 20:38:41 -07:00
provisiontimeout.go fix(sweeper): emit WORKSPACE_PROVISION_FAILED so canvas updates UI 2026-04-20 20:38:41 -07:00