molecule-core/workspace-server
Hongming Wang 92d99d96fe fix(provisioner): treat "removal already in progress" as no-op success
Cascade-deleting a 7-workspace org returned 500 with

  "workspace marked removed, but 2 stop call(s) failed — please retry:
   stop eeb99b5d-...: force-remove ws-eeb99b5d-607: Error response
   from daemon: removal of container ws-eeb99b5d-607 is already in
   progress"

even though the DB-side post-condition succeeded (removed_count=7) and
the containers WERE removed shortly after. The fanout fired Stop() on
every workspace concurrently and the orphan sweeper happened to reap
two of them at the same instant, so Docker rejected the second
ContainerRemove with "removal already in progress" — a race-condition
ack, not a real failure. Retrying just races the same in-flight
removal.

The post-condition we care about (the container WILL be gone) is
identical to a successful removal, so Stop() should treat it the
same way it already treats "No such container" — a no-op return nil
that lets the caller proceed with volume cleanup. Real daemon
failures (timeout, EOF, ctx cancel) still surface as errors.

Two pieces:

  - New isRemovalInProgress() predicate using the same string-match
    approach as isContainerNotFound (docker/docker has no typed
    errdef for this; the CLI itself relies on the message).

  - Stop() now treats the predicate as success, with a log line
    distinct from the not-found path so debugging can tell which
    race fired.

Both substrings ("removal of container" + "already in progress") must
match — "already in progress" alone would false-positive on unrelated
operations like image pulls. Truth table pinned in 7 new test cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:25:32 -07:00
..
cmd/server feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
internal fix(provisioner): treat "removal already in progress" as no-op success 2026-04-27 13:25:32 -07:00
migrations chore: second-pass review polish — symmetry + clearer test fixtures 2026-04-25 08:48:30 -07:00
pkg/provisionhook feat(#1957): wire gh-identity plugin into workspace-server 2026-04-24 15:01:41 +00:00
.ci-force chore: force Platform(Go) CI run on main — validate go vet clean 2026-04-21 15:43:19 +00:00
.gitignore
.golangci.yaml chore(workspace-server): add golangci.yaml disabling errcheck 2026-04-24 07:16:54 +00:00
Dockerfile chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint 2026-04-22 20:00:16 -07:00
Dockerfile.tenant feat(terminal): remote path via aws ec2-instance-connect + pty 2026-04-21 18:13:29 -07:00
entrypoint-tenant.sh fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155) 2026-04-20 23:51:33 +00:00
go.mod deps(redis): bump go-redis/v9 v9.7.0 → v9.7.3 (GHSA-92cp-5422-2mw7) 2026-04-27 06:54:13 -07:00
go.sum deps(redis): bump go-redis/v9 v9.7.0 → v9.7.3 (GHSA-92cp-5422-2mw7) 2026-04-27 06:54:13 -07:00