molecule-core/workspace-server
Hongming Wang 11c9ed2a46 fix(provision): StopWorkspaceAuto mirror — close SaaS EC2-leak class
Closes #2813 (team-collapse) and #2814 (workspace delete).

Two leaks, one class. Both call sites had the same shape pre-fix:

  if h.provisioner != nil {
      h.provisioner.Stop(ctx, wsID)
  }

On SaaS where h.provisioner (Docker) is nil and h.cpProv is set, that
gate evaluates false and the EC2 keeps running. Workspace gets marked
removed in DB; EC2 lives on until the orphan sweeper catches it.

Same drift class as PR #2811's org-import provision bug — a Docker-
only check on what should be a both-backend operation. Confirmed in
production: PR #2811's verification step deleted a test workspace and
the EC2 stayed running until I terminated it manually.

Fix: WorkspaceHandler.StopWorkspaceAuto(ctx, wsID) — symmetric mirror
of provisionWorkspaceAuto. CP first, Docker second, no-op when neither
is wired (a workspace nobody is running can't be stopped — that's a
no-op, not a failure, distinct from provision's mark-failed contract).

Three call-site changes:
- team.go:208 (Collapse) → h.wh.StopWorkspaceAuto(ctx, childID)
- workspace_crud.go:432 (stopAndRemove) → h.StopWorkspaceAuto(...);
  RemoveVolume stays Docker-only behind an explicit gate since
  CP-managed workspaces have no host-bind volumes
- TeamHandler.provisioner field + NewTeamHandler's *Provisioner param
  removed as dead code (Stop was the only call site)

Volume cleanup separation is intentional: the abstraction is "stop
the running workload," not "tear down all state." Callers that need
volume cleanup keep their `if h.provisioner != nil { RemoveVolume }`
gate AFTER the Stop call.

Tests:
- TestStopWorkspaceAuto_RoutesToCPWhenSet — SaaS path
- TestStopWorkspaceAuto_RoutesToDockerWhenOnlyDocker — self-hosted
- TestStopWorkspaceAuto_NoBackendIsNoOp — pins the contract distinction
  from provisionWorkspaceAuto's mark-failed
- TestNoCallSiteCallsBareStop — source-level pin against
  `.provisioner.Stop(` / `.cpProv.Stop(` outside the dispatcher,
  per-backend bodies, restart helper, and the Docker-daemon-direct
  short-lived-container path. Strips Go comments before substring
  match so archaeology in code comments doesn't trip the gate.
- Verified: pin FAILS against the buggy shape (workspace_crud.go
  reversion); team.go reversion compile-fails because the field is
  gone — even stronger than the test.

Out of scope (tracked under #2799):
- workspace_restart.go's manual if-cpProv-else dispatch with retry
  semantics tuned for the restart hot path. Functionally equivalent
  + wraps cpStopWithRetry, so it's not the bug class this PR closes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 20:00:23 -07:00
..
cmd Memory v2 fixup Critical: wire plugin from main.go (was fully dormant) 2026-05-04 10:22:30 -07:00
internal fix(provision): StopWorkspaceAuto mirror — close SaaS EC2-leak class 2026-05-04 20:00:23 -07:00
migrations feat(db): add per-peer btree indexes on activity_logs for chat_history scale (#2478) 2026-05-03 11:34:35 -07:00
pkg/provisionhook
.ci-force
.gitignore
.golangci.yaml
Dockerfile feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy 2026-04-30 10:55:08 -07:00
Dockerfile.tenant feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy 2026-04-30 10:55:08 -07:00
entrypoint-tenant.sh
go.mod chore(deps)(deps): bump github.com/creack/pty in /workspace-server 2026-05-02 19:22:48 +00:00
go.sum chore(deps)(deps): bump github.com/creack/pty in /workspace-server 2026-05-02 19:22:48 +00:00