The existing sweeper only reaps ws-* containers whose workspace row
has status='removed'. That misses the entire wiped-DB case: an
operator does `docker compose down -v` (kills the postgres volume),
the previous platform's ws-* containers keep running, the new
platform boots into an empty workspaces table — first pass finds
zero candidates and those containers leak forever. Symptom users
hit today: 7 ws-* containers from 11h ago, no rows in DB, no
visibility in Canvas, eating CPU + memory.
Fix shape:
1. Provisioner stamps every ws-* container + volume with
`molecule.platform.managed=true`. Without a label, the sweeper
would have to assume any unlabeled ws-* container might belong
to a sibling platform stack on a shared Docker daemon.
2. Provisioner exposes ListManagedContainerIDPrefixes — a label-filter
counterpart to the existing name-filter.
3. Sweeper splits sweepOnce into two independent passes:
- sweepRemovedRows (unchanged behavior; status='removed' only)
- sweepLabeledOrphansWithoutRows (new; labeled containers whose
workspace_id has no row in the table at all)
Each pass has its own short-circuit so an empty result or transient
error in one doesn't block the other — load-bearing because the
wiped-DB pass exists precisely for cases where the removed-row
pass finds nothing.
Safe under multi-platform-on-shared-daemon: only containers carrying
our label get reaped, sibling stacks' containers are invisible to this
pass. (For now the label is a constant string; a future per-instance
UUID layer can refine "ours" further if a real shared-daemon scenario
emerges.)
Migration: existing platforms running pre-PR builds have UNLABELED
ws-* containers. After this lands they continue to NOT be reaped by
the new path (no label = invisible). They'll only be cleaned via
manual intervention or once the operator recreates them — same as
today. No regression.
Tests cover all five branches of the new pass: happy-path reap,
no-reap when row exists, mixed reap-some-keep-some, Docker error
short-circuits cleanly, non-UUID prefixes get filtered before the
SQL query.
Pairs with PR #2122 (script-level fix). Together they close the
orphan-leak path for both `bash scripts/nuke-and-rebuild.sh` users
(handled by the script) AND `docker compose down -v` users (handled
by the runtime).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>