molecule-core/workspace-server/internal
Hongming Wang 5167e482d0 fix(cp-provisioner): surface CP non-2xx on Stop to plug EC2 leak
http.Client.Do only errors on transport failure — a CP 5xx (AWS
hiccup, missing IAM, transient outage) was silently treated as
success. Workspace row then flipped to status='removed' and the EC2
stayed alive forever with no DB pointer (the "orphan EC2 on a
0-customer account" scenario flagged in workspace_crud.go #1843).
Found while triaging 13 zombie workspace EC2s on demo-prep staging.

Adds a status-code check that returns an error tagged with the
workspace ID + status + bounded body excerpt, so the existing
loud-fail path in workspace_crud.go's Delete handler can populate
stop_failures and surface a 500. Body read is io.LimitReader-capped
at 512 bytes to keep error logs sane during a CP outage.

Tests: 4 new (5xx surfaces, 4xx surfaces, 2xx variants 200/202/204
all succeed, long body is truncated). Test-first verified — the
first three fail on the buggy code and all four pass on the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:59:01 -07:00
..
artifacts
buildinfo feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy 2026-04-30 10:55:08 -07:00
bundle refactor(workspace-status): typed constants + AST-based drift gate 2026-04-30 10:41:41 -07:00
channels feat(channels): first-class Lark/Feishu support via schema-driven config 2026-04-24 11:51:15 -07:00
crypto
db refactor(workspace-status): catch missed literal in workspace_bootstrap.go + add literal-drift gate 2026-04-30 10:51:01 -07:00
envx
events test(handlers): introduce events.EventEmitter interface (#1814 partial) 2026-04-26 09:05:52 -07:00
handlers test(sweeper): integration coverage for manifest-override + accessor consolidation 2026-05-01 22:00:36 -07:00
imagewatch feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114) 2026-04-26 13:36:26 -07:00
metrics
middleware fix(tenant-guard): allowlist /buildinfo so redeploy verifier can reach it 2026-04-30 12:54:51 -07:00
models refactor(workspace-status): typed constants + AST-based drift gate 2026-04-30 10:41:41 -07:00
orgtoken
plugins
provisioner fix(cp-provisioner): surface CP non-2xx on Stop to plug EC2 leak 2026-05-01 22:59:01 -07:00
registry test(sweeper): integration coverage for manifest-override + accessor consolidation 2026-05-01 22:00:36 -07:00
router feat(workspace-server): PUT /provider endpoint for explicit LLM provider (#196) 2026-04-30 22:25:48 -07:00
scheduler feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
supervised
ws
wsauth refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers 2026-04-30 03:11:38 -07:00