molecule-core/workspace-server/internal
Hongming Wang c23ff848aa
fix(cp-provisioner): look up real EC2 instance_id for Stop + IsRunning (#1738)
Resolves a "Save & Restart cascade" failure on SaaS tenants. Observed
2026-04-22 on hongmingwang workspace a8af9d79 after a Config-tab save:

  03:13:20 workspace deprovision: TerminateInstances
           InvalidInstanceID.Malformed: a8af9d79-... is malformed
  03:13:21 workspace provision: CreateSecurityGroup
           InvalidGroup.Duplicate: workspace-a8af9d79-394 already
           exists for VPC vpc-09f85513b85d7acee

Root cause: CPProvisioner.Stop and IsRunning passed the workspace UUID
as the `instance_id` query param to CP. CP forwarded it to EC2
TerminateInstances, which rejected it (EC2 ids are i-…, not UUIDs).
The failed terminate left the workspace's SG attached → the immediate
re-provision hit InvalidGroup.Duplicate → user saw `provisioning
failed`.

Fix: both methods now call a new `resolveInstanceID` that reads
`workspaces.instance_id` from the tenant DB and passes the real EC2
id downstream. When no row / no instance_id exists, Stop is a no-op
and IsRunning returns (false, nil) so restart cascades can freshly
re-provision.

resolveInstanceID is exposed as a `var` package-level func so tests
can swap it for a pairs-map stub without standing up sqlmock — the
per-table DB scaffolding was a heavier price than the surface
warranted given these tests are about the CP HTTP flow downstream
of the lookup, not the lookup SQL itself.

Adds regression tests:
  - TestStop_EmptyInstanceIDIsNoop: no DB row → no CP call
  - TestIsRunning_UsesDBInstanceID: DB id round-trips to CP
  - TestIsRunning_EmptyInstanceIDReturnsFalse: no instance → false/nil
Updates existing tests to assert the resolved instance_id (i-abc123
variants) instead of the previous buggy workspaceID.

After this lands, user's existing workspaces with stale instance_id
bindings still need a manual cleanup of the orphaned EC2 + SG (done
for a8af9d79 today). Future restarts use the correct id.

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:25:29 +00:00
..
artifacts test(handlers): add CWE-22 regression suite + KI-005 terminal access fix + tests (#1574) 2026-04-22 15:30:11 +00:00
bundle fix(platform): unblock SaaS workspace registration end-to-end 2026-04-21 03:06:46 -07:00
channels test(handlers): add CWE-22 regression suite + KI-005 terminal access fix + tests (#1574) 2026-04-22 15:30:11 +00:00
crypto chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
db test: schema_migrations tracking — 4 cases (first boot, re-boot, mixed, down.sql filter) 2026-04-18 11:52:27 -07:00
envx chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
events chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
handlers fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal semantics, session cookie auth (#1744) 2026-04-23 17:39:38 +00:00
metrics chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
middleware fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal semantics, session cookie auth (#1744) 2026-04-23 17:39:38 +00:00
models fix: CWE-78 rm scope, go vet failures, delegation idempotency 2026-04-21 18:22:30 +00:00
orgtoken fix(platform-go-ci): align test mocks with schema drift + org_id context contract (#1755) 2026-04-23 07:14:33 +00:00
plugins chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
provisioner fix(cp-provisioner): look up real EC2 instance_id for Stop + IsRunning (#1738) 2026-04-23 18:25:29 +00:00
registry fix(sweeper): emit WORKSPACE_PROVISION_FAILED so canvas updates UI 2026-04-20 20:38:41 -07:00
router fix(review): address code review blockers on tool-trace + instructions 2026-04-22 16:18:06 -07:00
scheduler feat(scheduler): sweepPhantomBusy — clear stuck active_tasks from crashed runs (extracted from #1664) 2026-04-22 19:57:49 -07:00
supervised chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
ws chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
wsauth chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00