Add workspace-lifecycle real-infra staginge2e (core#2332 P1.10) #2338
What
Close the workspace-lifecycle coverage gap (core#2332 P1.10): soft-restart / pause / resume / hibernate were only unit-tested (httptest in workspace-server/internal/handlers/*_test.go) and never proven against a real container.
New Go suite workspace-server/internal/staginge2e (//go:build staging_e2e), mirroring the cp internal/staginge2e harness idioms (cp#386): STAGING_E2E=1 gate, CP_ADMIN_API_TOKEN admin surface, provision → wait-online → assert, t.Cleanup teardown. molecule-core has no CP client packages, so the harness is HTTP-only and self-contained (no go.moleculesai.app/controlplane/... imports).
What each transition asserts (observable state, not just HTTP 200)
TestWorkspaceLifecycle_Staging provisions a real throwaway staging tenant + workspace, then:
- POST /restart: body {"status":"provisioning"}; then live GET /workspaces/:id → online + routable url; post-restart A2A serve probe returns 2xx (container actually came back, not just a row flip)
- POST /pause: → paused; url cleared; A2A no longer serves, which is the genuinely-stopped signal (a flag-only handler would still be reachable)
- POST /resume: paused → provisioning → online + routable; serveable again
- POST /hibernate?force=true: → hibernated (settled, not stuck mid-hibernating); url cleared; unserveable
- wake via the next A2A message: hibernated → online (Resume only handles paused); serveable again
Status is read from the live DB-backed GET /workspaces/:id: the lifecycle POST body could lie; the GET proves the row. The restart provisioning window is observed non-fatally (a fast box can race back to online before the first poll); the load-bearing assertions are eventual online+routable + a successful serve probe.
Honest limit (precise TODO left in code)
The strongest "container stopped" signal is the EC2/Docker power-state, which is only observable CP-side (AWS/SSM) and not reachable from the core ws-server module without importing the CP client surface. assertNotServing asserts the strongest signal available here (url cleared + immediate non-serve) with a TODO(core#2332) to tighten to instance power-state if a CP admin endpoint ever surfaces it to the tenant API. Nothing was weakened to make this pass.
CI
New workflow e2e-workspace-lifecycle.yml:
- PR path: go vet -tags staging_e2e + assert the suite SKIPs LOUD without creds. NOT a fake-green mask (a broken test file fails at PR time). Non-required.
- workflow_dispatch / schedule (daily 08:00 UTC, offset from e2e-staging-saas 07:00 + peer-visibility 07:30) with CP_BASE_URL + CP_STAGING_ADMIN_API_TOKEN.
- Teardown via test t.Cleanup; the age-guarded sweep-stale-e2e-orgs (30-min floor, e2e- prefix) is the final net.
Advisory-by-infra: this suite needs a live staging tenant, so it is intentionally NOT a merge-blocking required check. Promote-to-required is a separate CTO decision (mirrors cp#386 and the peer-visibility flip-to-required pattern, molecule-core#1296).
Validation
No self-merge — routed to agent-reviewer-cr2 + agent-researcher.
🤖 Generated with Claude Code
Close the workspace-lifecycle coverage gap: soft-restart / pause / resume / hibernate were only unit-tested (httptest in workspace-server/internal/handlers/*_test.go) and never proven against a real container.

New Go suite workspace-server/internal/staginge2e (build tag //go:build staging_e2e), mirroring the cp internal/staginge2e idioms (cp#386): STAGING_E2E=1 gate, CP_ADMIN_API_TOKEN admin surface, provision -> wait-online -> assert, t.Cleanup teardown. Core has no CP client packages, so the harness is HTTP-only and self-contained.

TestWorkspaceLifecycle_Staging provisions a real throwaway staging tenant + workspace, then drives each lifecycle endpoint and asserts OBSERVABLE state (not just HTTP 200):
- restart -> body provisioning, then GET status -> online+routable, and a post-restart A2A serve probe succeeds (container actually back).
- pause -> status paused + url cleared + workspace no longer serves A2A (the genuinely-stopped signal: a flag-only handler would still serve). resume -> online + serveable again.
- hibernate -> status hibernated + url cleared + unserveable; wake via the next A2A message -> online + serveable (auto-wake-on-message; Resume only handles paused).

Status is read from the live DB-backed GET /workspaces/:id (the lifecycle POST body could lie; the GET proves the row). The restart provisioning window is observed non-fatally (a fast box can race back to online before the first poll); the load-bearing assertions are eventual online+routable and a successful serve probe.

The strongest "container stopped" signal is EC2/Docker power-state, only observable CP-side (AWS/SSM) and not reachable from the core ws-server module; assertNotServing asserts the strongest signal available here (url cleared + immediate non-serve) with a precise TODO(core#2332).

Advisory-by-infra: the real run needs a live staging tenant, so the new workflow e2e-workspace-lifecycle.yml runs it on workflow_dispatch / schedule only (daily 08:00 UTC, offset from the other staging e2es). The PR path is a cheap honest compile+skip gate (vet under the tag + assert it SKIPs LOUD without creds) and is NOT required. Promote-to-required is a separate CTO decision (mirrors cp#386 / the peer-visibility flip pattern, molecule-core#1296).

Validation: go vet -tags staging_e2e ./internal/staginge2e/... (clean); go test -tags staging_e2e ./internal/staginge2e/ -run TestWorkspaceLifecycle -count=1 compiles and SKIPs loud without creds; gofmt clean; default `go test ./...` excludes the package (tag-gated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>