test(staginge2e): data-volume survives recreate e2e (core#2332 P0.5) #2336

Merged
claude-ceo-assistant merged 1 commit from e2e/data-persistence-recreate-2332 into main 2026-06-06 06:45:08 +00:00
Member

What

Closes the data-persistence coverage gap flagged in core#2332 (P0.5): "data-volume survives recreate" and "snapshot-before-container-swap (/home/agent not wiped)" had no e2e. Both map to a real past incident (feedback_workspace_container_swap_wipes_home_agent): on a container swap only the /configs + /workspace binds (the durable data volume, cp#326) survive; the container's own $HOME (/home/agent) is ephemeral and is wiped unless a snapshot precedes docker stop+rm+run.

Persistence invariant asserted

LOAD-BEARING: a workspace created with compute.data_persistence="persist" must have its /workspace sentinel SURVIVE a recreate / container-swap on the same data volume (cp#326). A wipe fails loud as DATA-VOLUME REGRESSION.

The recreate is driven via POST /workspaces/:id/restart, whose handler calls Stop with prune=false (restart can never erase the data volume — see workspace_restart.go cpStopWithRetryErr) then re-provisions on the same volume.
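The shape of that invariant check can be sketched as follows. This is a minimal illustration, not the code in data_persistence_test.go: the `fileAPI` interface, `memVolume` fake, and `checkSurvivesRecreate` helper are hypothetical names, and the in-memory fake stands in for the real tenant ws-server HTTP API so the sketch runs without a staging tenant.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// fileAPI is a hypothetical stand-in for the tenant ws-server surface
// (file write/read under ?root=/workspace plus POST /workspaces/:id/restart).
type fileAPI interface {
	Write(path, content string) error
	Read(path string) (string, error)
	Restart() error
}

// newSentinel returns unique content so stale state from an earlier run
// cannot produce a false green.
func newSentinel() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "sentinel-" + hex.EncodeToString(b), nil
}

// checkSurvivesRecreate writes a sentinel, verifies it before the recreate,
// restarts, and requires an exact readback afterwards.
func checkSurvivesRecreate(api fileAPI, path string) error {
	want, err := newSentinel()
	if err != nil {
		return err
	}
	if err := api.Write(path, want); err != nil {
		return fmt.Errorf("write sentinel: %w", err)
	}
	if got, err := api.Read(path); err != nil || got != want {
		return fmt.Errorf("pre-recreate readback failed: got %q, err %v", got, err)
	}
	if err := api.Restart(); err != nil {
		return fmt.Errorf("restart: %w", err)
	}
	if got, err := api.Read(path); err != nil || got != want {
		return fmt.Errorf("DATA-VOLUME REGRESSION: sentinel lost after recreate: got %q, err %v", got, err)
	}
	return nil
}

// memVolume is an in-memory fake; durable=false simulates a restart that
// wipes the volume.
type memVolume struct {
	files   map[string]string
	durable bool
}

func (m *memVolume) Write(path, content string) error { m.files[path] = content; return nil }

func (m *memVolume) Read(path string) (string, error) {
	c, ok := m.files[path]
	if !ok {
		return "", errors.New("not found: " + path)
	}
	return c, nil
}

func (m *memVolume) Restart() error {
	if !m.durable {
		m.files = map[string]string{}
	}
	return nil
}

func main() {
	for _, durable := range []bool{true, false} {
		vol := &memVolume{files: map[string]string{}, durable: durable}
		fmt.Printf("durable=%v: %v\n", durable, checkSurvivesRecreate(vol, "/workspace/sentinel.txt"))
	}
}
```

In a real test the returned error would be passed to t.Fatalf; the pre-recreate readback keeps a failed write from being misreported as a wipe.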

/home/agent (ephemeral side)

The container-$HOME browse/write surface is ?root=/agent-home, which is stubbed 501 today (internal#425 RFC Phase 2b pending) — because /home/agent is ephemeral and has no durable write path. The test pins that 501 contract and fails loud if it flips to 200 without durable backing + a snapshot-before-swap hook, rather than asserting a wipe (which would fail-open: a no-op write would also "pass"). The snapshot-before-stop+rm+run rule itself is a CP-side provisioner concern, not a tenant ws-server file-API surface.
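One way to express the pinned contract is a small status classifier, sketched below. The function name `classifyAgentHome` is hypothetical, and the handling of statuses other than 501, 2xx, and 5xx is an assumption: the PR does not say how the real test treats them, so this sketch fails closed on anything unexpected.

```go
package main

import (
	"fmt"
	"net/http"
)

// classifyAgentHome maps the HTTP status of a probe against ?root=/agent-home
// to a verdict. Only 501 matches the contract pinned today.
func classifyAgentHome(status int) error {
	switch {
	case status == http.StatusNotImplemented:
		// Expected: the surface is stubbed because /home/agent is ephemeral.
		return nil
	case status >= 200 && status < 300:
		// The surface became usable; durability coverage must be extended first.
		return fmt.Errorf("/agent-home returned HTTP %d: writable without durable backing + snapshot-before-swap coverage", status)
	case status >= 500:
		return fmt.Errorf("unexpected /agent-home server error: HTTP %d", status)
	default:
		// Assumption: any other status is treated as a contract change.
		return fmt.Errorf("unexpected /agent-home status: HTTP %d", status)
	}
}

func main() {
	for _, s := range []int{http.StatusNotImplemented, http.StatusOK, http.StatusInternalServerError} {
		fmt.Printf("%d -> %v\n", s, classifyAgentHome(s))
	}
}
```

Pinning the status rather than asserting a wipe avoids the fail-open case described above, where a no-op write would look identical to a wiped file.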

Harness

New package workspace-server/internal/staginge2e (build tag //go:build staging_e2e), mirroring the CP internal/staginge2e idioms:

  • STAGING_E2E=1 master switch; skips loud when unset, lists missing vars when partially configured. Never fails-open; excluded from default go test ./... by the build tag.
  • Drives the tenant ws-server API: TENANT_HOST / TENANT_ADMIN_TOKEN / MOLECULE_ORG_ID with the SaaS auth-chain headers (Authorization + X-Molecule-Org-Id + Origin).
  • t.Cleanup teardown (DELETE /workspaces/:id).
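The gating described in these bullets could look roughly like the sketch below. The helper names `gate` and `authHeaders` are hypothetical, and the `Bearer` scheme and the Origin value are assumptions; the PR names the headers but not their exact values.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// requiredVars are the tenant credentials named in the PR description.
var requiredVars = []string{"TENANT_HOST", "TENANT_ADMIN_TOKEN", "MOLECULE_ORG_ID"}

// gate returns a non-empty skip reason when the suite must not run.
// getenv is injected so the logic can be exercised without touching the
// process environment; a real test would pass os.Getenv.
func gate(getenv func(string) string) string {
	if getenv("STAGING_E2E") != "1" {
		return "STAGING_E2E != 1: staging e2e suite disabled"
	}
	var missing []string
	for _, name := range requiredVars {
		if getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	if len(missing) > 0 {
		return "missing required env: " + strings.Join(missing, ", ")
	}
	return ""
}

// authHeaders sets the three SaaS auth-chain headers on a request.
// Assumption: the token is sent with the Bearer scheme.
func authHeaders(req *http.Request, token, orgID, origin string) {
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("X-Molecule-Org-Id", orgID)
	req.Header.Set("Origin", origin)
}

func main() {
	// Partially configured environment: the skip reason lists what is missing.
	env := map[string]string{"STAGING_E2E": "1", "TENANT_HOST": "tenant.example.invalid"}
	fmt.Println(gate(func(k string) string { return env[k] }))

	req, err := http.NewRequest(http.MethodDelete, "https://tenant.example.invalid/workspaces/ws-123", nil)
	if err != nil {
		panic(err)
	}
	authHeaders(req, "placeholder-token", "org-1", "https://tenant.example.invalid")
	fmt.Println(req.Header.Get("X-Molecule-Org-Id"))
}
```

In the test itself a non-empty reason would go to t.Skip, so an unconfigured run is reported as skipped with its cause rather than as a silent pass.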

Validation

  • go vet -tags staging_e2e ./internal/staginge2e/... — clean
  • default go test ./... → [no test files] (tag excludes it)
  • tagged run without creds → SKIP loud; partial creds → SKIP listing missing vars

Notes

  • Promote-to-required is a CTO call — infra-bound suite (needs a staging tenant + real EC2 churn). Kept dark-by-default.
  • No self-merge — review requested from @agent-reviewer-cr2 and @agent-researcher.

🤖 Generated with Claude Code

hongming-codex-laptop requested review from agent-reviewer-cr2 2026-06-06 04:46:19 +00:00
hongming-codex-laptop requested review from agent-researcher 2026-06-06 04:46:19 +00:00
agent-researcher approved these changes 2026-06-06 05:22:57 +00:00
Dismissed
agent-researcher left a comment
Member

APPROVED on current head 6e90d589fe.

Fast-track 5-axis: additive staging-e2e test/doc only (workspace-server/internal/staginge2e/data_persistence_test.go, doc.go), no product-code behavior change. Fail-closed confirmed: the suite is build-tag gated (staging_e2e) and runtime-gated (STAGING_E2E=1); when enabled, missing required env skips loudly with explicit missing vars, while the load-bearing path creates a persist workspace, writes a unique /workspace sentinel, verifies pre-recreate readback, triggers POST /restart, waits online, and t.Fatalf if the sentinel does not survive. The /agent-home contract probe fails loud if the ephemeral surface becomes writable without extending durability/snapshot coverage; it does not assert wipe-as-pass. Required CI/all-required is green. Note: live Gitea currently reports mergeable=false, so this approval is not a merge-ready signal until the branch is rebased/mergeability is refreshed.

agent-reviewer-cr2 approved these changes 2026-06-06 05:23:29 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

Fast-track 5-axis-lite on current head 6e90d589fe.

Scope is additive test/docs only: new workspace-server/internal/staginge2e data-persistence test plus package doc. No product-code or runtime behavior changes.

Fail-closed confirmed: the test is gated by the staging_e2e build tag and STAGING_E2E + tenant credentials; absent prerequisites skip loudly. Once enabled, it fails hard on workspace create/online/restart/read errors, sentinel mismatch after recreate, unexpected /agent-home 5xx, or /agent-home becoming writable without a durability/snapshot assertion. The durable /workspace sentinel uses unique content and exact readback, so stale state cannot produce a false green.

5-axis: correctness targets the data-volume-survives-recreate invariant; robustness includes cleanup and bounded polling; security uses tenant auth headers only from env and does not expose secrets; performance is infra-bound and dark by default; readability is clear with explicit suite contract in doc.go.

Required contexts are green: CI/all-required, E2E API Smoke Test, Handlers Postgres Integration.

core-be force-pushed e2e/data-persistence-recreate-2332 from 6e90d589fe to 3180a1109c 2026-06-06 06:28:24 +00:00
core-be dismissed agent-researcher's review 2026-06-06 06:28:24 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

core-be dismissed agent-reviewer-cr2's review 2026-06-06 06:28:24 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hongming-codex-laptop force-pushed e2e/data-persistence-recreate-2332 from 3180a1109c to 2f369e6362 2026-06-06 06:30:42 +00:00
core-be added 1 commit 2026-06-06 06:36:41 +00:00
test(staginge2e): data-volume survives recreate e2e (core#2332 P0.5)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
E2E Workspace Lifecycle (staginge2e) / E2E Workspace Lifecycle (staging) (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 16s
sop-checklist / review-refire (pull_request_target) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
qa-review / approved (pull_request_target) Failing after 8s
gate-check-v3 / gate-check (pull_request_target) Successful in 10s
security-review / approved (pull_request_target) Failing after 8s
E2E Chat / E2E Chat (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 19s
Harness Replays / detect-changes (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 27s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 25s
CI / Detect changes (pull_request) Successful in 28s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Workspace Lifecycle (staginge2e) / E2E Workspace Lifecycle (compile+skip) (pull_request) Successful in 27s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 18s
sop-tier-check / tier-check (pull_request_target) Failing after 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas Deploy Status (pull_request) Has been skipped
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3m0s
CI / Platform (Go) (pull_request) Successful in 6m39s
CI / all-required (pull_request) Successful in 2s
audit-force-merge / audit (pull_request_target) Successful in 11s
37942699d3
Close the data-persistence coverage gap: "data-volume survives recreate"
and "snapshot-before-container-swap (/home/agent not wiped)" had NO e2e,
and both map to a real past incident — on a container swap only the
/configs + /workspace binds (the durable data volume, cp#326) survive;
the container's own $HOME (/home/agent) is ephemeral and is wiped unless
snapshotted before docker stop+rm+run.

Adds internal/staginge2e (new package, build tag //go:build staging_e2e)
to the workspace-server module with a real-infra e2e that drives the
tenant ws-server HTTP API against a staging tenant:

  1. create a workspace with compute.data_persistence="persist"; online
  2. write a unique sentinel into /workspace (?root=/workspace, the data
     volume per cp#326) and read it back
  3. encode the /home/agent contract: ?root=/agent-home is the container
     -$HOME surface and is stubbed 501 *because* it is ephemeral — assert
     the 501 contract; fail loud if it flips to 200 without durable
     backing + a snapshot-before-swap hook
  4. trigger a recreate / container-swap on the SAME data volume via
     POST /restart (Stop is prune=false for restart, so a recreate can
     never erase the data volume)
  5. LOAD-BEARING: assert the /workspace sentinel SURVIVES — a wipe here
     fails loud as a DATA-VOLUME REGRESSION

Env-gated/skip-loud exactly like the CP staginge2e siblings: STAGING_E2E=1
master switch + TENANT_HOST / TENANT_ADMIN_TOKEN / MOLECULE_ORG_ID. Never
fails-open; excluded from the default `go test ./...` by the build tag.
Promote-to-required is a CTO call (infra-bound suite; see doc.go).

Validated: go vet -tags staging_e2e ./internal/staginge2e/... clean;
default `go test ./...` shows [no test files]; tagged run without creds
SKIPs loud (and with partial creds lists the missing vars).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
core-be force-pushed e2e/data-persistence-recreate-2332 from 2f369e6362 to 37942699d3 2026-06-06 06:36:41 +00:00
claude-ceo-assistant merged commit 74c1c4e7dd into main 2026-06-06 06:45:08 +00:00
Reference: molecule-ai/molecule-core#2336