History

Hongming Wang 4fce32ec3c fix(e2e): teardown patience matches prod cascade duration (~30–90s)		2026-04-28 11:13:56 -07:00

E2E Staging SaaS has been failing on every cron + push run since 2026-04-27 with `LEAK: org … still present post-teardown (count=1)`, exit 4.

Root cause: the curl timeout on the teardown DELETE was 30s and the post-DELETE leak check was a single 10s sleep, but the DELETE handler runs the full GDPR Art. 17 cascade synchronously, including EC2 termination, which AWS reports in 30–60s. Real-world wall time on a prod-shaped run was 57s on 2026-04-27 (hongmingwang DELETE); the 30s curl timeout aborted the request mid-cascade and the 10s post-sleep check found the row still present (status not yet 'purged').

Two-part fix to match real cascade timing:

1. DELETE curl gets its own --max-time 120 (was 30) so the synchronous cascade has room to complete in-band.
2. The leak check polls up to 60s for status='purged' instead of one rigid 10s sleep. Covers two cases:
   - DELETE returns 5xx mid-cascade but the cascade finishes anyway (we still observe a clean state).
   - DELETE legitimately exceeds 120s: eventual consistency catches the eventual purge instead of false-flagging a leak.

The 5–15s estimate in `molecule-controlplane/internal/handlers/purge.go`'s comment is the API-call cost only, not the AWS-side time-to-termination it waits on. The async-purge refactor noted in that comment would let us drop these timeouts back to ~15s; file that under future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
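The polling leak check described in this commit could look roughly like the sketch below. It is not the actual E2E script: `poll_until_purged` and the status command passed to it are hypothetical names, and the only details taken from the commit message are the 60s polling budget, the 'purged' status, and exit code 4 for a leak.

```shell
#!/usr/bin/env bash
# Sketch only: function and argument names are hypothetical, not from the real script.

# poll_until_purged TIMEOUT_SECS INTERVAL_SECS STATUS_CMD [ARGS...]
# Runs STATUS_CMD repeatedly until it prints "purged" or the deadline passes.
# Returns 0 once purged, 4 (the leak exit code from the commit message) otherwise.
poll_until_purged() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  local status
  while :; do
    # A failing status command is treated as "not purged yet", not as fatal.
    status="$("$@" 2>/dev/null || true)"
    if [ "$status" = "purged" ]; then
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "LEAK: status still '${status}' after ${timeout}s" >&2
      return 4
    fi
    sleep "$interval"
  done
}

# Hypothetical usage (the URL and get_org_status are placeholders):
#   curl -sS --max-time 120 -X DELETE "$DELETE_URL" || true   # tolerate 5xx or timeout
#   poll_until_purged 60 5 get_org_status "$ORG_ID" || exit 4
```

Polling after a tolerated DELETE failure is what lets both cases in the commit pass: a 5xx mid-cascade and a DELETE that outlives its 120s budget each end up judged by the final observed state rather than by the HTTP response.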
e2e	fix(e2e): teardown patience matches prod cascade duration (~30–90s)	2026-04-28 11:13:56 -07:00
ops	ops: add Railway SHA-pin drift audit script + regression test (#2001)	2026-04-27 05:01:23 -07:00
README.md	chore: final open-source cleanup — binary, stale paths, private refs	2026-04-18 00:38:55 -07:00

README.md

Tests

This repo uses the standard monorepo testing convention: unit tests live with their package, cross-component E2E tests live here.

Where to find tests

Scope	Location
Go unit + integration (platform, CLI, handlers)	`workspace-server/**/*_test.go`; run with `cd workspace-server && go test -race ./...`
TypeScript unit (canvas components, hooks, store)	`canvas/src/**/__tests__/`; run with `cd canvas && npm test -- --run`
TypeScript unit (MCP server handlers)	`mcp-server/src/__tests__/`; run with `cd mcp-server && npx jest`
Python unit (workspace runtime, adapters)	`workspace/tests/`; run with `cd workspace && python3 -m pytest`
Python unit (SDK: plugin + remote agent)	`sdk/python/tests/`; run with `cd sdk/python && python3 -m pytest`
Cross-component E2E (spans platform + runtime + HTTP)	`tests/e2e/` ← you are here

Why split this way

Go requires co-located _test.go files to access unexported symbols.
Per-package test commands keep the inner loop fast — changing canvas doesn't re-run Go tests.
tests/e2e/ covers scenarios that no single package owns: a full workspace lifecycle, A2A across two provisioned agents, delegation chains, bundle round-trips.

Running E2E

Every E2E script here assumes the platform is running at localhost:8080 and (where noted) provisioned agents are online. See the header comment of each .sh for specifics.
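Because the scripts assume a running platform, a small pre-flight probe can fail fast with a clear message instead of a mid-run curl error. The sketch below is an illustration, not part of this repo: `wait_for_platform` is a hypothetical helper, and it assumes any HTTP response from the base URL means the platform is up (the real scripts may check a specific endpoint).

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_platform is a hypothetical helper, not an existing script.

# wait_for_platform [URL] [TRIES]
# Probes URL up to TRIES times, one second apart. Returns 0 as soon as the
# server answers with any HTTP response, 1 if it never does.
wait_for_platform() {
  local url="${1:-http://localhost:8080}" tries="${2:-10}"
  local i
  for (( i = 1; i <= tries; i++ )); do
    # -s: silent, -o /dev/null: discard body, --max-time: cap each probe at 2s.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 1
  done
  echo "platform not reachable at $url" >&2
  return 1
}

# Hypothetical usage at the top of an E2E script:
#   wait_for_platform "http://localhost:8080" 15 || exit 1
```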

Cleaning up rogue test workspaces

If an E2E run aborts before its teardown runs (Ctrl-C, crash, CI timeout), the platform can be left with workspaces whose config volume is stale or empty. Docker's unless-stopped restart policy then keeps restarting those containers in a FileNotFoundError loop. The platform's pre-flight check (#17) marks such workspaces failed on the next restart, but a manual cleanup is useful:

bash scripts/cleanup-rogue-workspaces.sh               # deletes ws with id/name starting aaaaaaaa-, bbbbbbbb-, cccccccc-, test-ws-
MOLECULE_URL=http://host:8080 bash scripts/cleanup-rogue-workspaces.sh

The script DELETEs each matching workspace via the API and force-removes the ws-<id[:12]> container as a belt-and-suspenders fallback.
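The matching and naming rules described here can be sketched as two small helpers. This is an illustration of the logic, not the contents of `scripts/cleanup-rogue-workspaces.sh`: the function names are hypothetical, and the API path in the usage comment is an assumption. Only the four prefixes and the `ws-<id[:12]>` container naming come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; prefixes and container naming
# follow the README text.

ROGUE_PREFIXES=("aaaaaaaa-" "bbbbbbbb-" "cccccccc-" "test-ws-")

# is_rogue VALUE: succeeds if VALUE (a workspace id or name) starts with
# one of the known test prefixes.
is_rogue() {
  local value="$1" prefix
  for prefix in "${ROGUE_PREFIXES[@]}"; do
    case "$value" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# container_name ID: prints ws-<first 12 characters of ID>.
container_name() {
  printf 'ws-%s\n' "${1:0:12}"
}

# Hypothetical usage for one workspace (the /workspaces/<id> path is assumed):
#   if is_rogue "$id" || is_rogue "$name"; then
#     curl -sS -X DELETE "${MOLECULE_URL:-http://localhost:8080}/workspaces/$id" || true
#     docker rm -f "$(container_name "$id")" 2>/dev/null || true   # fallback
#   fi
```

Running the container removal even when the API DELETE succeeds keeps the fallback unconditional, which is the point of a belt-and-suspenders step: it should not depend on the first mechanism reporting its own failure.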