The A2A e2e historically asserted only response SHAPE (test_a2a_e2e.sh
checked '"kind":"text"' only). A broken agent returns its error AS a
text part -- {"kind":"text","text":"Agent error (Exception) ..."} --
which STILL matches the shape check, so it PASSED on a fully broken
agent. That is why the 2026-05-2x drained-key / byok-misroute failures
(agents-team PM + reno marketing erroring on every LLM call) sailed
through CI. "Channel returns text shape" is not "agent completed an LLM
round-trip."
Adds, ADDITIVELY (no existing assertion weakened or removed):
- tests/e2e/lib/completion_assert.sh -- reusable gates:
* a2a_assert_real_completion: deterministic known-answer round-trip;
asserts CONTAINS the expected token AND NOT an error-as-text marker
(Agent error / Exception / error result / MISSING_BYOK_CREDENTIAL).
* provider_liveness_matrix + offered_platform_models_for_runtime:
per-offered-provider cheap (max_tokens:4) probe; the offered set is
read from the providers.yaml SSOT (runtimes.<rt>.providers[platform]
.models) -- not a hardcoded list -- so the matrix tracks the SSOT.
* assert_byok_not_platform_proxy: #1994 regression guard -- a
byok-resolving workspace must NOT resolve platform_managed (reads the
same derived resolver GET /admin/workspaces/:id/llm-billing-mode the
provision strip gate uses).
- tests/e2e/test_staging_full_saas.sh (the live-agent lane, MiniMax
primary): new stanzas 8b (PINEAPPLE known-answer, the core gate),
8c (byok-routing guard), 8d (SSOT-driven per-provider liveness matrix).
- tests/e2e/test_a2a_e2e.sh: added check_no_error_as_text on Echo + SEO
replies so the brief's literal shape-only example now FAILS on an
error-as-text payload.
- tests/e2e/test_completion_assert_unit.sh: offline fail-direction proof
(16 cases) that the negative gates are load-bearing -- error-as-text
MUST fail, platform_managed MUST trip the #1994 guard. Wired into
ci.yml "Run E2E bash unit tests (no live infra)" (required, per-PR +
main). e2e-staging-saas.yml paths filter extended to re-trigger the
live lane on lib changes.
No #1994 fix code touched -- tests/e2e + workflow wiring only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tests
This repo uses the standard monorepo testing convention: unit tests live with their package, cross-component E2E tests live here.
Where to find tests
| Scope | Location |
|---|---|
| Go unit + integration (platform, CLI, handlers) | workspace-server/**/*_test.go — run with cd workspace-server && go test -race ./... |
| TypeScript unit (canvas components, hooks, store) | canvas/src/**/__tests__/ — run with cd canvas && npm test -- --run |
| TypeScript unit (MCP server handlers) | mcp-server/src/__tests__/ — run with cd mcp-server && npx jest |
| Python unit (workspace runtime, adapters) | molecule-ai-workspace-runtime/tests/ in the standalone runtime repo |
| Python unit (SDK: plugin + remote agent) | sdk/python/tests/ — run with cd sdk/python && python3 -m pytest |
| Cross-component E2E (spans platform + runtime + HTTP) | tests/e2e/ ← you are here |
Why split this way
- Go requires co-located
_test.gofiles to access unexported symbols. - Per-package test commands keep the inner loop fast — changing canvas doesn't re-run Go tests.
tests/e2e/covers scenarios that no single package owns: a full workspace lifecycle, A2A across two provisioned agents, delegation chains, bundle round-trips.
Running E2E
Every E2E script here assumes the platform is running at localhost:8080 and (where noted) provisioned agents are online. See the header comment of each .sh for specifics.
Cleaning up rogue test workspaces
If an E2E run aborts before its teardown runs (Ctrl-C, crash, CI timeout),
the platform can be left with workspaces whose config volume is stale or
empty — Docker's unless-stopped restart policy then spins those
containers in a FileNotFoundError loop. The platform's pre-flight check
(#17) marks such workspaces failed on the next restart, but a manual
cleanup is useful:
bash scripts/cleanup-rogue-workspaces.sh # deletes ws with id/name starting aaaaaaaa-, bbbbbbbb-, cccccccc-, test-ws-
MOLECULE_URL=http://host:8080 bash scripts/cleanup-rogue-workspaces.sh
The script DELETEs each matching workspace via the API and
force-removes the ws-<id[:12]> container as a belt-and-suspenders
fallback.