molecule-core/tests
Hongming Wang fa9e29f2f5 fix(canary): reframe smoke prompt to give GPT-4o explicit permission to echo
Canary started flaking 2026-05-01 22:11 with model-refusal replies:
  - "I'm unable to do that."
  - "I'm unable to fulfill that request. Can I assist you with anything else?"
  - "I'm unable to reply with responses that don't allow me to fulfill tasks…"
3 fails / 10 recent runs ≈ 30% flake.

Trigger: 2026-04-30's Platform Capabilities preamble (#2332) added the
directive "Use them proactively" to the top of every system prompt.
Combined with the heavy A2A + HMA tool docs further down, the model
reads the contrived bare-echo prompt ("Reply with exactly: PONG") as
out-of-role and intermittently refuses.

Real user prompts don't hit this — only the synthetic smoke prompt does,
so the right fix is in the canary's prompt phrasing, not the platform's
system prompt (which is correctly priming agents toward tool use). New
phrasing explicitly tells the model "this is a smoke test" and "no
tools or memory are needed" so it has permission to comply.

Also updates the child workspace's CHILD_PONG prompt with the same
framing — same failure mode would have hit it once full-mode runs again.

No code change to system prompt, no test infra change. Just two prompt
strings + a load-bearing comment so future readers don't trim back to
the brittle phrasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:53:24 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| e2e | fix(canary): reframe smoke prompt to give GPT-4o explicit permission to echo | 2026-05-01 23:53:24 -07:00 |
| harness | harness(phase-2-followup): fix assert_status mislabel + honest race comment | 2026-05-01 22:00:04 -07:00 |
| ops | ops: add Railway SHA-pin drift audit script + regression test (#2001) | 2026-04-27 05:01:23 -07:00 |
| README.md | chore: final open-source cleanup — binary, stale paths, private refs | 2026-04-18 00:38:55 -07:00 |

Tests

This repo uses the standard monorepo testing convention: unit tests live with their package, cross-component E2E tests live here.

Where to find tests

| Scope | Location | Run with |
| --- | --- | --- |
| Go unit + integration (platform, CLI, handlers) | workspace-server/**/*_test.go | cd workspace-server && go test -race ./... |
| TypeScript unit (canvas components, hooks, store) | canvas/src/**/__tests__/ | cd canvas && npm test -- --run |
| TypeScript unit (MCP server handlers) | mcp-server/src/__tests__/ | cd mcp-server && npx jest |
| Python unit (workspace runtime, adapters) | workspace/tests/ | cd workspace && python3 -m pytest |
| Python unit (SDK: plugin + remote agent) | sdk/python/tests/ | cd sdk/python && python3 -m pytest |
| Cross-component E2E (spans platform + runtime + HTTP) | tests/e2e/ (you are here) | see "Running E2E" |

Why split this way

  • Go requires co-located _test.go files to access unexported symbols.
  • Per-package test commands keep the inner loop fast — changing canvas doesn't re-run Go tests.
  • tests/e2e/ covers scenarios that no single package owns: a full workspace lifecycle, A2A across two provisioned agents, delegation chains, bundle round-trips.
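
To make the fast inner loop concrete, here is a minimal sketch of a wrapper that runs exactly one unit suite. The short suite names (go, canvas, mcp, runtime, sdk) and both function names are hypothetical and do not exist in this repo; only the commands themselves come from the table above.

```shell
# Hypothetical helper: map a short suite name to its command from the table.
suite_cmd() {
  case "$1" in
    go)      echo 'cd workspace-server && go test -race ./...' ;;
    canvas)  echo 'cd canvas && npm test -- --run' ;;
    mcp)     echo 'cd mcp-server && npx jest' ;;
    runtime) echo 'cd workspace && python3 -m pytest' ;;
    sdk)     echo 'cd sdk/python && python3 -m pytest' ;;
    *)       echo "unknown suite: $1" >&2; return 1 ;;
  esac
}

# Run one suite from the repo root. The subshell keeps the cd from
# leaking into the caller's working directory.
run_suite() {
  _cmd=$(suite_cmd "$1") || return 1
  ( eval "$_cmd" )
}
```

With this in place, `run_suite canvas` would run only the canvas tests and leave the Go, MCP, and Python suites untouched.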

Running E2E

Every E2E script here assumes the platform is running at localhost:8080 and (where noted) provisioned agents are online. See the header comment of each .sh for specifics.
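
Because the scripts assume a live platform, a quick reachability check before a run can save a confusing failure later. The following is a sketch only: the function name is hypothetical, and it is an assumption that the E2E scripts honor MOLECULE_URL the way the cleanup script does. It probes the base URL rather than any particular health endpoint, since none is documented here.

```shell
# Assumption: the platform base URL, defaulting to the address the E2E scripts expect.
MOLECULE_URL="${MOLECULE_URL:-http://localhost:8080}"

# Hypothetical helper: succeeds if anything answers HTTP at the given URL.
# Without --fail, curl exits 0 on any HTTP response (even 404), so this
# only tests that a server is listening, not that it is healthy.
platform_reachable() {
  curl --silent --output /dev/null --max-time 5 "$1"
}
```

A script could then start with `platform_reachable "$MOLECULE_URL" || { echo "platform not reachable at $MOLECULE_URL" >&2; exit 1; }`.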

Cleaning up rogue test workspaces

If an E2E run aborts before its teardown runs (Ctrl-C, crash, CI timeout), the platform can be left with workspaces whose config volume is stale or empty. Docker's unless-stopped restart policy then keeps restarting those containers, and each restart fails with FileNotFoundError. The platform's pre-flight check (#17) marks such workspaces failed on the next restart, but you can also clean them up manually:

bash scripts/cleanup-rogue-workspaces.sh               # deletes ws with id/name starting aaaaaaaa-, bbbbbbbb-, cccccccc-, test-ws-
MOLECULE_URL=http://host:8080 bash scripts/cleanup-rogue-workspaces.sh

The script DELETEs each matching workspace via the API and force-removes the ws-<id[:12]> container as a belt-and-suspenders fallback.
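
The matching rule and the container naming described above can be sketched as two small functions. This illustrates the stated behavior and is not the contents of scripts/cleanup-rogue-workspaces.sh; both function names are hypothetical.

```shell
# Hypothetical helper: true if a workspace id or name starts with one of
# the test prefixes the cleanup script targets.
is_rogue() {
  case "$1" in
    aaaaaaaa-*|bbbbbbbb-*|cccccccc-*|test-ws-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: derive the container name ws-<id[:12]>.
# printf's %.12s precision truncates the id to its first 12 characters.
container_name() {
  printf 'ws-%.12s\n' "$1"
}
```

For example, `container_name aaaaaaaa-1111-2222-3333-444444444444` prints `ws-aaaaaaaa-111`, the name you would pass to `docker rm -f` when removing a leftover container by hand.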