forked from molecule-ai/molecule-core
Adds a reproduction harness for Issue 4 of the 2026-04-28 CP review, referenced in RFC molecule-core#2251. The RFC review (issue #2251 comment) flagged that Issue 4 was hypothesized but not reproduced before V1.0 implementation begins — this script closes that gap. What it does: - Provisions a coordinator (PM, claude-code-default) + 1 child (Researcher, langgraph) via the platform API. - Sends an A2A kickoff with a synthesis-heavy task that requires SYNTHESIS_DEPTH (default 3) sequential delegations followed by a 600-word post-delegation synthesis. - Times the coordinator's full A2A round-trip with millisecond precision and emits one JSON event per phase (machine-readable). - Pulls the coordinator's heartbeat trace post-run so the team can see whether any platform-side state transition fired during the long synthesis (the V1.0 RFC's MAX_TASK_EXECUTION_SECS would surface as such a transition; absence of one in this trace confirms the RFC's premise). Why a measurement harness, not a pass/fail test: Issue 4's claim is "absence of platform-side bound", which is hard to assert in a single CI run. Outputting structured measurement data lets the team interpret across multiple runs / staging vs prod / different SYNTHESIS_DEPTH values rather than relying on one reproduction snapshot. The script's header has the full interpretation guide: - ELAPSED < 60s → not informative (LLM was just fast) - 60–300s → within DELEGATION_TIMEOUT, ambiguous - >= 300s without trace transitions → BUG CONFIRMED - curl_failed → coordinator hung past A2A_TIMEOUT or genuinely slow (disambiguate by querying status separately) Doesn't run in CI by default — invoked manually against staging or a local platform with PLATFORM=... and OPENROUTER_API_KEY=... env vars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| ops | ||
| build_runtime_package.py | ||
| build-images.sh | ||
| bundle-compile.sh | ||
| canary-smoke.sh | ||
| cleanup-rogue-workspaces.sh | ||
| clone-manifest.sh | ||
| dev-start.sh | ||
| import-agent.sh | ||
| lockdown-tenant-sg.sh | ||
| measure-coordinator-task-bounds.sh | ||
| nuke-and-rebuild.sh | ||
| post-rebuild-setup.sh | ||
| refresh-workspace-images.sh | ||
| rollback-latest.sh | ||
| test-a2a-cross-runtime.sh | ||
| test-all-adapters.sh | ||
| test-all.sh | ||
| test-cross-agent-chat.sh | ||
| test-nuke-and-rebuild.sh | ||
| test-team-e2e.sh | ||