docs: registry pattern + harness scripts READMEs

Two docs covering load-bearing patterns from today's work that weren't previously discoverable: 1. workspace/platform_tools/README.md — explains the ToolSpec single-source-of-truth pattern (#2240), the CLI-block alignment gap that hand-maintained generation can't close (#2258), the snapshot golden files + LF-pinning (#2260), and the add/rename/ remove playbook. The next reader who lands in workspace/platform_tools/ now has the design rationale + the safe-edit procedure colocated with the code. 2. scripts/README.md — disambiguates the three measure-coordinator- task-bounds.sh files that now exist across two repos: - scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo) - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo) - scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane) Cross-references reference_harness_pair_pattern (auto-memory) for the cross-repo design rationale. Documents the common safety pattern (cleanup trap, DRY_RUN, non-target guard, cleanup_*_failed events) and the heartbeat-trace caveat. Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:40 -07:00 · 2026-04-28 22:19:40 -07:00 · 00e4766046
commit 00e4766046
parent 592f47694b
3 changed files with 162 additions and 25 deletions
--- a/scripts/README.md
+++ b/scripts/README.md
@ -0,0 +1,47 @@
+# scripts/
+
+Operational and one-off scripts for molecule-core. Most are
+self-documenting — see the header comments in each file.
+
+## RFC #2251 coordinator task-bound harnesses
+
+There are three related scripts; pick the right one:
+
+| Script | Purpose | Targets |
+|---|---|---|
+| `measure-coordinator-task-bounds.sh` | **Canonical** v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + heartbeat trace. | OSS-shape platform — localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
+| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract but with **arbitrary template + secret + model combinations** (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via `MODE=saas`). |
+| `measure-coordinator-task-bounds.sh` (in [molecule-controlplane](https://github.com/Molecule-AI/molecule-controlplane)) | **Production-shape** variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only — refuses to run against production. |
+
+See `reference_harness_pair_pattern` (auto-memory) for when to use which
+and the cross-repo design rationale.
+
+### Common safety pattern across all three
+
+- **Cleanup trap** on EXIT/INT/TERM auto-deletes provisioned resources.
+- **`DRY_RUN=1`** prints plan + auth fingerprint, exits before any
+  state mutation. Run this before pointing at staging or any shared
+  infrastructure.
+- **Non-target guard** refuses arbitrary endpoints (the controlplane
+  variant is locked to `staging-api.moleculesai.app`; the OSS variant
+  requires explicit auth + tenant scoping for non-localhost PLATFORM).
+- **Cleanup failures emit `cleanup_*_failed` events** with remediation
+  hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a
+  structured event rather than a silent leak.
+
+### Heartbeat trace caveat
+
+If `heartbeat_trace.raw == "<endpoint_unavailable>"`, the per-workspace
+`/heartbeat-history` endpoint isn't wired on the target build — the
+bound measurement is INCONCLUSIVE on the platform-ceiling question.
+Either wire the endpoint or replace with the equivalent Datadog query.
+
+## Other scripts
+
+- `cleanup-rogue-workspaces.sh` — emergency teardown for leaked
+  workspaces. Prompts for confirmation. Pair with the harnesses if a
+  cleanup trap fails (see `cleanup_*_failed` events).
+- `canary-smoke.sh` — quick smoke test for canary releases.
+- `dev-start.sh` — local-dev platform bring-up.
+
+The rest are self-documenting in their header comments.
--- a/scripts/measure-coordinator-task-bounds.sh
+++ b/scripts/measure-coordinator-task-bounds.sh
@ -75,33 +75,16 @@
 set -euo pipefail

 PLATFORM="${PLATFORM:-http://localhost:8080}"
-
-# Templates, model, and the workspace-secret name are configurable so
-# the harness covers any provider combo without code edits. Default
-# values reproduce the original local-dev shape (claude-code coordinator
-# + langgraph child + OpenRouter secret). MiniMax via Hermes Token Plan
-# example: PM_TEMPLATE=hermes CHILD_TEMPLATE=hermes
-# MODEL=minimax/MiniMax-M2.7-highspeed SECRET_NAME=MINIMAX_API_KEY
-# MINIMAX_API_KEY=sk-cp-...
-PM_TEMPLATE="${PM_TEMPLATE:-claude-code-default}"
-CHILD_TEMPLATE="${CHILD_TEMPLATE:-langgraph}"
-MODEL="${MODEL:-}"  # empty = template default; non-empty = PUT /workspaces/:id/model
-SECRET_NAME="${SECRET_NAME:-OPENROUTER_API_KEY}"
-
-# Resolve the secret value: prefer $SECRET_NAME (so MINIMAX_API_KEY=...
-# works when SECRET_NAME=MINIMAX_API_KEY), then SECRET_VALUE override,
-# then legacy OPENROUTER_API_KEY/OPENAI_API_KEY for v1 one-liner compat.
-SECRET_VALUE="${SECRET_VALUE:-}"
-if [ -z "$SECRET_VALUE" ]; then
-  SECRET_VALUE="$(printenv "$SECRET_NAME" 2>/dev/null || true)"
-fi
-if [ -z "$SECRET_VALUE" ]; then
-  SECRET_VALUE="${OPENROUTER_API_KEY:-${OPENAI_API_KEY:-}}"
-fi
-if [ -z "$SECRET_VALUE" ]; then
-  echo "ERROR: no value found for secret \$$SECRET_NAME — set it (or SECRET_VALUE / OPENROUTER_API_KEY / OPENAI_API_KEY)" >&2
+# Require an explicitly-set non-empty key. The previous chained
+# default (`${OPENROUTER_API_KEY:-${OPENAI_API_KEY:?...}}`) silently
+# accepted `OPENROUTER_API_KEY=""` and only failed when OPENAI_API_KEY
+# was also unset — defeating the guard against running with no LLM
+# credentials.
+if [ -z "${OPENROUTER_API_KEY:-}" ] && [ -z "${OPENAI_API_KEY:-}" ]; then
+  echo "ERROR: set OPENROUTER_API_KEY (or OPENAI_API_KEY) to a non-empty value" >&2
  exit 1
 fi
+OR_KEY="${OPENROUTER_API_KEY:-${OPENAI_API_KEY}}"

 # Required for non-localhost platforms — staging-api etc. enforce
 # tenant-admin auth on /workspaces. Without it the harness would either
--- a/workspace/platform_tools/README.md
+++ b/workspace/platform_tools/README.md
@ -0,0 +1,107 @@
+# Platform tool registry
+
+Single source of truth for every tool the platform exposes to agents
+(A2A delegation, hierarchical memory, broadcast, introspection).
+
+## Why this exists
+
+Pre-#2240, three places independently declared each tool:
+
+1. **MCP server** (`workspace/a2a_mcp_server.py`) — the `TOOLS` JSON list
+2. **LangChain `@tool` wrappers** (`workspace/builtin_tools/{delegation,memory}.py`)
+3. **Agent-facing system-prompt docs** (`workspace/executor_helpers.py`)
+
+Adding a tool to one and forgetting the others happened repeatedly. The
+canonical case: `send_message_to_user` was registered in MCP TOOLS but
+the executor_helpers doc string never mentioned it, so agents saw the
+tool as available but had no usage guidance — a silent capability
+regression.
+
+## What the registry does
+
+`registry.py` defines each tool ONCE as a frozen `ToolSpec`:
+
+```python
+ToolSpec(
+    name="delegate_task",
+    short="Delegate a task to a peer workspace via A2A and WAIT for the response.",
+    when_to_use="Use for QUICK questions and small sub-tasks where you can afford to wait inline...",
+    input_schema={...},          # JSON Schema, consumed by MCP server
+    impl=tool_delegate_task,     # the actual coroutine
+    section="a2a",               # which prompt section it belongs to
+)
+```
+
+Adapters consume specs; no hardcoded names anywhere else:
+
+- **MCP server** builds its `TOOLS` list from `_PLATFORM_TOOL_SPECS` at import time
+- **LangChain `@tool` wrappers** read `name=spec.name` from the registry
+- **Doc generator** (`executor_helpers._render_section()`) produces the
+  system-prompt block from `spec.short` (bullet) + `spec.when_to_use`
+  (heading + paragraph)
+
+## CLI subprocess block — special case
+
+Non-MCP runtimes (ollama, custom subprocess adapters) use a separate
+hand-maintained block in `executor_helpers._A2A_INSTRUCTIONS_CLI` because
+the CLI subcommand vocabulary (`peers`, `delegate`, `status`, `info`)
+differs from the MCP tool names (`list_peers`, `delegate_task`, etc.).
+Auto-generation would lose the readable invocation syntax.
+
+Alignment is enforced via `_CLI_A2A_COMMAND_KEYWORDS` (in
+`executor_helpers.py`): every a2a-section spec must be keyed there with
+either a CLI subcommand keyword OR an explicit `None` if the tool is
+intentionally not exposed via subprocess (e.g.
+`send_message_to_user` because its structured `attachments` field
+doesn't survive positional-arg shell invocation).
+
+## Tests that catch drift
+
+`workspace/tests/test_platform_tools.py`:
+
+| Test | What it catches |
+|---|---|
+| `test_mcp_server_registers_every_registry_tool` | MCP TOOLS list out of sync with registry |
+| `test_mcp_tool_descriptions_match_registry_short` | hand-edited MCP description that drifted |
+| `test_mcp_tool_input_schemas_match_registry` | schema duplicated in server file |
+| `test_a2a_instructions_text_includes_every_a2a_tool` | doc generator missed a tool |
+| `test_old_pre_rename_names_not_present_in_docs` | stale name leaked back in |
+| `test_a2a_mcp_instructions_match_snapshot` | rendered shape (bullet ordering, headings, footers) drifted |
+| `test_a2a_cli_instructions_match_snapshot` | CLI block edited in a way that changes shape |
+| `test_hma_instructions_match_snapshot` | HMA section drifted |
+| `test_cli_keyword_mapping_covers_every_a2a_tool` | tool added to registry without a CLI mapping decision |
+| `test_cli_keyword_substrings_appear_in_cli_block` | CLI keyword in the mapping but missing from the doc block |
+
+The snapshot files at `workspace/tests/snapshots/*.txt` are LF-pinned
+in `.gitattributes` so a Windows contributor with `core.autocrlf=true`
+doesn't get mysterious test failures.
+
+## Adding a new tool
+
+1. Append a `ToolSpec(...)` to `TOOLS` in `registry.py`.
+2. Add the LangChain `@tool` wrapper in `workspace/builtin_tools/`
+   (the wrapper body just calls `spec.impl`).
+3. Update `_CLI_A2A_COMMAND_KEYWORDS` in `executor_helpers.py` — set the
+   value to the CLI subcommand keyword, or to `None` if the tool isn't
+   exposed via the subprocess interface.
+4. Regenerate snapshots — see the comment block at the top of
+   `workspace/tests/test_platform_tools.py` for the one-liner.
+5. Run `pytest workspace/tests/test_platform_tools.py --no-cov`.
+
+## Renaming a tool
+
+Edit `name` in `registry.py` only. Then:
+
+1. The MCP TOOLS list rebuilds automatically.
+2. The doc generator regenerates automatically (snapshots will fail
+   the diff — regenerate them).
+3. Search `workspace/` for the old literal in case a non-adapter
+   consumer (tests, plugin code) hardcoded the old name; update those.
+4. Update any `_CLI_A2A_COMMAND_KEYWORDS` key + the literal substring
+   in `_A2A_INSTRUCTIONS_CLI` if applicable.
+
+## Removing a tool
+
+Delete the `ToolSpec` and the `_CLI_A2A_COMMAND_KEYWORDS` key. Adapters
+and doc generators stop registering it automatically; the structural
+tests prevent stale references from surviving.