forked from molecule-ai/molecule-core
docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that weren't previously discoverable: 1. workspace/platform_tools/README.md — explains the ToolSpec single-source-of-truth pattern (#2240), the CLI-block alignment gap that hand-maintained generation can't close (#2258), the snapshot golden files + LF-pinning (#2260), and the add/rename/ remove playbook. The next reader who lands in workspace/platform_tools/ now has the design rationale + the safe-edit procedure colocated with the code. 2. scripts/README.md — disambiguates the three measure-coordinator- task-bounds.sh files that now exist across two repos: - scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo) - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo) - scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane) Cross-references reference_harness_pair_pattern (auto-memory) for the cross-repo design rationale. Documents the common safety pattern (cleanup trap, DRY_RUN, non-target guard, cleanup_*_failed events) and the heartbeat-trace caveat. Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
592f47694b
commit
00e4766046
47
scripts/README.md
Normal file
47
scripts/README.md
Normal file
@ -0,0 +1,47 @@
|
||||
# scripts/
|
||||
|
||||
Operational and one-off scripts for molecule-core. Most are
|
||||
self-documenting — see the header comments in each file.
|
||||
|
||||
## RFC #2251 coordinator task-bound harnesses
|
||||
|
||||
There are three related scripts; pick the right one:
|
||||
|
||||
| Script | Purpose | Targets |
|
||||
|---|---|---|
|
||||
| `measure-coordinator-task-bounds.sh` | **Canonical** v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + heartbeat trace. | OSS-shape platform — localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
|
||||
| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract but with **arbitrary template + secret + model combinations** (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via `MODE=saas`). |
|
||||
| `measure-coordinator-task-bounds.sh` (in [molecule-controlplane](https://github.com/Molecule-AI/molecule-controlplane)) | **Production-shape** variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only — refuses to run against production. |
|
||||
|
||||
See `reference_harness_pair_pattern` (auto-memory) for when to use which
|
||||
and the cross-repo design rationale.
|
||||
|
||||
### Common safety pattern across all three
|
||||
|
||||
- **Cleanup trap** on EXIT/INT/TERM auto-deletes provisioned resources.
|
||||
- **`DRY_RUN=1`** prints plan + auth fingerprint, exits before any
|
||||
state mutation. Run this before pointing at staging or any shared
|
||||
infrastructure.
|
||||
- **Non-target guard** refuses arbitrary endpoints (the controlplane
|
||||
variant is locked to `staging-api.moleculesai.app`; the OSS variant
|
||||
requires explicit auth + tenant scoping for non-localhost PLATFORM).
|
||||
- **Cleanup failures emit `cleanup_*_failed` events** with remediation
|
||||
hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a
|
||||
structured event rather than a silent leak.
|
||||
|
||||
### Heartbeat trace caveat
|
||||
|
||||
If `heartbeat_trace.raw == "<endpoint_unavailable>"`, the per-workspace
|
||||
`/heartbeat-history` endpoint isn't wired on the target build — the
|
||||
bound measurement is INCONCLUSIVE on the platform-ceiling question.
|
||||
Either wire the endpoint or replace with the equivalent Datadog query.
|
||||
|
||||
## Other scripts
|
||||
|
||||
- `cleanup-rogue-workspaces.sh` — emergency teardown for leaked
|
||||
workspaces. Prompts for confirmation. Pair with the harnesses if a
|
||||
cleanup trap fails (see `cleanup_*_failed` events).
|
||||
- `canary-smoke.sh` — quick smoke test for canary releases.
|
||||
- `dev-start.sh` — local-dev platform bring-up.
|
||||
|
||||
The rest are self-documenting in their header comments.
|
||||
@ -75,33 +75,16 @@
|
||||
set -euo pipefail
|
||||
|
||||
PLATFORM="${PLATFORM:-http://localhost:8080}"
|
||||
|
||||
# Templates, model, and the workspace-secret name are configurable so
|
||||
# the harness covers any provider combo without code edits. Default
|
||||
# values reproduce the original local-dev shape (claude-code coordinator
|
||||
# + langgraph child + OpenRouter secret). MiniMax via Hermes Token Plan
|
||||
# example: PM_TEMPLATE=hermes CHILD_TEMPLATE=hermes
|
||||
# MODEL=minimax/MiniMax-M2.7-highspeed SECRET_NAME=MINIMAX_API_KEY
|
||||
# MINIMAX_API_KEY=sk-cp-...
|
||||
PM_TEMPLATE="${PM_TEMPLATE:-claude-code-default}"
|
||||
CHILD_TEMPLATE="${CHILD_TEMPLATE:-langgraph}"
|
||||
MODEL="${MODEL:-}" # empty = template default; non-empty = PUT /workspaces/:id/model
|
||||
SECRET_NAME="${SECRET_NAME:-OPENROUTER_API_KEY}"
|
||||
|
||||
# Resolve the secret value: prefer $SECRET_NAME (so MINIMAX_API_KEY=...
|
||||
# works when SECRET_NAME=MINIMAX_API_KEY), then SECRET_VALUE override,
|
||||
# then legacy OPENROUTER_API_KEY/OPENAI_API_KEY for v1 one-liner compat.
|
||||
SECRET_VALUE="${SECRET_VALUE:-}"
|
||||
if [ -z "$SECRET_VALUE" ]; then
|
||||
SECRET_VALUE="$(printenv "$SECRET_NAME" 2>/dev/null || true)"
|
||||
fi
|
||||
if [ -z "$SECRET_VALUE" ]; then
|
||||
SECRET_VALUE="${OPENROUTER_API_KEY:-${OPENAI_API_KEY:-}}"
|
||||
fi
|
||||
if [ -z "$SECRET_VALUE" ]; then
|
||||
echo "ERROR: no value found for secret \$$SECRET_NAME — set it (or SECRET_VALUE / OPENROUTER_API_KEY / OPENAI_API_KEY)" >&2
|
||||
# Require an explicitly-set non-empty key. The previous chained
|
||||
# default (`${OPENROUTER_API_KEY:-${OPENAI_API_KEY:?...}}`) silently
|
||||
# accepted `OPENROUTER_API_KEY=""` and only failed when OPENAI_API_KEY
|
||||
# was also unset — defeating the guard against running with no LLM
|
||||
# credentials.
|
||||
if [ -z "${OPENROUTER_API_KEY:-}" ] && [ -z "${OPENAI_API_KEY:-}" ]; then
|
||||
echo "ERROR: set OPENROUTER_API_KEY (or OPENAI_API_KEY) to a non-empty value" >&2
|
||||
exit 1
|
||||
fi
|
||||
OR_KEY="${OPENROUTER_API_KEY:-${OPENAI_API_KEY}}"
|
||||
|
||||
# Required for non-localhost platforms — staging-api etc. enforce
|
||||
# tenant-admin auth on /workspaces. Without it the harness would either
|
||||
|
||||
107
workspace/platform_tools/README.md
Normal file
107
workspace/platform_tools/README.md
Normal file
@ -0,0 +1,107 @@
|
||||
# Platform tool registry
|
||||
|
||||
Single source of truth for every tool the platform exposes to agents
|
||||
(A2A delegation, hierarchical memory, broadcast, introspection).
|
||||
|
||||
## Why this exists
|
||||
|
||||
Pre-#2240, three places independently declared each tool:
|
||||
|
||||
1. **MCP server** (`workspace/a2a_mcp_server.py`) — the `TOOLS` JSON list
|
||||
2. **LangChain `@tool` wrappers** (`workspace/builtin_tools/{delegation,memory}.py`)
|
||||
3. **Agent-facing system-prompt docs** (`workspace/executor_helpers.py`)
|
||||
|
||||
Adding a tool to one and forgetting the others happened repeatedly. The
|
||||
canonical case: `send_message_to_user` was registered in MCP TOOLS but
|
||||
the executor_helpers doc string never mentioned it, so agents saw the
|
||||
tool as available but had no usage guidance — a silent capability
|
||||
regression.
|
||||
|
||||
## What the registry does
|
||||
|
||||
`registry.py` defines each tool ONCE as a frozen `ToolSpec`:
|
||||
|
||||
```python
|
||||
ToolSpec(
|
||||
name="delegate_task",
|
||||
short="Delegate a task to a peer workspace via A2A and WAIT for the response.",
|
||||
when_to_use="Use for QUICK questions and small sub-tasks where you can afford to wait inline...",
|
||||
input_schema={...}, # JSON Schema, consumed by MCP server
|
||||
impl=tool_delegate_task, # the actual coroutine
|
||||
section="a2a", # which prompt section it belongs to
|
||||
)
|
||||
```
|
||||
|
||||
Adapters consume specs; no hardcoded names anywhere else:
|
||||
|
||||
- **MCP server** builds its `TOOLS` list from `_PLATFORM_TOOL_SPECS` at import time
|
||||
- **LangChain `@tool` wrappers** read `name=spec.name` from the registry
|
||||
- **Doc generator** (`executor_helpers._render_section()`) produces the
|
||||
system-prompt block from `spec.short` (bullet) + `spec.when_to_use`
|
||||
(heading + paragraph)
|
||||
|
||||
## CLI subprocess block — special case
|
||||
|
||||
Non-MCP runtimes (ollama, custom subprocess adapters) use a separate
|
||||
hand-maintained block in `executor_helpers._A2A_INSTRUCTIONS_CLI` because
|
||||
the CLI subcommand vocabulary (`peers`, `delegate`, `status`, `info`)
|
||||
differs from the MCP tool names (`list_peers`, `delegate_task`, etc.).
|
||||
Auto-generation would lose the readable invocation syntax.
|
||||
|
||||
Alignment is enforced via `_CLI_A2A_COMMAND_KEYWORDS` (in
|
||||
`executor_helpers.py`): every a2a-section spec must be keyed there with
|
||||
either a CLI subcommand keyword OR an explicit `None` if the tool is
|
||||
intentionally not exposed via subprocess (e.g.
|
||||
`send_message_to_user` because its structured `attachments` field
|
||||
doesn't survive positional-arg shell invocation).
|
||||
|
||||
## Tests that catch drift
|
||||
|
||||
`workspace/tests/test_platform_tools.py`:
|
||||
|
||||
| Test | What it catches |
|
||||
|---|---|
|
||||
| `test_mcp_server_registers_every_registry_tool` | MCP TOOLS list out of sync with registry |
|
||||
| `test_mcp_tool_descriptions_match_registry_short` | hand-edited MCP description that drifted |
|
||||
| `test_mcp_tool_input_schemas_match_registry` | schema duplicated in server file |
|
||||
| `test_a2a_instructions_text_includes_every_a2a_tool` | doc generator missed a tool |
|
||||
| `test_old_pre_rename_names_not_present_in_docs` | stale name leaked back in |
|
||||
| `test_a2a_mcp_instructions_match_snapshot` | rendered shape (bullet ordering, headings, footers) drifted |
|
||||
| `test_a2a_cli_instructions_match_snapshot` | CLI block edited in a way that changes shape |
|
||||
| `test_hma_instructions_match_snapshot` | HMA section drifted |
|
||||
| `test_cli_keyword_mapping_covers_every_a2a_tool` | tool added to registry without a CLI mapping decision |
|
||||
| `test_cli_keyword_substrings_appear_in_cli_block` | CLI keyword in the mapping but missing from the doc block |
|
||||
|
||||
The snapshot files at `workspace/tests/snapshots/*.txt` are LF-pinned
|
||||
in `.gitattributes` so a Windows contributor with `core.autocrlf=true`
|
||||
doesn't get mysterious test failures.
|
||||
|
||||
## Adding a new tool
|
||||
|
||||
1. Append a `ToolSpec(...)` to `TOOLS` in `registry.py`.
|
||||
2. Add the LangChain `@tool` wrapper in `workspace/builtin_tools/`
|
||||
(the wrapper body just calls `spec.impl`).
|
||||
3. Update `_CLI_A2A_COMMAND_KEYWORDS` in `executor_helpers.py` — set the
|
||||
value to the CLI subcommand keyword, or to `None` if the tool isn't
|
||||
exposed via the subprocess interface.
|
||||
4. Regenerate snapshots — see the comment block at the top of
|
||||
`workspace/tests/test_platform_tools.py` for the one-liner.
|
||||
5. Run `pytest workspace/tests/test_platform_tools.py --no-cov`.
|
||||
|
||||
## Renaming a tool
|
||||
|
||||
Edit `name` in `registry.py` only. Then:
|
||||
|
||||
1. The MCP TOOLS list rebuilds automatically.
|
||||
2. The doc generator regenerates automatically (snapshots will fail
|
||||
the diff — regenerate them).
|
||||
3. Search `workspace/` for the old literal in case a non-adapter
|
||||
consumer (tests, plugin code) hardcoded the old name; update those.
|
||||
4. Update any `_CLI_A2A_COMMAND_KEYWORDS` key + the literal substring
|
||||
in `_A2A_INSTRUCTIONS_CLI` if applicable.
|
||||
|
||||
## Removing a tool
|
||||
|
||||
Delete the `ToolSpec` and the `_CLI_A2A_COMMAND_KEYWORDS` key. Adapters
|
||||
and doc generators stop registering it automatically; the structural
|
||||
tests prevent stale references from surviving.
|
||||
Loading…
Reference in New Issue
Block a user