molecule-core/scripts/README.md
Hongming Wang ef95628202 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:20:00 -07:00

48 lines
2.8 KiB
Markdown

# scripts/
Operational and one-off scripts for molecule-core. Most are
self-documenting — see the header comments in each file.
## RFC #2251 coordinator task-bound harnesses
There are three related scripts; pick the right one:
| Script | Purpose | Targets |
|---|---|---|
| `measure-coordinator-task-bounds.sh` | **Canonical** v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + heartbeat trace. | OSS-shape platform — localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract but with **arbitrary template + secret + model combinations** (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via `MODE=saas`). |
| `measure-coordinator-task-bounds.sh` (in [molecule-controlplane](https://github.com/Molecule-AI/molecule-controlplane)) | **Production-shape** variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only — refuses to run against production. |
See `reference_harness_pair_pattern` (auto-memory) for when to use which
and the cross-repo design rationale.
### Common safety pattern across all three
- **Cleanup trap** on EXIT/INT/TERM auto-deletes provisioned resources.
- **`DRY_RUN=1`** prints plan + auth fingerprint, exits before any
state mutation. Run this before pointing at staging or any shared
infrastructure.
- **Non-target guard** refuses arbitrary endpoints (the controlplane
variant is locked to `staging-api.moleculesai.app`; the OSS variant
requires explicit auth + tenant scoping for non-localhost PLATFORM).
- **Cleanup failures emit `cleanup_*_failed` events** with remediation
hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a
structured event rather than a silent leak.
### Heartbeat trace caveat
If `heartbeat_trace.raw == "<endpoint_unavailable>"`, the per-workspace
`/heartbeat-history` endpoint isn't wired on the target build — the
bound measurement is INCONCLUSIVE on the platform-ceiling question.
Either wire the endpoint or replace with the equivalent Datadog query.
## Other scripts
- `cleanup-rogue-workspaces.sh` — emergency teardown for leaked
workspaces. Prompts for confirmation. Pair with the harnesses if a
cleanup trap fails (see `cleanup_*_failed` events).
- `canary-smoke.sh` — quick smoke test for canary releases.
- `dev-start.sh` — local-dev platform bring-up.
The rest are self-documenting in their header comments.