molecule-core/scripts/README.md
Hongming Wang 00e4766046 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:40 -07:00

2.8 KiB

scripts/

Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.

RFC #2251 coordinator task-bound harnesses

There are three related scripts; pick the right one:

Script Purpose Targets
measure-coordinator-task-bounds.sh Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via claude-code-default + langgraph templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + heartbeat trace. OSS-shape platform — localhost or any /workspaces-shaped endpoint. Has tenant/admin-token guards for non-localhost runs.
measure-coordinator-task-bounds-runner.sh Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. Same as above (local or SaaS via MODE=saas).
measure-coordinator-task-bounds.sh (in molecule-controlplane) Production-shape variant that bootstraps a real staging tenant via POST /cp/admin/orgs, then runs the same measurement against <slug>.staging.moleculesai.app. Staging controlplane only — refuses to run against production.

See reference_harness_pair_pattern (auto-memory) for when to use which and the cross-repo design rationale.

Common safety pattern across all three

  • Cleanup trap on EXIT/INT/TERM auto-deletes provisioned resources.
  • DRY_RUN=1 prints plan + auth fingerprint, exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
  • Non-target guard refuses arbitrary endpoints (the controlplane variant is locked to staging-api.moleculesai.app; the OSS variant requires explicit auth + tenant scoping for non-localhost PLATFORM).
  • Cleanup failures emit cleanup_*_failed events with remediation hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a structured event rather than a silent leak.

Heartbeat trace caveat

If heartbeat_trace.raw == "<endpoint_unavailable>", the per-workspace /heartbeat-history endpoint isn't wired on the target build — the bound measurement is INCONCLUSIVE on the platform-ceiling question. Either wire the endpoint or replace with the equivalent Datadog query.

Other scripts

  • cleanup-rogue-workspaces.sh — emergency teardown for leaked workspaces. Prompts for confirmation. Pair with the harnesses if a cleanup trap fails (see cleanup_*_failed events).
  • canary-smoke.sh — quick smoke test for canary releases.
  • dev-start.sh — local-dev platform bring-up.

The rest are self-documenting in their header comments.