molecule-core/scripts
Hongming Wang f01f374072 feat(mcp): add molecule-mcp doctor onboarding diagnostic
Closes #2934 item 6 — the deferred follow-up from Ryan's onboarding-
friction report. Quote: "this single command would have saved me
30 of the 45 minutes."

When push delivery fails or the install half-works, the operator
today has no signal — they hand-grep the Claude Code binary or
chase the `from versions: none` red herring. Doctor renders six
checks in one screen with concrete next-step suggestions:

  1. Python version    >=3.11? (wheel's pin)
  2. Wheel install     molecule-ai-workspace-runtime importable +
                        version surfaced
  3. PATH for binary   `molecule-mcp` resolves on PATH; if not,
                        prints the resolved user-site bin dir to
                        add (or recommends pipx)
  4. Env vars          PLATFORM_URL + WORKSPACE_ID + token (env or
                        *_FILE or .auth_token)
  5. Platform reach    GET ${PLATFORM_URL}/healthz returns 2xx
  6. Registry register POST /registry/register with the resolved
                        token returns 2xx — end-to-end auth check

Each line: `[OK|WARN|FAIL] <label>: <status>` plus a `next:` hint
when not OK. ANSI colors auto-disable on non-TTY / NO_COLOR.

Exit code: 0 on all-OK or only-WARN, 1 on any FAIL — scriptable
from CI install-checks.

## Files

`workspace/mcp_doctor.py`  (new) — six check functions + `run()`
                                   entry point. Uses urllib (stdlib)
                                   so doctor works even on a partial
                                   install where `requests` is missing.

`workspace/mcp_cli.py`             Subcommand dispatch:
                                     molecule-mcp doctor   → mcp_doctor.run()
                                     molecule-mcp --help   → usage banner
                                     molecule-mcp          → server (unchanged)

`workspace/tests/test_mcp_doctor.py`  (new) — 10 tests covering each
                                       check's pass/fail/skip path
                                       plus the end-to-end exit-code
                                       contract on a stripped env.

`scripts/build_runtime_package.py`    Adds `mcp_doctor` to
                                       TOP_LEVEL_MODULES so the
                                       wheel ships the new module.

## Out of scope (deferred follow-ups)
- Claude Code-specific checks (parse ~/.claude.json, verify each
  MCP entry is plugin-sourced + dev-channels flag set). That's a
  separate Claude-Code-shaped doctor; lives in the channel plugin.
- Automated remediation. Doctor is diagnostic — tells the operator
  what's wrong + how to fix it, doesn't apply changes.

## Verification
  - python -m pytest tests/test_mcp_doctor.py -v   → 10/10 PASS
  - python -m pytest tests/test_mcp_cli*.py        → 67/67 PASS
    (existing CLI suite still green; subcommand dispatch added
    before env-validation, doesn't disturb the server-boot path)
  - manual: `molecule-mcp doctor` on a stripped env renders 4 FAIL
    + 2 WARN + exit code 1, with each `next:` hint actionable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:44:36 -07:00
..
demo-freeze-snapshots ops: demo-day freeze + rollback runbook 2026-05-01 12:04:30 -07:00
ops feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets 2026-05-03 02:38:08 -07:00
build_runtime_package.py feat(mcp): add molecule-mcp doctor onboarding diagnostic 2026-05-05 15:44:36 -07:00
build-images.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
bundle-compile.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
canary-smoke.sh feat(canary): smoke harness + GHA verification workflow (Phase 2) 2026-04-19 03:30:19 -07:00
check-cascade-list-vs-manifest.sh feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3) 2026-05-03 03:52:39 -07:00
cleanup-rogue-workspaces.sh fix(provisioner): stop rogue config-missing restart loop (#17) 2026-04-14 07:32:58 -07:00
clone-manifest.sh fix(quickstart): wire up template/plugin registry via manifest.json 2026-04-23 14:55:34 -07:00
demo-day-runbook.md ops: demo-day freeze + rollback runbook 2026-05-01 12:04:30 -07:00
demo-freeze.sh ops: demo-day freeze + rollback runbook 2026-05-01 12:04:30 -07:00
demo-thaw.sh ops: demo-day freeze + rollback runbook 2026-05-01 12:04:30 -07:00
dev-start.sh fix(dev-start): detect missing Go and fall back to docker-compose platform 2026-04-29 20:04:37 -07:00
import-agent.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
lockdown-tenant-sg.sh feat(security): Phase 35.1 — SG lockdown script for tenant EC2 instances 2026-04-18 12:01:41 -07:00
measure-coordinator-task-bounds-runner.sh fix(harness-runner): switch from non-existent /heartbeat-history to /activity 2026-04-28 23:12:51 -07:00
measure-coordinator-task-bounds.sh docs: registry pattern + harness scripts READMEs 2026-04-28 22:19:40 -07:00
nuke-and-rebuild.sh fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test 2026-04-26 14:37:04 -07:00
post-rebuild-setup.sh security: remove hardcoded API keys from post-rebuild-setup.sh 2026-04-20 13:02:52 -07:00
README.md docs(scripts): rename /heartbeat-history → /activity in README 2026-04-29 02:23:00 -07:00
refresh-workspace-images.sh feat(platform/admin): /admin/workspace-images/refresh + Docker SDK + GHCR auth 2026-04-26 10:17:21 -07:00
rollback-latest.sh fix(scripts): correct platform dir path + add ROOT isolation (shellcheck clean) 2026-04-22 15:42:24 +00:00
test_build_runtime_package.py chore: rewriter unit tests + drop misleading noqa on import inbox 2026-04-30 20:45:32 -07:00
test-a2a-cross-runtime.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
test-all-adapters.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
test-all-runtimes-a2a-e2e.sh test(e2e): wire SaaS auth headers (TENANT_ADMIN_TOKEN + TENANT_ORG_ID) 2026-05-02 04:36:23 -07:00
test-all.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
test-cross-agent-chat.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
test-hermes-plugin-e2e.sh test(e2e): unified A2A round-trip parity harness across all 4 runtimes 2026-05-02 04:36:23 -07:00
test-nuke-and-rebuild.sh fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test 2026-04-26 14:37:04 -07:00
test-team-e2e.sh initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
wheel_smoke.py feat(mcp): notifications/claude/channel for push-feel inbox UX 2026-04-30 20:10:01 -07:00

scripts/

Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.

RFC #2251 coordinator task-bound harnesses

There are three related scripts; pick the right one:

Script Purpose Targets
measure-coordinator-task-bounds.sh Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via claude-code-default + langgraph templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace. OSS-shape platform — localhost or any /workspaces-shaped endpoint. Has tenant/admin-token guards for non-localhost runs.
measure-coordinator-task-bounds-runner.sh Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. Same as above (local or SaaS via MODE=saas).
measure-coordinator-task-bounds.sh (in molecule-controlplane) Production-shape variant that bootstraps a real staging tenant via POST /cp/admin/orgs, then runs the same measurement against <slug>.staging.moleculesai.app. Staging controlplane only — refuses to run against production.

See reference_harness_pair_pattern (auto-memory) for when to use which and the cross-repo design rationale.

Common safety pattern across all three

  • Cleanup trap on EXIT/INT/TERM auto-deletes provisioned resources.
  • DRY_RUN=1 prints plan + auth fingerprint, exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
  • Non-target guard refuses arbitrary endpoints (the controlplane variant is locked to staging-api.moleculesai.app; the OSS variant requires explicit auth + tenant scoping for non-localhost PLATFORM).
  • Cleanup failures emit cleanup_*_failed events with remediation hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a structured event rather than a silent leak.

Activity trace caveat

If activity_trace.raw == "<endpoint_unavailable>", the per-workspace /activity endpoint isn't wired on the target build — the bound measurement is INCONCLUSIVE on the platform-ceiling question. Either wire the endpoint or replace with the equivalent Datadog query. Note that /activity accepts a since_secs query parameter; see the endpoint handler for the supported range.

Other scripts

  • cleanup-rogue-workspaces.sh — emergency teardown for leaked workspaces. Prompts for confirmation. Pair with the harnesses if a cleanup trap fails (see cleanup_*_failed events).
  • canary-smoke.sh — quick smoke test for canary releases.
  • dev-start.sh — local-dev platform bring-up.

The rest are self-documenting in their header comments.