feat(mcp): add molecule-mcp doctor onboarding diagnostic

Closes #2934 item 6 — the deferred follow-up from Ryan's onboarding-
friction report. Quote: "this single command would have saved me
30 of the 45 minutes."

When push delivery fails or the install half-works, the operator
today has no signal — they hand-grep the Claude Code binary or
chase the `from versions: none` red herring. Doctor renders six
checks in one screen with concrete next-step suggestions:

  1. Python version    >=3.11? (wheel's pin)
  2. Wheel install     molecule-ai-workspace-runtime importable +
                        version surfaced
  3. PATH for binary   `molecule-mcp` resolves on PATH; if not,
                        prints the resolved user-site bin dir to
                        add (or recommends pipx)
  4. Env vars          PLATFORM_URL + WORKSPACE_ID + token (env or
                        *_FILE or .auth_token)
  5. Platform reach    GET ${PLATFORM_URL}/healthz returns 2xx
  6. Registry register POST /registry/register with the resolved
                        token returns 2xx — end-to-end auth check

Each line: `[OK|WARN|FAIL] <label>: <status>` plus a `next:` hint
when not OK. ANSI colors auto-disable on non-TTY / NO_COLOR.

Exit code: 0 on all-OK or only-WARN, 1 on any FAIL — scriptable
from CI install-checks.

## Files

`workspace/mcp_doctor.py`  (new) — six check functions + `run()`
                                   entry point. Uses urllib (stdlib)
                                   so doctor works even on a partial
                                   install where `requests` is missing.

`workspace/mcp_cli.py`             Subcommand dispatch:
                                     molecule-mcp doctor   → mcp_doctor.run()
                                     molecule-mcp --help   → usage banner
                                     molecule-mcp          → server (unchanged)

`workspace/tests/test_mcp_doctor.py`  (new) — 10 tests covering each
                                       check's pass/fail/skip path
                                       plus the end-to-end exit-code
                                       contract on a stripped env.

`scripts/build_runtime_package.py`    Adds `mcp_doctor` to
                                       TOP_LEVEL_MODULES so the
                                       wheel ships the new module.

## Out of scope (deferred follow-ups)
- Claude Code-specific checks (parse ~/.claude.json, verify each
  MCP entry is plugin-sourced + dev-channels flag set). That's a
  separate Claude-Code-shaped doctor; lives in the channel plugin.
- Automated remediation. Doctor is diagnostic — tells the operator
  what's wrong + how to fix it, doesn't apply changes.

## Verification
  - python -m pytest tests/test_mcp_doctor.py -v   → 10/10 PASS
  - python -m pytest tests/test_mcp_cli*.py        → 67/67 PASS
    (existing CLI suite still green; subcommand dispatch added
    before env-validation, doesn't disturb the server-boot path)
  - manual: `molecule-mcp doctor` on a stripped env renders 4 FAIL
    + 2 WARN + exit code 1, with each `next:` hint actionable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-05-05 15:44:36 -07:00
parent 1edee1131b
commit f01f374072
4 changed files with 553 additions and 0 deletions

View File

@ -80,6 +80,7 @@ TOP_LEVEL_MODULES = {
"internal_file_read", "internal_file_read",
"main", "main",
"mcp_cli", "mcp_cli",
"mcp_doctor",
"mcp_heartbeat", "mcp_heartbeat",
"mcp_inbox_pollers", "mcp_inbox_pollers",
"mcp_workspace_resolver", "mcp_workspace_resolver",

View File

@ -93,7 +93,34 @@ def main() -> None:
``{"id": ..., "token": ...}`` entries. One register + heartbeat ``{"id": ..., "token": ...}`` entries. One register + heartbeat
+ inbox poller per entry; messages from any workspace land in + inbox poller per entry; messages from any workspace land in
the same agent inbox tagged with ``arrival_workspace_id``. the same agent inbox tagged with ``arrival_workspace_id``.
Subcommand:
``molecule-mcp doctor`` runs an onboarding diagnostic against the
current shell environment + platform reachability and exits.
Closes Ryan's #2934 item 6.
""" """
# Subcommand dispatch — must come BEFORE env-var validation so
# `molecule-mcp doctor` can run on a partially-configured shell
# and tell the operator what's missing. Argv shapes:
# molecule-mcp → run server (this function's main path)
# molecule-mcp doctor → run diagnostic, exit
# molecule-mcp --help → defer to doctor for now (no other
# flags are supported yet)
if len(sys.argv) > 1:
if sys.argv[1] in ("doctor", "--doctor"):
import mcp_doctor
sys.exit(mcp_doctor.run())
if sys.argv[1] in ("--help", "-h", "help"):
print(
"molecule-mcp — Molecule AI universal MCP server\n\n"
"Usage:\n"
" molecule-mcp Run the MCP stdio server (registers + heartbeats)\n"
" molecule-mcp doctor Run onboarding diagnostic + exit\n\n"
"Required env: PLATFORM_URL, WORKSPACE_ID (or MOLECULE_WORKSPACES),\n"
" MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE)\n",
)
sys.exit(0)
if not os.environ.get("PLATFORM_URL", "").strip(): if not os.environ.get("PLATFORM_URL", "").strip():
_print_missing_env_help( _print_missing_env_help(
["PLATFORM_URL"], ["PLATFORM_URL"],

396
workspace/mcp_doctor.py Normal file
View File

@ -0,0 +1,396 @@
"""molecule-mcp doctor — diagnostic subcommand for first-run install.
Run via ``molecule-mcp doctor``. Prints a checklist of common
onboarding failure modes and concrete next-step suggestions for each
failed check.
Closes Ryan's #2934 item 6 ("Add a molecule-mcp doctor subcommand —
this single command would have saved me 30 of the 45 minutes").
Pairs with #2935 (Python>=3.11 callout, PATH guidance, TOKEN_FILE
support) those fixed the snippet, this gives the operator a way to
self-diagnose when something still goes wrong.
Six checks, in operator-encounter order:
1. Python version wheel requires >=3.11 (pip says
"no versions found" on older).
2. Wheel install molecule_runtime importable + version reported.
3. PATH for molecule-mcp pip user-site installs land at
~/Library/Python/3.X/bin which isn't on
PATH on a fresh macOS shell. Most common
"claude mcp add can't find molecule-mcp"
cause.
4. Env vars PLATFORM_URL set + reachable;
WORKSPACE_ID set; auth token resolvable
(env or *_FILE or .auth_token).
5. Platform health GET ${PLATFORM_URL}/healthz returns 2xx.
Catches DNS/firewall/wrong-scheme issues
before the operator hits the real
register call.
6. Registry register POST ${PLATFORM_URL}/registry/register
with the resolved workspace_id+token
returns 2xx. End-to-end auth verification.
Each check prints one of:
[OK] <one-line status>
[WARN] <one-line status> next: <fix suggestion>
[FAIL] <one-line status> next: <fix suggestion>
Exit 0 if all pass or only WARNs; exit 1 if any FAIL so the
subcommand is scriptable from CI / install-checks too.
Out of scope for now (deferred follow-ups):
- Claude Code-specific checks (parse ~/.claude.json, verify each
MCP entry is plugin-sourced + dev-channels flag is set). That's
a separate Claude-Code-specific doctor and lives in the
claude-code-channel plugin, not the universal-MCP doctor.
- Automated remediation (running the suggested fix). Doctor is
a diagnostic tool it tells the operator what's wrong + how
to fix it, doesn't apply changes.
"""
from __future__ import annotations
import importlib
import importlib.metadata
import os
import shutil
import sys
from typing import Optional
# urllib avoids a hard dep on `requests` for the doctor — the real
# CLI already imports requests via mcp_heartbeat, but doctor should
# keep working even on a partial install where requests is missing
# (that itself is a finding worth surfacing).
from urllib import request as urllib_request
from urllib.error import URLError
# ANSI colors are friendly on TTYs; auto-disable on pipe / NO_COLOR
# for CI logs where the escape sequences clutter the diff.
def _color(name: str) -> str:
if not sys.stdout.isatty() or os.environ.get("NO_COLOR"):
return ""
return {
"green": "\033[32m",
"yellow": "\033[33m",
"red": "\033[31m",
"dim": "\033[2m",
"reset": "\033[0m",
}.get(name, "")
def _ok(label: str, msg: str) -> None:
print(f" {_color('green')}[OK]{_color('reset')} {label}: {msg}")
def _warn(label: str, msg: str, fix: str) -> None:
print(f" {_color('yellow')}[WARN]{_color('reset')} {label}: {msg}")
print(f" {_color('dim')}next:{_color('reset')} {fix}")
def _fail(label: str, msg: str, fix: str) -> None:
print(f" {_color('red')}[FAIL]{_color('reset')} {label}: {msg}")
print(f" {_color('dim')}next:{_color('reset')} {fix}")
# Each check returns a "ok" | "warn" | "fail" verdict so the caller
# can compute an exit code without re-walking the print stream.
Verdict = str # "ok" | "warn" | "fail"
def check_python_version() -> Verdict:
label = "Python version"
major, minor = sys.version_info[:2]
if (major, minor) >= (3, 11):
_ok(label, f"Python {major}.{minor} (wheel requires >=3.11)")
return "ok"
_fail(
label,
f"Python {major}.{minor} is below the wheel's >=3.11 floor",
"upgrade Python (brew install python@3.12 / apt install python3.12) "
"or run molecule-mcp via a 3.11+ venv.",
)
return "fail"
def check_wheel_install() -> Verdict:
label = "Wheel install"
try:
version = importlib.metadata.version("molecule-ai-workspace-runtime")
except importlib.metadata.PackageNotFoundError:
_fail(
label,
"molecule-ai-workspace-runtime not found in this interpreter's site-packages",
"pip install molecule-ai-workspace-runtime "
"(or pipx install molecule-ai-workspace-runtime to get the "
"binary on PATH automatically).",
)
return "fail"
try:
importlib.import_module("molecule_runtime.mcp_cli")
except ImportError as e:
_fail(
label,
f"package found ({version}) but `molecule_runtime.mcp_cli` won't import: {e}",
"reinstall the wheel (pip install --force-reinstall "
"molecule-ai-workspace-runtime); if it still fails, file "
"a bug with the traceback.",
)
return "fail"
_ok(label, f"molecule-ai-workspace-runtime=={version}")
return "ok"
def check_path_for_binary() -> Verdict:
label = "PATH for molecule-mcp"
found = shutil.which("molecule-mcp")
if found:
_ok(label, f"resolves to {found}")
return "ok"
# Not on PATH — work out where pip put it so the suggestion is
# actionable instead of generic.
user_base = os.environ.get("PYTHONUSERBASE")
if not user_base:
try:
import site
user_base = site.getuserbase()
except Exception:
user_base = None
hint = (
f"add `{user_base}/bin` to PATH"
if user_base
else "switch to `pipx install molecule-ai-workspace-runtime` so the "
"binary lands in pipx's managed bin/ on PATH"
)
_fail(
label,
"molecule-mcp not found on PATH",
f"{hint}, or invoke via `python -m molecule_runtime.mcp_cli` directly.",
)
return "fail"
def _resolve_token_summary() -> Optional[str]:
"""Return a short non-secret description of how the token was
sourced (e.g. "env MOLECULE_WORKSPACE_TOKEN", "file .auth_token"),
or None if no token is reachable.
"""
if os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip():
return "env MOLECULE_WORKSPACE_TOKEN"
file_var = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
if file_var:
if os.path.isfile(file_var):
return f"file {file_var} (via MOLECULE_WORKSPACE_TOKEN_FILE)"
return None
# Per-runtime container path used by the in-platform path; rarely
# set on external setups but check anyway so the message is
# accurate for both shapes.
try:
import configs_dir
candidate = configs_dir.resolve() / ".auth_token"
if candidate.is_file():
return f"file {candidate}"
except Exception:
pass
return None
def check_env_vars() -> Verdict:
label = "Env vars"
missing: list[str] = []
if not os.environ.get("PLATFORM_URL", "").strip():
missing.append("PLATFORM_URL")
if not os.environ.get("WORKSPACE_ID", "").strip() and not os.environ.get(
"MOLECULE_WORKSPACES", "",
).strip():
missing.append("WORKSPACE_ID (or MOLECULE_WORKSPACES)")
token_summary = _resolve_token_summary()
if not token_summary and not os.environ.get("MOLECULE_WORKSPACES", "").strip():
# MOLECULE_WORKSPACES is a JSON-array env that bundles its
# own per-workspace tokens — if it's set we trust the
# resolver to validate.
missing.append(
"MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE, or "
"/configs/.auth_token)",
)
if missing:
_fail(
label,
f"unset: {', '.join(missing)}",
"see the canvas Connect-External-Agent modal — the snippet "
"exports all three. Use MOLECULE_WORKSPACE_TOKEN_FILE for the "
"token to keep secrets out of shell history.",
)
return "fail"
_ok(
label,
f"PLATFORM_URL + WORKSPACE_ID set; token from {token_summary or 'MOLECULE_WORKSPACES'}",
)
return "ok"
def _http_get(url: str, timeout: float = 5.0) -> tuple[Optional[int], Optional[str]]:
"""Best-effort GET that swallows transport errors and returns
(status, error_message). Status is None when the request couldn't
complete; error_message is None when the request returned 2xx.
"""
try:
# Origin header — staging tenants enforce same-origin via WAF;
# /healthz tolerates either way but matching production headers
# surfaces auth-style 401s correctly during the doctor run.
req = urllib_request.Request(
url,
headers={"Origin": os.environ.get("PLATFORM_URL", "").rstrip("/")},
)
with urllib_request.urlopen(req, timeout=timeout) as resp:
return resp.status, None
except URLError as e:
return None, str(e.reason if hasattr(e, "reason") else e)
except Exception as e:
return None, str(e)
def check_platform_health() -> Verdict:
label = "Platform reachability"
base = os.environ.get("PLATFORM_URL", "").strip().rstrip("/")
if not base:
_warn(label, "skipped (PLATFORM_URL unset — see Env vars)", "set PLATFORM_URL first")
return "warn"
if not base.startswith(("http://", "https://")):
_fail(
label,
f"PLATFORM_URL missing scheme: {base!r}",
"set PLATFORM_URL to include https:// — e.g. "
"PLATFORM_URL=https://your-tenant.staging.moleculesai.app",
)
return "fail"
if base.endswith("/"):
_warn(
label,
"PLATFORM_URL has trailing slash (will be stripped automatically)",
"remove the trailing slash to match the snippet shape",
)
status, err = _http_get(f"{base}/healthz")
if status is None:
_fail(label, f"GET {base}/healthz failed: {err}", "check DNS + firewall + scheme")
return "fail"
if not (200 <= status < 300):
_fail(label, f"GET {base}/healthz returned HTTP {status}", "verify the tenant subdomain is correct + provisioned")
return "fail"
_ok(label, f"GET {base}/healthz → {status}")
return "ok"
def check_register() -> Verdict:
"""Light end-to-end auth check via POST /registry/register.
Doesn't write a heartbeat or change platform state beyond what
a normal molecule-mcp boot would do the register endpoint is
idempotent. Skipped when env vars failed earlier so the operator
isn't shown a redundant 401.
"""
label = "Registry register"
base = os.environ.get("PLATFORM_URL", "").strip().rstrip("/")
workspace_id = os.environ.get("WORKSPACE_ID", "").strip()
token_summary = _resolve_token_summary()
if not (base and workspace_id and token_summary):
_warn(label, "skipped (Env vars must pass first)", "fix Env vars, re-run")
return "warn"
# Get the token value matching the resolver's source preference.
token = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if not token:
file_var = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
if file_var and os.path.isfile(file_var):
try:
token = open(file_var).read().strip()
except Exception as e:
_fail(label, f"could not read token file: {e}", "fix the file permissions or path")
return "fail"
if not token:
_warn(label, "skipped (no token resolvable)", "set MOLECULE_WORKSPACE_TOKEN or MOLECULE_WORKSPACE_TOKEN_FILE")
return "warn"
import json
body = json.dumps({
"id": workspace_id,
"url": "", # external URL is unknown to doctor; register accepts empty
"agent_card": {"name": "doctor-probe", "version": "0.0.0"},
}).encode()
req = urllib_request.Request(
f"{base}/registry/register",
data=body,
method="POST",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Origin": base,
},
)
try:
with urllib_request.urlopen(req, timeout=8.0) as resp:
status = resp.status
except URLError as e:
# Pull HTTP code from HTTPError; transport errors don't have one.
status = getattr(e, "code", None)
err = str(e.reason if hasattr(e, "reason") else e)
if status is None:
_fail(label, f"POST {base}/registry/register failed: {err}", "check network")
return "fail"
except Exception as e:
_fail(label, f"POST register failed: {e}", "check network")
return "fail"
if status == 401:
_fail(
label,
"401 Unauthorized — token rejected",
"tokens are shown only once at workspace-create time; "
"re-create the workspace OR rotate via canvas Tokens tab.",
)
return "fail"
if status == 404:
_fail(
label,
f"404 — workspace_id {workspace_id} not found on {base}",
"verify WORKSPACE_ID matches a real workspace + the tenant "
"subdomain in PLATFORM_URL.",
)
return "fail"
if not (200 <= status < 300):
_fail(label, f"POST register returned HTTP {status}", "see platform logs")
return "fail"
_ok(label, f"POST {base}/registry/register → {status}")
return "ok"
CHECKS = [
check_python_version,
check_wheel_install,
check_path_for_binary,
check_env_vars,
check_platform_health,
check_register,
]
def run() -> int:
"""Run all checks and return a process exit code (0 ok, 1 if any fail)."""
print("molecule-mcp doctor — onboarding diagnostic")
print()
verdicts = []
for chk in CHECKS:
try:
verdicts.append(chk())
except Exception as e:
# A buggy check shouldn't kill the rest of the doctor run.
print(f" [BUG] {chk.__name__}: unexpected {type(e).__name__}: {e}")
verdicts.append("fail")
print()
fails = sum(1 for v in verdicts if v == "fail")
warns = sum(1 for v in verdicts if v == "warn")
if fails:
print(f"{fails} check(s) failed, {warns} warning(s). Fix the FAIL items above and re-run.")
return 1
if warns:
print(f"All required checks passed; {warns} warning(s) — review the next-step hints.")
return 0
print("All checks passed.")
return 0

View File

@ -0,0 +1,129 @@
"""Tests for the molecule-mcp doctor subcommand (#2934 item 6).
Each `check_*` function is unit-tested in isolation via env
manipulation. The integration test (`test_run_no_env_returns_1`) pins
the end-to-end exit code on a stripped environment what an operator
running the command for the first time on an untouched shell sees.
"""
from __future__ import annotations
import os
import sys
from pathlib import Path
from unittest import mock
import pytest
# Workspace tests run from the workspace/ directory; mcp_doctor is
# imported with the same `import mcp_doctor` shape as the rest of
# the runtime (per pyproject's package layout).
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
import mcp_doctor # noqa: E402
def test_module_exposes_six_checks():
"""The doctor's checklist is six items today. Pin the count so
a future PR that drops a check (e.g. silently merges two) gets
flagged in review.
"""
assert len(mcp_doctor.CHECKS) == 6
def test_check_python_version_passes_on_311_plus():
"""Pin the floor at 3.11 (matches the wheel's requires_python)."""
with mock.patch.object(sys, "version_info", (3, 11, 0, "final", 0)):
assert mcp_doctor.check_python_version() == "ok"
with mock.patch.object(sys, "version_info", (3, 12, 5, "final", 0)):
assert mcp_doctor.check_python_version() == "ok"
def test_check_python_version_fails_on_310():
"""3.10 is below the wheel's >=3.11 floor — must FAIL, not WARN.
pip silently filters the wheel out on 3.10 with `from versions:
none`, which reads as "package missing" operators have spent
45min chasing that. The doctor's job is to call this out
explicitly.
"""
with mock.patch.object(sys, "version_info", (3, 10, 12, "final", 0)):
assert mcp_doctor.check_python_version() == "fail"
def test_check_env_vars_fails_when_all_unset(monkeypatch):
monkeypatch.delenv("PLATFORM_URL", raising=False)
monkeypatch.delenv("WORKSPACE_ID", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACES", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN_FILE", raising=False)
assert mcp_doctor.check_env_vars() == "fail"
def test_check_env_vars_passes_with_token_env(monkeypatch):
monkeypatch.setenv("PLATFORM_URL", "https://x.moleculesai.app")
monkeypatch.setenv("WORKSPACE_ID", "ws-test")
monkeypatch.setenv("MOLECULE_WORKSPACE_TOKEN", "tok-abc")
monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN_FILE", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACES", raising=False)
assert mcp_doctor.check_env_vars() == "ok"
def test_check_env_vars_passes_with_token_file(monkeypatch, tmp_path):
"""Ryan #2934 item 3 fix: token from a file (or keychain shim)
instead of inline env var so secrets stay out of shell history.
The doctor must accept that path equally with the inline form.
"""
token_path = tmp_path / "token"
token_path.write_text("tok-from-file")
monkeypatch.setenv("PLATFORM_URL", "https://x.moleculesai.app")
monkeypatch.setenv("WORKSPACE_ID", "ws-test")
monkeypatch.setenv("MOLECULE_WORKSPACE_TOKEN_FILE", str(token_path))
monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACES", raising=False)
assert mcp_doctor.check_env_vars() == "ok"
def test_check_platform_health_warns_when_url_unset(monkeypatch):
monkeypatch.delenv("PLATFORM_URL", raising=False)
assert mcp_doctor.check_platform_health() == "warn"
def test_check_platform_health_fails_on_missing_scheme(monkeypatch):
"""A bare hostname is the second-most-common config error after
missing-token (per the snippet's NOTE on Origin/PLATFORM_URL).
The error message must say 'missing scheme' not 'DNS error'
so the operator can diagnose without inspecting the URL string.
"""
monkeypatch.setenv("PLATFORM_URL", "x.moleculesai.app")
assert mcp_doctor.check_platform_health() == "fail"
def test_check_register_skipped_without_env(monkeypatch):
monkeypatch.delenv("PLATFORM_URL", raising=False)
monkeypatch.delenv("WORKSPACE_ID", raising=False)
monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN", raising=False)
# Skipped (warn), NOT failed — failing here would double-count
# the env-vars failure noise.
assert mcp_doctor.check_register() == "warn"
def test_run_returns_1_when_any_fail(monkeypatch, capsys):
"""End-to-end: stripped environment → at least one FAIL →
exit 1. Pin the exit-code contract so this is scriptable from
CI / install-checks too.
"""
for k in (
"PLATFORM_URL",
"WORKSPACE_ID",
"MOLECULE_WORKSPACES",
"MOLECULE_WORKSPACE_TOKEN",
"MOLECULE_WORKSPACE_TOKEN_FILE",
):
monkeypatch.delenv(k, raising=False)
code = mcp_doctor.run()
out = capsys.readouterr().out
assert code == 1
# The summary line must mention at least one failure count so
# an automated wrapper can grep for it.
assert "check(s) failed" in out
# And the human-facing label must be present so someone reading
# CI logs sees what the section is about, not a wall of [FAIL].
assert "molecule-mcp doctor" in out