forked from molecule-ai/molecule-core
d30c813ff9
13 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f01f374072 |
feat(mcp): add molecule-mcp doctor onboarding diagnostic
Closes #2934 item 6 — the deferred follow-up from Ryan's onboarding- friction report. Quote: "this single command would have saved me 30 of the 45 minutes." When push delivery fails or the install half-works, the operator today has no signal — they hand-grep the Claude Code binary or chase the `from versions: none` red herring. Doctor renders six checks in one screen with concrete next-step suggestions: 1. Python version >=3.11? (wheel's pin) 2. Wheel install molecule-ai-workspace-runtime importable + version surfaced 3. PATH for binary `molecule-mcp` resolves on PATH; if not, prints the resolved user-site bin dir to add (or recommends pipx) 4. Env vars PLATFORM_URL + WORKSPACE_ID + token (env or *_FILE or .auth_token) 5. Platform reach GET ${PLATFORM_URL}/healthz returns 2xx 6. Registry register POST /registry/register with the resolved token returns 2xx — end-to-end auth check Each line: `[OK|WARN|FAIL] <label>: <status>` plus a `next:` hint when not OK. ANSI colors auto-disable on non-TTY / NO_COLOR. Exit code: 0 on all-OK or only-WARN, 1 on any FAIL — scriptable from CI install-checks. ## Files `workspace/mcp_doctor.py` (new) — six check functions + `run()` entry point. Uses urllib (stdlib) so doctor works even on a partial install where `requests` is missing. `workspace/mcp_cli.py` Subcommand dispatch: molecule-mcp doctor → mcp_doctor.run() molecule-mcp --help → usage banner molecule-mcp → server (unchanged) `workspace/tests/test_mcp_doctor.py` (new) — 10 tests covering each check's pass/fail/skip path plus the end-to-end exit-code contract on a stripped env. `scripts/build_runtime_package.py` Adds `mcp_doctor` to TOP_LEVEL_MODULES so the wheel ships the new module. ## Out of scope (deferred follow-ups) - Claude Code-specific checks (parse ~/.claude.json, verify each MCP entry is plugin-sourced + dev-channels flag set). That's a separate Claude-Code-shaped doctor; lives in the channel plugin. - Automated remediation. Doctor is diagnostic — tells the operator what's wrong + how to fix it, doesn't apply changes. ## Verification - python -m pytest tests/test_mcp_doctor.py -v → 10/10 PASS - python -m pytest tests/test_mcp_cli*.py → 67/67 PASS (existing CLI suite still green; subcommand dispatch added before env-validation, doesn't disturb the server-boot path) - manual: `molecule-mcp doctor` on a stripped env renders 4 FAIL + 2 WARN + exit code 1, with each `next:` hint actionable Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
28ef75d25e |
refactor(workspace): split mcp_cli.py (626 LOC) into focused modules (RFC #2873 iter 3)
Splits the standalone molecule-mcp wrapper into three single-concern
modules per the OSS-shape refactor program:
* mcp_heartbeat.py — register POST + heartbeat loop + auth-failure
escalation + inbound-secret persistence
* mcp_workspace_resolver.py — single + multi-workspace env validation
+ on-disk token-file read + operator-help printer
* mcp_inbox_pollers.py — activate inbox singleton + spawn one daemon
poller per workspace
mcp_cli.py becomes a 193-LOC orchestrator: validates env, calls each
module's helpers, hands off to a2a_mcp_server.cli_main. The console-
script entry molecule-mcp = molecule_runtime.mcp_cli:main is preserved.
Back-compat aliases (mcp_cli._build_agent_card, _heartbeat_loop,
_resolve_workspaces, etc.) re-export the new modules' authoritative
functions so existing tests + wheel_smoke.py + any downstream caller
keeps working unchanged. A new test file pins each alias as the
exact same callable (drift gate via `is`).
Tests:
* 62 existing test_mcp_cli.py + test_mcp_cli_multi_workspace.py
pass against the split.
* Two heartbeat-loop persist tests + the auth-escalation caplog
setup updated to target mcp_heartbeat (the module where the loop
body now lives) instead of mcp_cli (still works through aliases
for direct calls, but Python's name resolution inside the loop
body uses the new module's namespace).
* test_mcp_cli_split.py adds 11 new tests: alias drift gate +
inbox-poller single + multi-workspace branches + degraded
inbox-import logging path (none of those existed before).
Refs RFC #2873.
|
||
|
|
3195657837 |
fix: bot-lint nits — drop unused imports, add reason to except
Resolves three github-code-quality threads blocking PR-2739 merge: - workspace/tests/test_mcp_cli_multi_workspace.py: remove unused `import os` and `from unittest.mock import patch` (left over from an earlier test draft that mocked at the os.environ layer). - workspace/mcp_cli.py:523: replace bare `pass` in the register_workspace_token ImportError handler with a debug log line + one-line comment explaining the silent-degrade contract (older installs that don't yet ship the helper fall back to the legacy single-token path; single-workspace operators see no behavior change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
829ab66462 |
mcp: support multi-workspace external-agent registration (PR-1)
External MCP agents (e.g. Claude Code installed on a company PC) can
now register against MULTIPLE workspaces from a single process — the
agent participates as a peer in workspace A (company) AND workspace B
(personal) simultaneously, with one merged inbox tagged so replies
route to the correct tenant.
Use case (verbatim from operator): "I have this computer AI thats in
company's PC, he is going to be put in company's workspace, but
personally, I want to register it to my own workspace as well, so
that I can talk to it and asking him to do work."
## What changed
**Wire format** — new env var:
MOLECULE_WORKSPACES='[
{"id":"<company-wsid>","token":"<company-tok>"},
{"id":"<personal-wsid>","token":"<personal-tok>"}
]'
When set, mcp_cli iterates the array and spawns one (register +
heartbeat + inbox poller) trio per workspace. Single-workspace mode
(WORKSPACE_ID + MOLECULE_WORKSPACE_TOKEN) is unchanged — every
existing operator's setup keeps working bit-for-bit.
**Per-workspace token registry** (platform_auth.py):
register_workspace_token(wsid, tok) — populated by mcp_cli once
per workspace before any thread spawns; thread-safe registration
+ lock-free reads on the hot path. auth_headers(workspace_id=...)
routes to the per-workspace token; auth_headers() with no arg
uses the legacy resolution path unchanged (back-compat).
**Per-workspace inbox cursors** (inbox.py):
InboxState now supports cursor_paths={wsid: Path,...}. Each poller
advances its own cursor — one workspace's slow poll can't stall
another, and a 410 only resets the affected workspace's cursor.
Single-workspace constructor (cursor_path=Path(...)) still works
exactly as before via __post_init__ promotion to the empty-string
key. Cursor filenames disambiguated by workspace_id[:8] when
multi-workspace; single-workspace keeps the legacy filename so
upgrade doesn't invalidate on-disk state.
**Arrival workspace tagging** (inbox.py):
InboxMessage.arrival_workspace_id — tells the agent which OF ITS
workspaces the inbound message arrived on. Set by the poller from
the cursor key. to_dict() omits the field when empty so single-
workspace consumers see no shape change.
**Reply routing** (a2a_tools.py + a2a_mcp_server.py + registry.py):
send_message_to_user(workspace_id=...) — optional override that
selects which workspace's /notify endpoint to POST to (and which
token authenticates). Multi-workspace agents pass the inbound
message's arrival_workspace_id; single-workspace agents omit it
and route to the only registered workspace via the legacy URL.
## Out of scope (future PRs)
- PR-2: cross-workspace delegation auto-routing — when an agent
receives a request from personal-ws "delegate to ops-bot" and
ops-bot lives in company-ws, the agent should auto-pick its
company-ws identity for the outbound delegate_task. Today the
agent must pass via_workspace explicitly (or fall through to
primary workspace).
- PR-3: memory namespacing — commit_memory() still writes to the
primary workspace's memory regardless of inbound context. Will
revisit when the new memory system (PR #2733 just landed) settles.
## Tests
workspace/tests/test_mcp_cli_multi_workspace.py — 24 new tests:
* MOLECULE_WORKSPACES JSON parsing (valid + 6 error shapes)
* Token registry register / lookup / rotation / clear
* auth_headers routing by workspace_id with legacy fallback
* Per-workspace cursor save/load/reset isolation
* arrival_workspace_id present-when-set, omitted-when-empty
* default_cursor_path namespacing
All 110 pre-existing tests in test_mcp_cli.py / test_inbox.py /
test_platform_auth.py still pass — back-compat is mechanical.
Refs: project memory entry "External agent multi-workspace
registration", design questions answered 2026-05-04 by user
(JSON env var; explicit memory writes deferred to PR-3).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b8fdbd9fab |
fix(runtime): register configs_dir in TOP_LEVEL_MODULES + drop alias
Wheel-build smoke gate detected `configs_dir` missing from scripts/build_runtime_package.py:TOP_LEVEL_MODULES. Without it the build would ship `import configs_dir` un-rewritten and every external-runtime install would die on `ModuleNotFoundError` at first import. Two callers used `import configs_dir as _configs_dir` to belt-and- suspenders against an imagined name collision, but the rewriter rejects `import X as Y` because the rewrite would produce `import molecule_runtime.X as X as Y` (invalid syntax). No actual collision exists (only docstring/comment references). Switched to plain `import configs_dir`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c636022d2f |
fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458)
The runtime persists per-workspace state (`.auth_token`,
`.platform_inbound_secret`, `.mcp_inbox_cursor`) under `/configs` —
the workspace-EC2 mount path. Inside a container that's writable,
agent-owned. Outside a container, `/configs` either doesn't exist or
isn't writable by an unprivileged user.
The default broke the external-runtime path (`pip install
molecule-ai-workspace-runtime` + `molecule-mcp` on a Mac/Linux
laptop). First heartbeat tries to persist `.platform_inbound_secret`
and crashes:
[Errno 30] Read-only file system: '/configs'
The heartbeat thread logs and dies. Workspace flips offline within
a minute. Operator sees no actionable error.
Adds workspace/configs_dir.py — single resolution point with a tiered
fallback:
1. CONFIGS_DIR env var, if set — explicit operator override
(preserves existing tests + custom deployments verbatim).
2. /configs — if it exists AND is writable. In-container default;
unchanged behavior for every prod workspace.
3. ~/.molecule-workspace — created with mode 0700 so per-file 0600
perms aren't undermined by a world-readable parent.
Migrates the four readers (platform_auth, platform_inbound_auth,
mcp_cli, inbox) to call configs_dir.resolve() instead of
inlining `Path(os.environ.get("CONFIGS_DIR", "/configs"))`.
Existing tests that assert the old `/configs`-as-default contract
updated to assert the new contract: when CONFIGS_DIR is unset, path
resolves to a writable location — `/configs` if present, fallback
otherwise. Tests skip the fallback branch on hosts that DO have a
writable `/configs` (CI containers).
Verified the original repro is fixed: with no CONFIGS_DIR set on
macOS, configs_dir.resolve() returns ~/.molecule-workspace, the dir
exists, and writes succeed.
Test suite: 1454 passed, 3 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c4bb803329 |
feat(mcp_cli): agent_card from env vars (capability discovery)
External molecule-mcp runtimes register with hardcoded agent_card.name
= molecule-mcp-{id[:8]} and skills=[]. That made every external
workspace look identical on the canvas and gave peer agents calling
list_peers no signal beyond name — they had to guess capabilities.
Three new env vars let the operator declare identity + capabilities
without code changes:
* MOLECULE_AGENT_NAME — display name on canvas (default unchanged)
* MOLECULE_AGENT_DESCRIPTION — one-line description (default empty)
* MOLECULE_AGENT_SKILLS — comma-separated skill names
Comma-separated skills get expanded to {"name": "..."} objects — the
minimum shape that satisfies both shared_runtime.summarize_peers
(reads s["name"]) AND canvas SkillsTab.tsx (id falls back to name).
Strict-superset behaviour: when no env vars are set, agent_card
matches the previous hardcoded value exactly. No regression for
operators who haven't migrated.
Why this matters end-to-end:
* Canvas Skills tab now shows each declared skill as a chip
* Peer agents calling list_peers see {name, skills} per peer and
can route delegations to the right specialist
* Same applies to the canvas Details tab + workspace card hover
Tests cover: defaults match prior behaviour; name override; CSV →
skill objects; whitespace stripping + empty entries dropped;
description omitted when unset (keeps wire payload minimal);
whitespace-only name falls back to default; end-to-end through
_platform_register's payload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d887ce8e96 |
fix(mcp_cli): escalate consecutive heartbeat 401s with re-onboard guidance
The universal molecule-mcp wheel runs in a daemon thread, posting
/registry/heartbeat every 20s. When the workspace gets deleted
server-side (DELETE /workspaces/:id), the platform revokes all tokens
for that workspace. Previous behaviour: heartbeat would 401 forever,
log at WARNING per tick, no actionable signal anywhere.
Failure mode hit on hongmingwang tenant 2026-04-30: workspace
a1771dba was deleted at some prior time, the channel-bridge .env
still pointed at it, MCP tools 401-ed silently with the operator
having no idea why. The register-time path at mcp_cli.py:104-111
already does loud + actionable for 401 (sys.exit(3) with regenerate-
from-canvas-Tokens text) — extend the same pattern to the heartbeat.
Behaviour:
* count < 3: WARNING per tick (could be transient blip)
* count == 3: ERROR with re-onboard instructions, names the dead
workspace_id, points at the canvas Tokens tab
* count > 3 and every 20 ticks (~7 min): re-log ERROR so a session
that started after the first ERROR still catches it
5xx and other non-auth HTTP errors do NOT increment the auth-failure
counter — that would mislead the operator (e.g. a server blip would
trigger "token revoked" when the token is fine).
Tests cover: single 401 stays at WARNING; 3 consecutive 401s escalate
to ERROR with the right keywords; 403 treated identically; recovery
via 200 resets the counter; 5xx never triggers the auth path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a5c5139e3a |
fix(workspace): deliver platform_inbound_secret on every heartbeat
Heartbeat now echoes the workspace's platform_inbound_secret on every
beat (mirroring /registry/register), and the molecule-mcp client
persists it to /configs/.platform_inbound_secret on receipt.
Symptom (2026-04-30, hongmingwang tenant): chat upload returned 503
"workspace will pick it up on its next heartbeat" and then 401 on
retry — permanent until workspace restart. The 503 message was a lie:
heartbeat used to discard the platform_inbound_secret entirely; only
register delivered it, and register fires once at startup.
Server (Go):
- Heartbeat handler reuses readOrLazyHealInboundSecret (the same
helper chat_files + register use), so heartbeat-time recovery
covers the rotate / mid-life NULL-column case the existing
register-time heal can't reach.
- Failure is non-fatal: liveness contract trumps secret delivery,
chat_files retries lazy-heal on its own next request.
Client (Python):
- _persist_inbound_secret_from_heartbeat parses the heartbeat 200
response and persists via platform_inbound_auth.save_inbound_secret.
- All exceptions swallowed — heartbeat liveness > secret persistence;
next tick (≤20s) retries.
Tests:
- Server: pin secret-present, lazy-heal-mint-on-NULL, and heal-
failure-omits-field branches.
- Client: pin persist-on-200, skip-on-empty, skip-on-non-dict-body,
skip-on-401, swallow-save-OSError.
|
||
|
|
b47d4ceb00 |
feat(workspace-runtime): add inbox polling for standalone molecule-mcp path
The universal MCP server (a2a_mcp_server.py) was outbound-only — agents
in standalone runtimes (Claude Code, hermes, codex, etc.) could
delegate, list peers, and write memories, but never observed the
canvas-user or peer-agent messages addressed to them. This blocked
"constantly responding" loops without forcing operators back onto a
runtime-specific channel plugin.
This PR closes the inbound gap with a poller-fed in-memory queue and
three new MCP tools:
- wait_for_message(timeout_secs?) — block until next message arrives
- inbox_peek(limit?) — list pending messages (non-destructive)
- inbox_pop(activity_id) — drop a handled message
A daemon thread polls /workspaces/:id/activity?type=a2a_receive every
5s, fills the queue from the cursor (since_id), and persists the cursor
to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog.
On 410 (cursor pruned) we fall back to since_secs=600 for a bounded
recovery window. Activity-row → InboxMessage extraction mirrors the
molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3
+ summary fallback).
mcp_cli.main starts the poller alongside the existing register +
heartbeat threads. In-container runtimes (which have push delivery via
canvas WebSocket) skip activation, so inbox tools return an
informational "(inbox not enabled)" message instead of double-delivery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b54ceb799f |
fix: address 5-axis review findings on PR #2413
Critical:
- ExternalConnectModal.tsx: filledUniversalMcp substitution searched
for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now
MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit
|
||
|
|
427300f3a4 |
feat: make molecule-mcp standalone (built-in register + heartbeat) + recover awaiting_agent on heartbeat
Two paired fixes that together let an external operator run a single
process (molecule-mcp) and see their workspace come up online in the
canvas — the bug surfaced live when status stuck at "awaiting_agent /
OFFLINE" despite an active MCP server.
Platform side (workspace-server/internal/handlers/registry.go):
Heartbeat handler already auto-recovers offline → online and
provisioning → online, but NOT awaiting_agent → online. Healthsweep
flips stale-heartbeat external workspaces TO awaiting_agent, and
with no recovery path the workspace stays "OFFLINE — Restart" in the
canvas forever. Add the symmetric branch: if currentStatus ==
"awaiting_agent" and a heartbeat arrives, flip to online + broadcast
WORKSPACE_ONLINE. Mirrors the existing offline/provisioning patterns
exactly. Test: TestHeartbeatHandler_AwaitingAgentToOnline asserts
the SQL UPDATE fires with the awaiting_agent guard clause.
Wheel side (workspace/mcp_cli.py):
molecule-mcp was outbound-only — operators had to run a separate
SDK process to register + heartbeat. Now mcp_cli.main():
1. Calls /registry/register at startup (idempotent upsert flips
status awaiting_agent → online via the existing register path).
2. Spawns a daemon thread that POSTs /registry/heartbeat every
20s. 20s is comfortably under the healthsweep stale window so
a single missed beat doesn't cause status churn.
3. Runs the MCP stdio loop in the foreground.
Both calls set Origin: ${PLATFORM_URL} so the SaaS edge WAF accepts
them. Threaded heartbeat (not asyncio) chosen because it doesn't
need to share an event loop with the MCP stdio server — daemon=True
cleanly dies when the operator's runtime exits.
MOLECULE_MCP_DISABLE_HEARTBEAT=1 escape hatch lets in-container
callers (which have heartbeat.py running already) reuse the entry
point without double-heartbeating. Default is enabled.
End-to-end verification (live, against
hongmingwang.moleculesai.app, workspace 8dad3e29-...):
pre-fix: status=awaiting_agent → canvas shows OFFLINE forever
post-fix: ran `molecule-mcp` for 5s standalone → canvas state:
status=online runtime=external agent=molecule-mcp-8dad3e29
Test coverage: 7 new mcp_cli tests (register-at-startup, heartbeat-
thread-spawned, disable-env-skips-both, env-and-file token resolution,
register payload shape, heartbeat endpoint + headers); 1 new platform
test (awaiting_agent → online recovery). Full workspace + handlers
suites green: 1355 Python, full Go handlers passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
169e284d57 |
feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.
Three additive changes:
1. **`platform_auth.get_token()` env-var fallback** — adds
`MOLECULE_WORKSPACE_TOKEN` as a fallback when no
`${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
in-container behavior unchanged. External operators (no /configs
volume) now have a way to supply the token without faking the
filesystem layout.
2. **`molecule-mcp` console script** — adds a new entry point in the
published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
`pip install molecule-ai-workspace-runtime`, set 3 env vars
(WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
the binary in their agent's MCP config. `mcp_cli.main` is a thin
validator wrapper — it checks env BEFORE importing the heavy
`a2a_mcp_server` module so a misconfigured first-run gets a friendly
3-line error instead of a 20-line module-level RuntimeError
traceback.
3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
`cli_main` and `mcp_cli.main` are importable. Same regression class
as the 0.1.16 main_sync incident: a silent rename or unrewritten
import here would break every external operator on the next wheel
publish (memory: feedback_runtime_publish_pipeline_gates.md).
Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
file-priority, env-fallback, whitespace handling, cache, header
construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
required var separately, file-or-env satisfies token requirement,
whitespace-only env treated as missing, help mentions canvas Tokens
tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
with no env → friendly error; with env → MCP server starts.
Why now / why this shape: user redirect was "support the baseline
first so all runtimes can use, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|