Commit Graph

148 Commits

Author SHA1 Message Date
Hongming Wang
09e99a09c6 feat(a2a-mcp): add chat_history tool for prior turns with a peer
When a peer_agent push lands and the agent needs context from prior
turns with that workspace ("what task did this peer assign me last
hour?", "what did I tell them?"), the only options today are
re-deriving from memory (lossy) or scrolling activity_logs in the
canvas (no agent-facing tool). Surface the platform's existing
audit log directly via a new MCP tool so agents can read both sides
of an A2A conversation in chronological order.

Implementation:
- a2a_tools.py: new tool_chat_history(peer_id, limit=20, before_ts="")
  hits /workspaces/<self>/activity?peer_id=X&limit=N (the new server
  filter from molecule-core#2472). Reverses the DESC response into
  chronological order so the agent reads top-down. Graceful error
  envelope on validation/network/non-200 — never crashes the MCP
  server, agent can branch on Error: prefix.
- platform_tools/registry.py: ToolSpec wired into the A2A section so
  the rendered system-prompt block automatically includes it. Same
  pattern as the existing inbox_peek/inbox_pop/wait_for_message.
- a2a_mcp_server.py: dispatch in handle_tool_call.
- executor_helpers.py: _CLI_A2A_COMMAND_KEYWORDS gets a None entry
  (CLI runtimes don't expose chat history today; flip to a keyword
  when a2a_cli grows a `history` subcommand).
- snapshots/a2a_instructions_mcp.txt regenerated.

Tests: 10 new branches in TestChatHistory (validation / param
forwarding / limit cap / before_ts pass-through / DESC→chronological
reorder / 400 verbatim / 500 generic / network exc / non-list resp).
Mutation-verified: reverting a2a_tools.py fails 10/10. Full test
suite remains green at 1516 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:54:23 -07:00
Hongming Wang
f96bb9f860 docs(mcp): tagged server:NAME form in dev-channels reference
Claude Code 2.1.x's --dangerously-load-development-channels takes an
allowlist of tagged entries (`server:<name>` or
`plugin:<name>@<marketplace>`), not a bare switch. The instructions
field's push-only-mode message and the inline comment in
`_poll_timeout_secs` both referenced the old bare form. Update both
so an agent or operator reading them lands on the right invocation —
matched against the docs change in [molecule-docs PR #110](https://github.com/Molecule-AI/docs/pull/110).

No behavior change (string-only edits in instructions text + comment).
33/33 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:11:03 -07:00
Hongming Wang
f80e054a95
Merge pull request #2466 from Molecule-AI/feat/universal-push-via-instructions
feat(mcp): universal inbound delivery — instructions-driven polling + optional push
2026-05-01 23:13:05 +00:00
Hongming Wang
c61a6ff9bd chore(mcp): drop unused module-level _CHANNEL_INSTRUCTIONS
The frozen copy was a self-justification — the comment claimed "tests +
tooling rely on import-time identity" but no test or tooling code path
actually references the binding. _build_initialize_result() calls
_build_channel_instructions() fresh per call so env changes take effect,
which is the documented runtime contract.

github-code-quality flagged it; resolving the unused-variable thread so
the staging branch protection's all-conversations-resolved gate clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:11:09 -07:00
Hongming Wang
dbd086c7ad test(mcp): comment empty except in bridge test cleanup
Address github-code-quality review on PR #2465: explain why the
OSError swallow in pipe teardown is intentional (best-effort
cleanup of a possibly-already-closed fd).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:07:33 -07:00
Hongming Wang
ea206043d8 feat(mcp): universal inbound delivery — instructions-driven polling + optional push
Why this exists
---------------
Live evidence on 2026-05-01 caught a regression latent in #46's
"push-feel inbound" closure: standard `claude` launches without
`--dangerously-load-development-channels` silently drop our
`notifications/claude/channel` emissions, so canvas/peer messages sat
in the wheel inbox and never reached the agent loop until manual
`inbox_peek`. The flag is research-preview-only; non-Claude-Code MCP
clients (Cursor, Cline, OpenCode, hermes-agent, codex) never receive
the notification at all because the method namespace is Claude-
specific. Push-only delivery shipped as the universal contract is
not actually universal.

What this changes
-----------------
Adds a poll path that works on every spec-compliant MCP client. The
`initialize` `instructions` field — read by every client and surfaced
to the agent's system prompt automatically — now tells the agent to
call `wait_for_message(timeout_secs=N)` at the start of every turn.
Push remains as the strictly-better delivery for hosts that opt in
(Claude Code with the dev flag or a future allowlist entry), but is
no longer load-bearing.

Both paths converge on the same `inbox_pop` ack so duplicate-delivery
on a push+poll race is impossible: whoever surfaces the message to
the agent first pops it, the other side returns empty.

Operator knob
-------------
`MOLECULE_MCP_POLL_TIMEOUT_SECS` controls per-turn poll blocking
(default 2s). 0 disables polling for push-only Claude Code with the
dev flag. Above 60 clamps to 60 — protects against an accidental
five-minute stall per turn. Resolved fresh on every `initialize` so
a relaunch with new env is enough; no wheel rebuild required.

Tests
-----
- structural pins on the new instructions: `wait_for_message` +
  `timeout_secs` named, both PUSH PATH / POLL PATH labels present
- env-resolution: default fallback, garbage fallback, negative
  fallback, 60s clamp
- operator override: `MOLECULE_MCP_POLL_TIMEOUT_SECS=7` reaches the
  agent's instructions string
- timeout=0 toggles to push-only-mode messaging (no
  wait_for_message call asked of the agent)
- existing pins on push path, reply tools, prompt-injection defense,
  meta attributes — all preserved

Successor to #46. Closure milestone for this PR (per
feedback_close_on_user_visible_not_merge.md): launched `claude`
against the published wheel, sent a canvas message, observed the
agent surfaces the message inline at the start of its next turn
without me running `inbox_peek` — verified live before declaring done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:32:57 -07:00
Hongming Wang
a3a496bced test(mcp): pin inbox→stdout bridge end-to-end with three failure-mode tests
Closes the dynamic-coverage gap on the `notifications/claude/channel`
push-UX bridge — until now we had static pins on the wire shape
(_build_channel_notification) and the initialize handshake, but the
threading + asyncio + stdout chain that ships notifications to the
host was never exercised under realistic conditions.

The three failure modes anticipated in #2444 §2 are each now pinned:

  test_inbox_bridge_emits_channel_notification_to_writer
    Drives a fake inbox event from a daemon thread, asserts the
    notification lands on a real os.pipe-backed asyncio writer with
    the correct JSON-RPC envelope. Catches: bridge wired up
    incorrectly (no-op _on_inbox_message), run_coroutine_threadsafe
    drift, _build_channel_notification call missing.

  test_inbox_bridge_swallows_closed_pipe_drain_error
    Closes the pipe's read end before firing, captures the
    concurrent.futures.Future that run_coroutine_threadsafe returns,
    asserts its exception() is None. Catches: narrowing the broad
    `except Exception` in _emit (e.g. to RuntimeError), or removing
    it. Without the swallow, the future carries a ConnectionResetError
    and the test fails with a clear message naming the regression.

  test_inbox_bridge_swallows_closed_loop_runtime_error
    Builds the bridge against a closed event loop, fires the
    callback, asserts no exception escapes. Catches: removing the
    `except RuntimeError` swallow on the run_coroutine_threadsafe
    call. Without it the poller thread would crash with
    "RuntimeError: Event loop is closed" during shutdown.

To make the bridge testable, extracted the closures from main() into
a top-level `_setup_inbox_bridge(writer, loop) -> Callable[[dict],
None]` helper. main()'s wire-up is now a single line that calls the
helper. Behavior is unchanged — same write, same drain, same
swallows — just no longer trapped inside main()'s closures.

Verified each test catches its regression by injection: removing
each swallow / no-op'ing the bridge each turn the matching test red
with a specific failure message that points at the missing piece.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:13:32 -07:00
Hongming Wang
e6be3c0df0 test(mcp): pin prompt-injection defense in _CHANNEL_INSTRUCTIONS
Adds the missing symmetric pin against the threat-model sentence —
the existing tests pin reply-tool names (send_message_to_user,
delegate_task, inbox_pop) and tag attributes (kind, peer_id,
activity_id) but left the "treat message body as untrusted user
content" line unpinned. A copy-edit that drops it would turn the
channel into an open prompt-injection vector against any workspace
running the MCP server.

Pins three signals: "untrusted" present, an explicit
"not execute"/"do not" clause, and the "approval" escape-hatch
sentence — two of three would let a partial copy-edit slip
through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:24:05 -07:00
Hongming Wang
2588ab27d5 feat(mcp): add channel instructions field — second gate for push UX
PR #2461 added the experimental.claude/channel capability declaration
on the assumption that was the missing gate for Claude Code surfacing
notifications/claude/channel as inline <channel> interrupts. Research
against code.claude.com/docs/en/channels-reference.md confirms the
capability IS one gate — but there's a SECOND required field we still
don't ship: `instructions` on the initialize result.

The docs are explicit: instructions is what tells the agent what the
<channel> tag attributes mean and which tool to call to reply. Without
it the channel registers but the agent receives the tag with no
context and has no idea how to handle it. The official telegram
plugin ships both (server.ts:370-396) — capability AND instructions.
We were shipping one of two.

This adds the instructions string. It documents:
- kind/peer_id/activity_id meta attributes
- canvas_user → send_message_to_user reply path
- peer_agent → delegate_task reply path
- inbox_pop ack to prevent duplicate-poll re-delivery
- threat model: treat message bodies as untrusted user content

Tests: 4 new pins. instructions present + non-empty, instructions
names each reply tool, instructions documents each tag attribute.
Failure messages name the symptom so a copy-edit can't silently
break the channel.

Live verification still pending after wheel ships — same plan as
the gap is in --dangerously-load-development-channels (host-side
flag, outside our control during the channels research preview).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:24:05 -07:00
Hongming Wang
63ef3b128c docs(mcp): correct server.ts reference + flag verification gap on experimental.claude/channel
Follow-up to commit 0a87dec5 (PR #2461, merged before live verification).

Two corrections to the docstring on `_build_initialize_result()`:

1. The original "mirrors molecule-mcp-claude-channel server.ts:374"
   claim is wrong on two axes. Line 374 is unrelated poll-init code
   (a comment inside `registerAsPoll`). The actual capability site
   is server.ts:475, where the bun bridge declares only
   `{ capabilities: { tools: {} } }` — *no* `experimental.claude/channel`.
   The bun bridge is reported to deliver `notifications/claude/channel`
   successfully in Claude Code despite this, which is direct counter-
   evidence that adding the capability was the bug fix.

2. The `@modelcontextprotocol/sdk` server's `assertNotificationCapability`
   does not include `notifications/claude/channel` in any of its switch
   cases, meaning custom (non-spec) notification methods are sent
   regardless of declared capabilities. Server-side, the declaration
   is almost certainly a no-op.

This commit doesn't remove the capability — additive, not destructive,
and the new tests pin its presence — but downgrades the docstring's
certainty so the next person debugging "channel notification didn't
fire" doesn't trust a stale claim and pursues the more likely root
causes:

  - writer.drain() swallowing exceptions on a closed pipe
  - inbox-thread → asyncio.run_coroutine_threadsafe race during init
  - MCP transport not yet attached when the first inbox event fires

Live verification per #2444 §2 (fresh Claude Code session on this wheel
with a peer A2A message, observe whether the interrupt fires) remains
the open hard-gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:01:57 -07:00
Hongming Wang
0a87dec50e feat(mcp): declare experimental.claude/channel capability for push UX
Without this capability declaration in the initialize handshake,
Claude Code's MCP client receives our notifications/claude/channel
emissions but silently drops them — they never become inline
<channel> tags in the conversation. The push-UX bridge added in
PR #2433 ships, fires, and is invisible.

This was anticipated as a failure mode in #2444 §2 ("Notification
arrives but Claude Code doesn't surface it — host doesn't recognize
the method"), and confirmed live in this session: a canvas chat
"hi" landed in the inbox queue (inbox_peek returned it) but never
woke the agent until inbox_peek was called by hand.

The contract matches molecule-mcp-claude-channel/server.ts:374
where the bun bridge declares the same experimental flag.

Refactor: extracted _build_initialize_result() so the handshake
shape is unit-testable. Pure function, no behavioral change beyond
adding the experimental capability to the result.

Tests: 3 new pins on the initialize result (capability presence,
tools-still-there, protocolVersion stable). Closes the live-
verification gap §2 of #2444.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:45:06 -07:00
Hongming Wang
b8fdbd9fab fix(runtime): register configs_dir in TOP_LEVEL_MODULES + drop alias
Wheel-build smoke gate detected `configs_dir` missing from
scripts/build_runtime_package.py:TOP_LEVEL_MODULES. Without it the
build would ship `import configs_dir` un-rewritten and every
external-runtime install would die on `ModuleNotFoundError` at first
import.

Two callers used `import configs_dir as _configs_dir` to belt-and-
suspenders against an imagined name collision, but the rewriter
rejects `import X as Y` because the rewrite would produce
`import molecule_runtime.X as X as Y` (invalid syntax). No actual
collision exists (only docstring/comment references). Switched to
plain `import configs_dir`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:13:57 -07:00
Hongming Wang
c636022d2f fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458)
The runtime persists per-workspace state (`.auth_token`,
`.platform_inbound_secret`, `.mcp_inbox_cursor`) under `/configs` —
the workspace-EC2 mount path. Inside a container that's writable,
agent-owned. Outside a container, `/configs` either doesn't exist or
isn't writable by an unprivileged user.

The default broke the external-runtime path (`pip install
molecule-ai-workspace-runtime` + `molecule-mcp` on a Mac/Linux
laptop). First heartbeat tries to persist `.platform_inbound_secret`
and crashes:

    [Errno 30] Read-only file system: '/configs'

The heartbeat thread logs and dies. Workspace flips offline within
a minute. Operator sees no actionable error.

Adds workspace/configs_dir.py — single resolution point with a tiered
fallback:

  1. CONFIGS_DIR env var, if set — explicit operator override
     (preserves existing tests + custom deployments verbatim).
  2. /configs — if it exists AND is writable. In-container default;
     unchanged behavior for every prod workspace.
  3. ~/.molecule-workspace — created with mode 0700 so per-file 0600
     perms aren't undermined by a world-readable parent.

Migrates the four readers (platform_auth, platform_inbound_auth,
mcp_cli, inbox) to call configs_dir.resolve() instead of
inlining `Path(os.environ.get("CONFIGS_DIR", "/configs"))`.

Existing tests that assert the old `/configs`-as-default contract
updated to assert the new contract: when CONFIGS_DIR is unset, path
resolves to a writable location — `/configs` if present, fallback
otherwise. Tests skip the fallback branch on hosts that DO have a
writable `/configs` (CI containers).

Verified the original repro is fixed: with no CONFIGS_DIR set on
macOS, configs_dir.resolve() returns ~/.molecule-workspace, the dir
exists, and writes succeed.

Test suite: 1454 passed, 3 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:07:55 -07:00
Hongming Wang
2e8892ebc4 fix(workspace): surface errno + path on chat-upload mkdir failure
Production incident on hongming.moleculesai.app 2026-05-01T18:30Z —
fresh-tenant signup chat upload returned 500 with the body
{"error":"failed to prepare uploads dir"}. Diagnosis required SSM
access to the workspace stderr to recover errno + actual path.

The root-cause fix lives in claude-code template entrypoint
(molecule-ai-workspace-template-claude-code#23 — pre-create the
.molecule subtree as root before gosu drops to agent). This change
is the diagnostic improvement: when mkdir fails for any reason in
the future (EACCES, ENOSPC, EROFS, etc.), the response carries
the errno + offending path so the operator inspecting browser
devtools sees the real cause without needing SSM.

Backwards compatible — top-level "error" key is unchanged so
existing canvas / external alert rules continue to match. New
fields are additive: path, errno, detail.

Test pins the diagnostic shape so a future struct refactor can't
silently drop these fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:47:53 -07:00
Hongming Wang
c06c4c0f56
Merge pull request #2450 from Molecule-AI/feat/observability-config-schema
feat(config): observability block schema (#119 PR-1 of 4)
2026-05-01 05:20:11 +00:00
Hongming Wang
645c1862c4 feat(a2a-client): surface 410 Gone as 'removed' error so callers can re-onboard (#2429)
Follow-up A to PR #2449 — that PR taught the platform to return 410
Gone for status='removed' workspaces; this PR teaches get_workspace_info
to consume that signal.

Before: every non-200 collapsed into {"error": "not found"}, which
made the 2026-04-30 incident impossible to diagnose — the operator
KNEW the workspace_id existed (they'd just registered it), but the
runtime kept reporting "not found" for a deleted-but-not-purged row.

After: 410 produces a distinct {"error": "removed", "id", "removed_at",
"hint"} dict so callers (heartbeat-loop, channel bridge, dashboard
tools) can surface "your workspace was deleted, re-onboard" instead
of "not found". Falls back to a default hint if the platform body
isn't parseable so the actionable signal doesn't depend on body
shape parity.

Two new tests:
  - TestGetWorkspaceInfo.test_410_returns_removed_with_hint
  - TestGetWorkspaceInfo.test_410_with_unparseable_body_falls_back_to_default_hint

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:08:08 -07:00
Hongming Wang
59902bce83 feat(config): add observability block schema (#119 PR-1 of 4)
Hermes-style declarative block grouping cadence + verbosity knobs into
one place. Schema-only in this PR — wiring into heartbeat.py and main.py
lands in PR-3 of the #119 stack.

Two fields with live consumers waiting:
- heartbeat_interval_seconds (default 30, clamped to [5, 300])
  → heartbeat.py:134 currently has hard-coded HEARTBEAT_INTERVAL = 30
- log_level (default "INFO", uppercased at parse)
  → main.py:465 currently has hard-coded log_level="info"

Clamp band [5, 300] is intentional: sub-5s flooded the platform during
IR-2026-03-11; >5min lets crashed workspaces look healthy long enough
to mask failure. Coerce at parse so adapters and heartbeat.py can read
the value without re-validating.

Tests pin defaults, explicit YAML override, partial override, and
parametrized clamp behavior (10 cases including garbage strings + None).

Part of: task #119 (adopt hermes-style architecture)
Stack:  PR-1 schema → PR-2 event_log → PR-3 wire consumers → PR-4 skill compat
2026-04-30 21:58:45 -07:00
Hongming Wang
661eec2659 chore(smoke-mode): harden module-load + drop dead except clause
Two follow-ups from the #2275 Phase 1 self-review:

1. `_SMOKE_TIMEOUT_SECS = float(os.environ.get(...))` was evaluated at
   module load. main.py imports smoke_mode unconditionally — before
   the is_smoke_mode() check — so a malformed
   MOLECULE_SMOKE_TIMEOUT_SECS env value would SystemExit every
   workspace boot, not just smoke runs. Wrapped in try/except with a
   5.0 fallback. Probability of a typo'd env var hitting production
   is low (it's a CI-only knob), but the footgun is removed entirely.
   Regression test reloads the module under a malformed env value.

2. `_real_a2a_sdk_available()` caught (ImportError, AttributeError).
   `from X import Y` raises ImportError when Y is missing on X — never
   AttributeError. Dropped the unreachable branch.

No behavior change for the happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:31:08 -07:00
Hongming Wang
aacaba024c feat(wheel-smoke): exercise executor.execute() to catch lazy imports (#2275)
The existing wheel-publish smoke (`wheel_smoke.py`) only IMPORTS
`molecule_runtime.main` at module scope. Lazy imports buried inside
`async def execute(...)` bodies (e.g. `from a2a.types import FilePart`)
NEVER evaluate at static-import time — they crash at first message
delivery in production.

The 2026-04-2x v0→v1 a2a-sdk migration shipped 5 such regressions in
templates that all looked fine at module-load smoke. This change adds
`smoke_mode.py` plus a `MOLECULE_SMOKE_MODE=1` short-circuit in
`main.py`: after `adapter.create_executor(...)`, the boot path invokes
`executor.execute(stub_ctx, stub_queue)` once with a 5s timeout
(`MOLECULE_SMOKE_TIMEOUT_SECS`). Healthy import tree → execution
proceeds far enough to hit a network boundary and times out (exit 0).
Broken lazy import → `ImportError` / `ModuleNotFoundError` from inside
the executor body (exit 1). Other downstream errors (auth, validation)
pass — those are caught by adapter-level tests, not this gate.

Stub `(RequestContext, EventQueue)` is built from the real a2a-sdk so
SendMessageRequest/RequestContext constructor changes also surface as
import-tree failures (the regression class also includes "SDK
refactored mid-publish"). The stub-build itself is wrapped — if it
raises, that's a smoke fail too.

Phase 2 (separate PR, molecule-ci) wires this into
publish-template-image.yml so the publish gate runs the boot smoke
against every template image before pushing the tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:21:18 -07:00
Hongming Wang
0b2ea0a50f
Merge pull request #2441 from Molecule-AI/feat/explicit-provider-field
feat(config): add explicit `provider:` field alongside `model:` (PR-1 of stack)
2026-05-01 03:54:27 +00:00
Hongming Wang
067ad83ce5 feat(config): add explicit provider: field alongside model:
Adds a top-level `provider` slug to WorkspaceConfig and RuntimeConfig so
adapters can route to a specific gateway without re-implementing
slug-prefix parsing across hermes / claude-code / codex.

Resolution chain in load_config (mirrors how `model` resolves):

  1. ``LLM_PROVIDER`` env var — what canvas Save+Restart sets so the
     operator's Provider dropdown choice survives a CP-driven restart
     (the regenerated /configs/config.yaml drops most user fields).
  2. Explicit YAML ``provider:`` — operator pinned it in the file.
  3. Derive from the model slug prefix for backward compat:
       ``anthropic:claude-opus-4-7`` → ``anthropic``
       ``minimax/abab7-chat-preview`` → ``minimax``
       bare model names → ``""`` (let the adapter decide).

`runtime_config.provider` falls back to the top-level resolved
provider, the same shape PR #2438 added for `runtime_config.model`.

Why a separate field at all (we already parse the slug):
  - Custom model aliases without a recognizable prefix need an
    explicit signal — the canvas Provider dropdown writes it.
  - Adapters were each rolling their own slug-parse (hermes's
    derive-provider.sh, claude-code's adapter-default branch, etc.);
    one resolution point in load_config kills that drift class.
  - Canvas needs a stable storage field that doesn't get clobbered
    every time the user picks a new model.

Backward-compatible: when `provider:` is absent, slug derivation
keeps every existing config.yaml working without a migration.

PR-1 of a multi-PR stack (Option B from RFC discussion). Subsequent
PRs plumb the field through workspace-server env, CP user-data,
adapters (hermes prefers explicit over derive-provider.sh), and
canvas Provider dropdown UI.

Tests cover all four resolution paths + runtime_config inheritance:
  - test_provider_default_empty_when_bare_model
  - test_provider_derived_from_colon_slug
  - test_provider_derived_from_slash_slug
  - test_provider_yaml_explicit_wins_over_derived
  - test_provider_env_override_beats_yaml_and_derived
  - test_runtime_config_provider_yaml_wins_over_top_level
  - test_provider_default_from_default_model

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:47:09 -07:00
Hongming Wang
6e92fe0a08 chore: rewriter unit tests + drop misleading noqa on import inbox
Two small follow-ups to the PR #2433#2436#2439 incident chain.

1) `import inbox  # noqa: F401` in workspace/a2a_mcp_server.py was
   misleading — `inbox` IS used (at the bridge wiring inside main()).
   F401 means "imported but unused", which would mask a real future
   F401 if the usage is removed. Drop the noqa, keep the explanatory
   block comment about the rewriter's `import X` → `import mr.X as X`
   expansion (and the `import X as Y` → `import mr.X as X as Y` trap
   the comment exists to prevent re-introducing).

2) scripts/test_build_runtime_package.py — 17 unit tests covering
   `rewrite_imports()` and `build_import_rewriter()` in
   scripts/build_runtime_package.py. Until now the function had zero
   coverage despite the entire wheel build depending on it. Tests
   pin: bare-import aliasing, dotted-import preservation, indented
   imports, from-imports (simple + dotted + multi-symbol + block),
   the `import X as Y` rejection added in PR #2436 (with comment-
   stripping + indented + comma-not-alias edge cases), allowlist
   anchoring (`a2a` ≠ `a2a_tools`), and end-to-end reproduction
   of the PR #2433 failing pattern + the #2436 fix pattern.

3) Wire scripts/test_*.py into CI by adding a second discover pass
   to test-ops-scripts.yml. Top-level scripts/ tests live alongside
   their target file (parallels the scripts/ops/ test layout); the
   existing scripts/ops/ pass keeps running because scripts/ops/
   has no __init__.py so a single discover from scripts/ root
   doesn't recurse. Two passes is simpler than retrofitting
   namespace packages. Path filter widened from `scripts/ops/**`
   to `scripts/**` so PRs touching the build script trigger the
   new tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:45:32 -07:00
Hongming Wang
5ad41f63ce
Merge pull request #2438 from Molecule-AI/fix/runtime-config-model-fallback-clean
fix(config): runtime_config.model falls back to top-level model
2026-05-01 03:31:20 +00:00
Hongming Wang
0070d0bd59 fix(config): runtime_config.model falls back to top-level model
External feedback (2026-04-30): "Provisioner doesn't read model from
config.yaml and doesn't set MODEL env var. Without MODEL, the adapter
defaults to sonnet and bypasses the mimo routing." Confirmed accurate
for SaaS workspaces.

Trace: claude-code-default/adapter.py reads `runtime_config.model or
"sonnet"` (and hermes reads HERMES_DEFAULT_MODEL via install.sh, which
IS plumbed). For claude-code there's nothing — workspace/config.py
loaded `runtime_config.model` only from YAML, ignoring MODEL_PROVIDER
env. The CP user-data script regenerates /configs/config.yaml at every
boot with only `name`, `runtime`, `a2a` keys (intentionally minimal so
it doesn't carry stale state) — so any user-set runtime_config.model
is wiped on every restart, and the adapter falls back to "sonnet" even
when the user picked Opus in the canvas Config tab.

Fix: when YAML omits runtime_config.model, fall back to the top-level
resolved `model`, which already honors MODEL_PROVIDER env override.
One-line in workspace/config.py. Now MODEL_PROVIDER → top-level model
→ runtime_config.model → adapter sees the user's selection. Sticky
across CP-driven restarts; the canvas Save+Restart loop works as
intended for every runtime, not just hermes.

Tests:
  test_runtime_config_model_falls_back_to_top_level — top-level set, runtime_config empty → fallback wins
  test_runtime_config_model_yaml_wins_over_top_level — YAML explicit → fallback skipped (precedence)
  test_runtime_config_model_picks_up_env_via_top_level — full canvas Save+Restart simulation: env → top-level → runtime_config.model

Negative-control verified: removing the `or model` flips both fallback
tests red with the expected "" vs expected-model mismatch; restoring
flips them green. The yaml-wins test passes either way (correctly,
because precedence is preserved).

Replaces closed PR #2435 — that PR's commit was on a contaminated
branch and accidentally captured unrelated WIP changes (build script
+ a2a_mcp_server refactor) instead of this fix. Self-review caught it
and closed the PR. This branch is clean off main + diff verified
before push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:28:50 -07:00
Hongming Wang
0acdf3bb56 fix(wheel): import inbox without alias to dodge rewriter collision
PR #2433 (notifications/claude/channel) shipped 'import inbox as
_inbox_module' inside a2a_mcp_server.py:main(). The build script's
import rewriter expands plain 'import inbox' to
'import molecule_runtime.inbox as inbox', so the original source
became 'import molecule_runtime.inbox as inbox as _inbox_module',
which is invalid Python.

Caught at the publish-runtime + PR-built-wheel-smoke gate (the
SyntaxError trace is in run 25200422679). The wheel didn't ship to
PyPI because publish-runtime's smoke-import step refused to install
it, but staging is currently sitting on a broken-build commit until
this fix-forward lands.

Changes:
- a2a_mcp_server.py: lift `import inbox` to top of file (rewriter
  produces clean `import molecule_runtime.inbox as inbox`), call
  inbox.set_notification_callback directly in main()
- build_runtime_package.py: rewrite_imports() now raises ValueError
  when it sees 'import X as Y' for any X in the workspace allowlist,
  instead of silently producing a syntax-error wheel. Operator gets
  a clear actionable error at build time pointing at the offending
  line + suggested rewrites ('from X import …' or plain 'import X').

The build-time gate (this PR's rewriter check) catches the regression
class earlier than the smoke-time gate (PR #2433's failure). Adding
'PR-built wheel + import smoke' to staging branch protection's
required checks is filed separately so this class doesn't merge again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:21:54 -07:00
Hongming Wang
0a3ec53f34 feat(mcp): notifications/claude/channel for push-feel inbox UX
Adds a notification seam to the universal molecule-mcp wheel so push-
notification-capable MCP hosts (Claude Code today; any compliant
client tomorrow) get inbound A2A messages as conversation interrupts
instead of having to poll wait_for_message / inbox_peek.

Wire-up:
- inbox.py: module-level _NOTIFICATION_CALLBACK + set_notification_callback()
  Fires from InboxState.record() AFTER lock release, with same dict
  shape inbox_peek returns. Best-effort — a raising callback never
  prevents the message from landing in the queue.
- a2a_mcp_server.py: _build_channel_notification() pure helper +
  bridge wiring in main() that schedules notifications via
  asyncio.run_coroutine_threadsafe (poller is a daemon thread, MCP
  loop is asyncio).
- Method name 'notifications/claude/channel' matches the contract
  documented in molecule-mcp-claude-channel/server.ts:509.
- wheel_smoke.py: pin set_notification_callback as a published name,
  same regression class as the 0.1.16 main_sync incident.

Pollers (wait_for_message / inbox_peek) keep working unchanged for
runtimes without notification support.

Tests: 6 new in test_inbox.py (callback fires once on record, dedupe
short-circuits before fire, raising cb doesn't break inbox, set/clear
semantics), 5 new in test_a2a_mcp_server.py (method name pin, content
mapping, meta routing, no-id JSON-RPC notification spec, missing-
field tolerance). All 59 combined tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:10:01 -07:00
Hongming Wang
c4bb803329 feat(mcp_cli): agent_card from env vars (capability discovery)
External molecule-mcp runtimes register with hardcoded agent_card.name
= molecule-mcp-{id[:8]} and skills=[]. That made every external
workspace look identical on the canvas and gave peer agents calling
list_peers no signal beyond name — they had to guess capabilities.

Three new env vars let the operator declare identity + capabilities
without code changes:

  * MOLECULE_AGENT_NAME — display name on canvas (default unchanged)
  * MOLECULE_AGENT_DESCRIPTION — one-line description (default empty)
  * MOLECULE_AGENT_SKILLS — comma-separated skill names

Comma-separated skills get expanded to {"name": "..."} objects — the
minimum shape that satisfies both shared_runtime.summarize_peers
(reads s["name"]) AND canvas SkillsTab.tsx (id falls back to name).

Strict-superset behaviour: when no env vars are set, agent_card
matches the previous hardcoded value exactly. No regression for
operators who haven't migrated.

Why this matters end-to-end:
  * Canvas Skills tab now shows each declared skill as a chip
  * Peer agents calling list_peers see {name, skills} per peer and
    can route delegations to the right specialist
  * Same applies to the canvas Details tab + workspace card hover

Tests cover: defaults match prior behaviour; name override; CSV →
skill objects; whitespace stripping + empty entries dropped;
description omitted when unset (keeps wire payload minimal);
whitespace-only name falls back to default; end-to-end through
_platform_register's payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:57:39 -07:00
Hongming Wang
210f6e066a
Merge pull request #2424 from Molecule-AI/fix/in-container-heartbeat-persists-inbound-secret
fix(workspace): in-container heartbeat persists platform_inbound_secret
2026-05-01 01:36:52 +00:00
Hongming Wang
d887ce8e96 fix(mcp_cli): escalate consecutive heartbeat 401s with re-onboard guidance
The universal molecule-mcp wheel runs in a daemon thread, posting
/registry/heartbeat every 20s. When the workspace gets deleted
server-side (DELETE /workspaces/:id), the platform revokes all tokens
for that workspace. Previous behaviour: heartbeat would 401 forever,
log at WARNING per tick, no actionable signal anywhere.

Failure mode hit on hongmingwang tenant 2026-04-30: workspace
a1771dba was deleted at some prior time, the channel-bridge .env
still pointed at it, MCP tools 401-ed silently with the operator
having no idea why. The register-time path at mcp_cli.py:104-111
already does loud + actionable for 401 (sys.exit(3) with regenerate-
from-canvas-Tokens text) — extend the same pattern to the heartbeat.

Behaviour:
  * count < 3: WARNING per tick (could be transient blip)
  * count == 3: ERROR with re-onboard instructions, names the dead
    workspace_id, points at the canvas Tokens tab
  * count > 3 and every 20 ticks (~7 min): re-log ERROR so a session
    that started after the first ERROR still catches it

5xx and other non-auth HTTP errors do NOT increment the auth-failure
counter — that would mislead the operator (e.g. a server blip would
trigger "token revoked" when the token is fine).

Tests cover: single 401 stays at WARNING; 3 consecutive 401s escalate
to ERROR with the right keywords; 403 treated identically; recovery
via 200 resets the counter; 5xx never triggers the auth path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:26:35 -07:00
Hongming Wang
98845c8f42 fix(workspace): in-container heartbeat persists platform_inbound_secret
Follow-up to PR #2421. The standalone wrapper (mcp_cli.py) got
heartbeat-time secret persistence in #2421, but the in-container
heartbeat (workspace/heartbeat.py) was missed — and that's the path
every workspace EC2 actually runs. Result: hongmingwang Claude Code
agent stayed 401-forever on chat upload after this morning's deploy
because the workspace's runtime never picked up the lazy-healed
secret.

The in-container _loop now captures the heartbeat response and calls
the same _persist_inbound_secret_from_heartbeat helper used by the
standalone path, on both the first POST and the 401-retry POST.
Defensive on every error (non-JSON, non-dict, empty, save failure) —
liveness contract trumps secret persistence.

Tests pin: happy path, absent secret, empty string, non-JSON body,
non-dict body, save_inbound_secret OSError, end-to-end loop.
2026-04-30 18:18:10 -07:00
Hongming Wang
c733454a56
Merge pull request #2421 from Molecule-AI/fix/heartbeat-delivers-inbound-secret
fix(workspace): deliver platform_inbound_secret on every heartbeat
2026-05-01 00:54:00 +00:00
Hongming Wang
993f8c494e refactor(workspace-runtime): send_a2a_message takes peer_id, validates UUID
Two cleanups stacked on PR #2418:

1. Refactor `send_a2a_message(target_url, msg)` →
   `send_a2a_message(peer_id, msg)`. After #2418 every caller passes
   `${PLATFORM_URL}/workspaces/{peer_id}/a2a` — the function's
   parameter pretended to accept arbitrary URLs but in practice only
   one shape is meaningful. Owning URL construction inside the
   function makes the contract honest and centralises the peer-id
   validation introduced below.

2. Add `_validate_peer_id` UUID-shape check at the trust boundary.
   `discover_peer` and `send_a2a_message` are the entry points where
   agent-controlled strings flow into URL paths; rejecting non-UUID
   input at this layer eliminates the URL-interpolation class of
   bug (`workspace_id="../admin"` etc.) regardless of how the rest
   of the codebase interpolates ids elsewhere. Auth was already
   gating malicious access — this is consistency + clear failure
   over silent platform 4xx.

In-container tests cover positive UUIDs, malformed input
(``"ws-abc"``, ``"../admin"``, empty), and the contract that
``tool_delegate_task`` hands the peer_id to ``send_a2a_message``
without building URLs itself.

Live-verified: external delegation 8dad3e29 → 97ac32e9 returned
"refactor verified" from Claude Code Agent through the refactored
code; ``_validate_peer_id`` rejects ``"ws-abc"`` and ``"../admin"``
and accepts canonical UUIDs.

Stacked on PR #2418 (proxy-routing fix). Will rebase onto staging
once #2418 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:43:01 -07:00
Hongming Wang
a5c5139e3a fix(workspace): deliver platform_inbound_secret on every heartbeat
Heartbeat now echoes the workspace's platform_inbound_secret on every
beat (mirroring /registry/register), and the molecule-mcp client
persists it to /configs/.platform_inbound_secret on receipt.

Symptom (2026-04-30, hongmingwang tenant): chat upload returned 503
"workspace will pick it up on its next heartbeat" and then 401 on
retry — permanent until workspace restart. The 503 message was a lie:
heartbeat used to discard the platform_inbound_secret entirely; only
register delivered it, and register fires once at startup.

Server (Go):
  - Heartbeat handler reuses readOrLazyHealInboundSecret (the same
    helper chat_files + register use), so heartbeat-time recovery
    covers the rotate / mid-life NULL-column case the existing
    register-time heal can't reach.
  - Failure is non-fatal: liveness contract trumps secret delivery,
    chat_files retries lazy-heal on its own next request.

Client (Python):
  - _persist_inbound_secret_from_heartbeat parses the heartbeat 200
    response and persists via platform_inbound_auth.save_inbound_secret.
  - All exceptions swallowed — heartbeat liveness > secret persistence;
    next tick (≤20s) retries.

Tests:
  - Server: pin secret-present, lazy-heal-mint-on-NULL, and heal-
    failure-omits-field branches.
  - Client: pin persist-on-200, skip-on-empty, skip-on-non-dict-body,
    skip-on-401, swallow-save-OSError.
2026-04-30 17:36:33 -07:00
Hongming Wang
aefb44aff2 fix(workspace-runtime): route delegate_task through platform A2A proxy
tool_delegate_task was POSTing directly to peer["url"], which is
the Docker-internal hostname (e.g. http://ws-X-Y:8000) for in-
container peers. External callers — the standalone molecule-mcp
wrapper running on an operator's laptop — get [Errno 8] nodename
nor servname every single delegation, breaking the universal-MCP
path's last "ride the same code as in-container" claim.

The platform's /workspaces/:peer-id/a2a proxy endpoint already
handles internal forwarding for in-container peers AND is the only
path external runtimes can use. Unify on it: in-container callers
pay one extra HTTP hop on the same Docker bridge (microseconds);
external callers get a working delegation path for the first time.

discover_peer is still called for access-control + online-status
detection — only the routing target changes. Verified live on
2026-04-30 against workspace 8dad3e29 (external mac runtime) →
97ac32e9 (Claude Code Agent in-container): direct POST returned
ConnectError, proxy POST returned "acknowledged from claude code
agent" as requested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:13:50 -07:00
Hongming Wang
d061642cfc test(inbox): bind side-effecting pop() before assert
CodeQL flagged the bare `assert state.pop(...) is None` — under
`python -O` asserts are stripped, which would skip the call entirely
and the test would silently pass without exercising the code. Bind
the result first so the call always runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:39:45 -07:00
Hongming Wang
b47d4ceb00 feat(workspace-runtime): add inbox polling for standalone molecule-mcp path
The universal MCP server (a2a_mcp_server.py) was outbound-only — agents
in standalone runtimes (Claude Code, hermes, codex, etc.) could
delegate, list peers, and write memories, but never observed the
canvas-user or peer-agent messages addressed to them. This blocked
"constantly responding" loops without forcing operators back onto a
runtime-specific channel plugin.

This PR closes the inbound gap with a poller-fed in-memory queue and
three new MCP tools:

  - wait_for_message(timeout_secs?) — block until next message arrives
  - inbox_peek(limit?)              — list pending messages (non-destructive)
  - inbox_pop(activity_id)          — drop a handled message

A daemon thread polls /workspaces/:id/activity?type=a2a_receive every
5s, fills the queue from the cursor (since_id), and persists the cursor
to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog.
On 410 (cursor pruned) we fall back to since_secs=600 for a bounded
recovery window. Activity-row → InboxMessage extraction mirrors the
molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3
+ summary fallback).

mcp_cli.main starts the poller alongside the existing register +
heartbeat threads. In-container runtimes (which have push delivery via
canvas WebSocket) skip activation, so inbox tools return an
informational "(inbox not enabled)" message instead of double-delivery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:32:48 -07:00
Hongming Wang
b54ceb799f fix: address 5-axis review findings on PR #2413
Critical:
- ExternalConnectModal.tsx: filledUniversalMcp substitution searched
  for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now
  MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit
  876c0bfc). Operators copy-pasting the MCP tab would have gotten a
  literal "<paste from create response>" instead of the token. Fix
  the substitution to match the new placeholder name.

Important:
- mcp_cli._platform_register: 401/403 from initial register now hard-
  exits with code 3 + an actionable stderr message pointing the
  operator at the canvas Tokens tab. Pre-fix: warning log + continue,
  which made a bad-token startup silently fail (heartbeat 401's
  forever, every tool call also 401's, no clear surfacing in the
  operator's MCP client). 500/503 still log + continue (transient
  platform blips shouldn't abort the MCP loop).
- a2a_mcp_server.cli_main docstring: removed stale claim that this is
  the wheel's console-script entry-point target. The actual target is
  mcp_cli.main since 2026-04-30. Wheel-smoke pins both names so the
  functionality was correct, but the doc was lying.

Test coverage: 3 new mcp_cli tests:
  - register 401 exits code=3 + stderr mentions canvas Tokens tab
  - register 403 (C18 hijack rejection) takes same path
  - register 500/503 does NOT exit — only auth errors hard-fail

Findings deferred to follow-up (acceptable per review rubric):
  - Code dedup across mcp_cli / heartbeat.py / molecule_agent SDK
  - Pooled httpx.Client for connection reuse
  - Heartbeat exponential backoff
  - Token-resolution ordering parity (env-first vs file-first)
    between mcp_cli.main and platform_auth.get_token

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:06:59 -07:00
Hongming Wang
427300f3a4 feat: make molecule-mcp standalone (built-in register + heartbeat) + recover awaiting_agent on heartbeat
Two paired fixes that together let an external operator run a single
process (molecule-mcp) and see their workspace come up online in the
canvas — the bug surfaced live when status stuck at "awaiting_agent /
OFFLINE" despite an active MCP server.

Platform side (workspace-server/internal/handlers/registry.go):
  Heartbeat handler already auto-recovers offline → online and
  provisioning → online, but NOT awaiting_agent → online. Healthsweep
  flips stale-heartbeat external workspaces TO awaiting_agent, and
  with no recovery path the workspace stays "OFFLINE — Restart" in the
  canvas forever. Add the symmetric branch: if currentStatus ==
  "awaiting_agent" and a heartbeat arrives, flip to online + broadcast
  WORKSPACE_ONLINE. Mirrors the existing offline/provisioning patterns
  exactly. Test: TestHeartbeatHandler_AwaitingAgentToOnline asserts
  the SQL UPDATE fires with the awaiting_agent guard clause.

Wheel side (workspace/mcp_cli.py):
  molecule-mcp was outbound-only — operators had to run a separate
  SDK process to register + heartbeat. Now mcp_cli.main():
    1. Calls /registry/register at startup (idempotent upsert flips
       status awaiting_agent → online via the existing register path).
    2. Spawns a daemon thread that POSTs /registry/heartbeat every
       20s. 20s is comfortably under the healthsweep stale window so
       a single missed beat doesn't cause status churn.
    3. Runs the MCP stdio loop in the foreground.

  Both calls set Origin: ${PLATFORM_URL} so the SaaS edge WAF accepts
  them. Threaded heartbeat (not asyncio) chosen because it doesn't
  need to share an event loop with the MCP stdio server — daemon=True
  cleanly dies when the operator's runtime exits.

  MOLECULE_MCP_DISABLE_HEARTBEAT=1 escape hatch lets in-container
  callers (which have heartbeat.py running already) reuse the entry
  point without double-heartbeating. Default is enabled.

End-to-end verification (live, against
hongmingwang.moleculesai.app, workspace 8dad3e29-...):
  pre-fix:  status=awaiting_agent → canvas shows OFFLINE forever
  post-fix: ran `molecule-mcp` for 5s standalone → canvas state:
            status=online runtime=external agent=molecule-mcp-8dad3e29

Test coverage: 7 new mcp_cli tests (register-at-startup, heartbeat-
thread-spawned, disable-env-skips-both, env-and-file token resolution,
register payload shape, heartbeat endpoint + headers); 1 new platform
test (awaiting_agent → online recovery). Full workspace + handlers
suites green: 1355 Python, full Go handlers passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:42:44 -07:00
Hongming Wang
74c5e0d7a8 fix(workspace-runtime): add Origin header so SaaS edge WAF accepts MCP tool calls
Discovered while smoke-testing the molecule-mcp external-runtime path
against a live tenant (hongmingwang.moleculesai.app). Every tool call
that hit /workspaces/* or /registry/*/peers returned 404 — but
/registry/register and /registry/heartbeat returned 200. Diagnosis:
the tenant's edge WAF requires a same-origin header. Without it,
unhandled paths get silently rewritten to the canvas Next.js app,
which has no /workspaces or /registry/:id/peers route and returns an
empty 404. The molecule-mcp-claude-channel plugin already sets this
header (server.ts:271-276); the workspace runtime never did because
in-container PLATFORM_URLs (Docker network) aren't behind the WAF.

Fix: extend platform_auth.auth_headers() to include
Origin: ${PLATFORM_URL} whenever PLATFORM_URL is set. Inside-container
behavior is unchanged (the WAF is path-irrelevant for the internal
hostnames). External-runtime calls now thread the WAF correctly.

Verification (live, against a freshly-registered external workspace):
  pre-fix:  get_workspace_info → "not found", list_peers → 404
  post-fix: get_workspace_info → full workspace JSON,
            list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)"

This is the kind of bug unit tests can never catch — caught only by
running the wheel against the real tenant. Memory:
feedback_always_run_e2e.md.

Test coverage: 4 new tests in test_platform_auth.py — Origin alone
when no token + Origin + Authorization both, no-PLATFORM_URL falls
through to original empty-dict behavior, env-token path with Origin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:30:15 -07:00
Hongming Wang
169e284d57 feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.

Three additive changes:

1. **`platform_auth.get_token()` env-var fallback** — adds
   `MOLECULE_WORKSPACE_TOKEN` as a fallback when no
   `${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
   in-container behavior unchanged. External operators (no /configs
   volume) now have a way to supply the token without faking the
   filesystem layout.

2. **`molecule-mcp` console script** — adds a new entry point in the
   published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
   `pip install molecule-ai-workspace-runtime`, set 3 env vars
   (WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
   the binary in their agent's MCP config. `mcp_cli.main` is a thin
   validator wrapper — it checks env BEFORE importing the heavy
   `a2a_mcp_server` module so a misconfigured first-run gets a friendly
   3-line error instead of a 20-line module-level RuntimeError
   traceback.

3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
   `cli_main` and `mcp_cli.main` are importable. Same regression class
   as the 0.1.16 main_sync incident: a silent rename or unrewritten
   import here would break every external operator on the next wheel
   publish (memory: feedback_runtime_publish_pipeline_gates.md).

Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
  file-priority, env-fallback, whitespace handling, cache, header
  construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
  required var separately, file-or-env satisfies token requirement,
  whitespace-only env treated as missing, help mentions canvas Tokens
  tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
  with no env → friendly error; with env → MCP server starts.

Why now / why this shape: user redirect was "support the baseline
first so all runtimes can use, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:20:19 -07:00
Hongming Wang
3b34dfefbc feat(workspace): surface peer-discovery failure reason instead of "may be isolated"
Closes #2397. Today, every empty-peer condition (true empty, 401/403, 404,
5xx, network) collapses to a single message: "No peers available (this
workspace may be isolated)". The user has no way to tell whether they need
to provision more workspaces (true isolation), restart the workspace
(auth), re-register (404), page on-call (5xx), or check network (timeout) —
five different operator actions, one ambiguous string.

Wire:
  - new helper get_peers_with_diagnostic() in a2a_client.py returns
    (peers, error_summary). error_summary is None on 200; a short
    actionable string on every other branch.
  - get_peers() now shims through it so non-tool callers (system-prompt
    formatters) keep the bare-list contract.
  - tool_list_peers() switches to the diagnostic helper and surfaces the
    actual reason. The "may be isolated" string is removed; true empty
    now reads "no peers in the platform registry."

Tests:
  - TestGetPeersWithDiagnostic: 200, 200-empty, 401, 403, 404, 5xx,
    network exception, 200-but-non-list-body, and the bare-list-shim
    regression guard.
  - TestToolListPeers: each diagnostic branch surfaces its reason +
    explicit assertion that "may be isolated" is gone.

Coverage 91.53% (floor 86%). 122 a2a tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:09:26 -07:00
Hongming Wang
899a2231d6 test(platform_auth): module-functions signature snapshot drift gate
Pin the 5 public functions adapters and the runtime hot-path import
through ``from platform_auth import``:

  - ``auth_headers`` — every outbound httpx call merges this in
  - ``self_source_headers`` — A2A peer + self-message header builder
  - ``get_token`` — main.py reads on boot to decide register-vs-resume
  - ``save_token`` — main.py persists the platform-issued token
  - ``refresh_cache`` — 401-retry path drops in-process cache (#1877)

A grep across workspace/ shows 14+ runtime modules import these:
main.py, heartbeat.py, a2a_client.py, a2a_tools.py, consolidation.py,
events.py, executor_helpers.py (3 sites), molecule_ai_status.py,
builtin_tools/memory.py (3 sites), builtin_tools/temporal_workflow.py
(2 sites). Renaming any of the five (e.g. ``auth_headers`` →
``bearer_headers``) makes every one of those imports raise ImportError
at workspace boot — the failure surface is deep in heartbeat init,
nowhere near the rename site.

Same drift class as the BaseAdapter signature snapshot (#2378, #2380),
skill_loader gate (#2381), runtime_wedge gate (#2383). Reuses the
``_signature_snapshot.py`` helpers shipped in #2381.

Defense-in-depth: ``test_snapshot_has_required_functions`` asserts
the five names are still present, so removing one even with a
synchronized snapshot edit forces an explicit edit here with a
justification.

``clear_cache`` is intentionally NOT in the snapshot — it's a
test-only helper. Production code MUST NOT depend on it.

Verified red on deliberate rename: ``auth_headers`` →
``bearer_headers`` produces a clean diff of the missing function in
the failure message. Restored before commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:41:42 -07:00
Hongming Wang
70176e6c8f test(runtime_wedge): module-functions signature snapshot drift gate
BaseAdapter docstring tells adapter authors:

> ``runtime_wedge.mark_wedged()`` / ``clear_wedge()`` — flip the
> workspace to ``degraded`` + auto-recover when your SDK hits a
> non-recoverable error class. Import directly from ``runtime_wedge``;
> the heartbeat forwards the state to the platform automatically.

That's a contract — adapter templates depend on the four module-level
functions (``is_wedged``, ``wedge_reason``, ``mark_wedged``,
``clear_wedge``) being importable by those exact names with those
exact signatures. Renaming any silently breaks every adapter that
calls them: the import resolves the module fine, the
``AttributeError`` only surfaces when the adapter actually hits its
first SDK error — long after the rename merges.

Same drift class as #2378 / #2380 / #2381 (BaseAdapter, skill_loader)
applied to the module-level function surface.

Changes:

  - tests/_signature_snapshot.py gains build_module_functions_record.
    Walks a module's public top-level functions, optionally filtered
    to a specific name list (used here — runtime_wedge has internal
    helpers like reset_for_test that intentionally aren't part of
    the contract). Skips re-exports via __module__ check so a
    `from foo import bar` doesn't pollute the snapshot.
  - tests/test_runtime_wedge_signature.py snapshots the four
    contract functions. Plus a defense-in-depth required-functions
    test that catches removal even when source + snapshot are
    updated together.

Verified: deliberately renaming `mark_wedged` → `mark_wedged_RENAMED`
trips the gate with full snapshot diff in the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:01:10 -07:00
Hongming Wang
e336688278 test: extract shared signature-snapshot helpers + skill_loader gate
Two changes in one PR (tightly coupled — the second wouldn't make
sense without the first):

1. Hoist the inspect-based snapshot helpers out of
   test_adapter_base_signature.py into tests/_signature_snapshot.py
   so future surfaces don't copy-paste introspection logic.

   - build_class_signature_record(cls): walks public methods,
     unwraps static/class/abstract methods, returns a stable
     {class, methods: [...]} dict.
   - build_dataclass_record(cls): walks dataclass fields via
     dataclasses.fields(), returns {name, frozen, fields: [...]}.
   - compare_against_snapshot(actual, path): writes-on-first-run +
     diff-on-drift, with both expected and actual JSON in failure
     message.

   test_adapter_base_signature.py is rewritten to use the helpers;
   the existing snapshot file is byte-identical (no behavior change).

2. New gate: tests/test_skill_loader_signature.py covers the
   public dataclasses exported from skill_loader/loader.py:

   - SkillMetadata: every adapter pattern-matches on .runtime for
     skill-compat filtering. Renaming this field would silently
     break per-adapter skill loading — the loader still returns
     objects, but adapters' `if "*" in skill.metadata.runtime`
     raises AttributeError at workspace boot.
   - LoadedSkill: returned in SetupResult.loaded_skills.

   Includes test_snapshot_has_required_skill_metadata_fields
   defense-in-depth: ensures the runtime / id / name / description
   fields stay even if both source and snapshot are updated together.

Verified: deliberately renaming SkillMetadata.runtime trips the
gate with full snapshot diff in the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:27:20 -07:00
Hongming Wang
12e39c7311 test(adapter_base): extend signature snapshot to public dataclasses (#2364 item 2 followup)
Follows up #2378. The BaseAdapter snapshot covers method signatures
but `adapter_base.py` also exports three public dataclasses that
form the call/return contract between the platform and every
adapter:

  - SetupResult — returned by adapter._common_setup()
  - AdapterConfig — passed into adapter setup hooks
  - RuntimeCapabilities — returned by adapter.capabilities();
    drives platform-side dispatch routing (#117)

Renaming a RuntimeCapabilities flag silently disables every
adapter's capability declaration (the platform fallback runs)
without an AttributeError to surface the breakage. That's exactly
the drift class the snapshot pattern is meant to catch.

Changes:

  - _build_dataclass_snapshot walks SetupResult, AdapterConfig,
    RuntimeCapabilities via dataclasses.fields(), capturing field
    name + type annotation + has_default per field, plus the
    @dataclass(frozen=...) flag.
  - _build_full_snapshot composes method + dataclass records into
    one stable JSON snapshot.
  - test_snapshot_has_required_dataclass_fields — defense-in-depth
    test parallel to test_snapshot_has_required_methods. Catches
    field removal even when both source AND snapshot are updated
    together. Required field set is intentionally short (the flags
    that drive platform dispatch + the adapter-level config knobs).

Verified: deliberately renaming `provides_native_heartbeat` →
`provides_native_heartbeat_RENAMED` trips
test_base_adapter_signature_matches_snapshot with a full diff in
the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:53:10 -07:00
Hongming Wang
8488a188c2 test(adapter_base): signature snapshot — drift gate for adapter public surface (#2364 item 2)
Every workspace template (langgraph, claude-code, hermes, etc.)
subclasses BaseAdapter. Renaming, removing, or re-typing a method
on the base class silently breaks templates: the override stops
being recognized as an override; the old method-name's caller
silently invokes the default no-op; the new method-name is
unimplemented in templates that haven't migrated.

Recent #87 universal-runtime + #1957 recordResource refactor both
renamed/added methods. Without a frozen snapshot, the next rename
ships quietly and surfaces only when a template's CI catches the
AttributeError days later — long after the merge window for an
easy revert.

This snapshot pins BaseAdapter's public method surface against a
checked-in JSON file. Same-shape pattern as PR #2363's A2A
protocol-compat replay gate, applied to a Python public-API
surface instead of JSON message shapes. Both close drift classes
by snapshotting the structural surface that consumers depend on.

Two tests:

  1. test_base_adapter_signature_matches_snapshot — full
     introspection diff against tests/snapshots/adapter_base_signature.json.
     Drift = test failure with both expected + actual JSON in
     the message so the reviewer sees what changed.
  2. test_snapshot_has_required_methods — defense-in-depth: even
     if both the source AND snapshot are updated together
     (intentional API removal), this catches removal of the
     short list of methods that EVERY template depends on (name,
     display_name, description, capabilities, memory_filename).
     Removing one of these requires explicit edit to the
     `required` set with a justification.

Verified the gate fires red on a deliberate rename
(memory_filename → memory_filename_RENAMED) — failure message
shows the full snapshot diff including parameter shapes and
return annotations.

Updating the snapshot is the explicit acknowledgment that a
template-affecting API change is intentional. Reviewer of the
introducing PR sees the snapshot diff and decides whether
template repos need coordinated updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:18:39 -07:00
Hongming Wang
4299475746 feat(prompt): Platform Capabilities preamble at top of system prompt
Closes #2332 item 1 (workspace awareness — agents don't surface
platform-native tools up front).

The dogfooding session surfaced that agents weren't using A2A
delegation, persistent memory, or send_message_to_user. The tools
were registered AND documented in the system prompt — but only in
sections #8 (Inter-Agent Communication) and #9 (Hierarchical Memory),
which agents read AFTER they've already started reasoning about a
plan from earlier sections.

This adds a tight inventory at section #1.5 (immediately after
Platform Instructions, before role-specific prompt files) — every
tool name + its short description in a bulleted block. Detailed
when_to_use docs in sections #8/#9 stay; this preamble is the
elevator pitch ("you have these"), the later sections are the
manual ("here's when and how").

Generated from `platform_tools.registry` ToolSpecs — every tool's
`name` + `short` flow through automatically, no manual sync. A new
`get_capabilities_preamble(mcp: bool)` helper in executor_helpers
mirrors the existing get_a2a_instructions / get_hma_instructions
pattern.

CLI-runtime agents (mcp=False) get an empty preamble — they see
_A2A_INSTRUCTIONS_CLI's hand-written subcommand vocabulary further
down, and the registry's MCP tool names would conflict.

Tests:
  - test_capabilities_preamble_appears_in_mcp_prompt: header present
  - test_capabilities_preamble_lists_every_registry_tool: every
    a2a + memory tool from registry shows up (drift catches at test
    time — adding a new tool to registry surfaces here automatically)
  - test_capabilities_preamble_precedes_prompt_files: ordering
    invariant (toolkit before role docs)
  - test_capabilities_preamble_skipped_for_cli_runtime: empty when
    mcp=False

All 40 prompt + platform_tools tests pass.
2026-04-29 21:31:13 -07:00
Hongming Wang
e955597a98 feat(chat_files): rewrite Download as HTTP-forward (RFC #2312, PR-D)
Mirrors PR-C's Upload migration: replaces the docker-cp tar-stream
extraction with a streaming HTTP GET to the workspace's own
/internal/file/read endpoint. Closes the SaaS gap for downloads —
without this PR, GET /workspaces/:id/chat/download still returns 503
on Railway-hosted SaaS even after A+B+C+F land.

Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → PR-F #2319 → this PR.

Why a single broad /internal/file/read instead of /internal/chat/download:

  Today's chat_files.go::Download already accepts paths under any of the
  four allowed roots {/configs, /workspace, /home, /plugins} — it's not
  strictly chat. Future PRs (template export, etc.) will reuse this
  endpoint via the same forward pattern; reusing avoids three near-
  identical handlers (one per domain) with duplicated path-safety logic.

Path safety is duplicated on platform + workspace sides — defence in
depth via two parallel checks, not "trust the workspace."

Changes:
  * workspace/internal_file_read.py — Starlette handler. Validates path
    (must be absolute, under allowed roots, no traversal, canonicalises
    cleanly). lstat (not stat) so a symlink at the path doesn't redirect
    the read. Streams via FileResponse (no buffering). Mirrors Go's
    contentDispositionAttachment for Content-Disposition header.
  * workspace/main.py — registers GET /internal/file/read alongside the
    POST /internal/chat/uploads/ingest from PR-B.
  * scripts/build_runtime_package.py — adds internal_file_read to
    TOP_LEVEL_MODULES so the publish-runtime cascade rewrites its
    imports correctly. Also includes the PR-B additions
    (internal_chat_uploads, platform_inbound_auth) since this branch
    was rooted before PR-B's drift-gate fix; merge-clean alphabetic
    additions.
  * workspace-server/internal/handlers/chat_files.go — Download
    rewritten as streaming HTTP GET forward. Resolves workspace URL +
    platform_inbound_secret (same shape as Upload), builds GET request
    with path query param, propagates response headers (Content-Type /
    Content-Length / Content-Disposition) + body. Drops archive/tar
    + mime imports (no longer needed). Drops Docker-exec branch entirely
    — Download is now uniform across self-hosted Docker and SaaS EC2.
  * workspace-server/internal/handlers/chat_files_test.go — replaces
    TestChatDownload_DockerUnavailable (stale post-rewrite) with 4
    new tests:
      - TestChatDownload_WorkspaceNotInDB → 404 on missing row
      - TestChatDownload_NoInboundSecret → 503 on NULL column
        (with RFC #2312 detail in body)
      - TestChatDownload_ForwardsToWorkspace_HappyPath → forward shape
        (auth header, GET method, /internal/file/read path) + headers
        propagated + body byte-for-byte
      - TestChatDownload_404FromWorkspacePropagated → 404 from
        workspace propagates (NOT remapped to 500)
    Existing TestChatDownload_InvalidPath path-safety tests preserved.
  * workspace/tests/test_internal_file_read.py — 21 tests covering
    _validate_path matrix (absolute, allowed roots, traversal, double-
    slash, exact-match-on-root), 401 on missing/wrong/no-secret-file
    bearer, 400 on missing path/outside-root/traversal, 404 on missing
    file, happy-path streaming with correct Content-Type +
    Content-Disposition, special-char escaping in Content-Disposition,
    symlink-redirect-rejection (lstat-not-stat protection).

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — green
  * pytest workspace/tests/ — 1292 passed (was 1272 before PR-D)

Refs #2312 (parent RFC), #2308 (chat upload+download 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:19:02 -07:00
Hongming Wang
055e447355 feat(saas): deliver platform_inbound_secret via /registry/register (RFC #2312, PR-F)
Closes the SaaS-side gap that PR-A acknowledged but didn't fix: SaaS
workspaces have no persistent /configs volume, so the platform_inbound_secret
that PR-A's provisioner wrote at workspace creation never reaches the
runtime. Without this, even after the entire RFC #2312 stack lands,
SaaS chat upload would 401 (workspace fails-closed when /configs/.platform_inbound_secret
is missing).

Solution: return the secret in the /registry/register response body
on every register call. The runtime extracts it and persists to
/configs/.platform_inbound_secret at mode 0600. Idempotent — Docker-
mode workspaces also receive it and overwrite the value the provisioner
already wrote (same value until rotation).

Why on every register, not just first-register:
  * SaaS containers can be restarted (deploys, drains, EBS detach/
    re-attach) — /configs is rebuilt empty on each fresh start.
  * The auth_token is "issue once" because re-issuing rotates and
    invalidates the previous one. The inbound secret has no rotation
    flow yet (#2318) so re-sending the same value is harmless.
  * Eliminates the bootstrap window where a restarted SaaS workspace
    has no inbound secret on disk and would 401 every platform call.

Changes:
  * workspace-server/internal/handlers/registry.go — Register handler
    reads workspaces.platform_inbound_secret via wsauth.ReadPlatformInboundSecret
    and includes it in the response body. Legacy workspaces (NULL
    column) get a successful registration with the field omitted.
  * workspace-server/internal/handlers/registry_test.go — two new tests:
      - TestRegister_ReturnsPlatformInboundSecret_RFC2312_PRF: secret
        present in DB → secret in response, alongside auth_token.
      - TestRegister_NoInboundSecret_OmitsField: NULL column → field
        omitted, registration still 200.
  * workspace/platform_inbound_auth.py — adds save_inbound_secret(secret).
    Atomic write via tmp + os.replace, mode 0600 from os.open(O_CREAT,
    0o600) so a concurrent reader never sees 0644-default. Resets the
    in-process cache after write so the next get_inbound_secret() returns
    the freshly-written value (rotation-safe when it lands).
  * workspace/main.py — register-response handler extracts
    platform_inbound_secret alongside auth_token and persists via
    save_inbound_secret. Mirrors the existing save_token pattern.
  * workspace/tests/test_platform_inbound_auth.py — 6 new tests for
    save_inbound_secret: writes file, mode 0600, overwrite-existing,
    cache invalidation after save, empty-input no-op, parent-dir creation
    for fresh installs.

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — all green
  * pytest workspace/tests/ — 1272 passed (was 1266 before this PR)

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:12:34 -07:00
Hongming Wang
d1de330152 feat(workspace): /internal/chat/uploads/ingest endpoint (RFC #2312, PR-B)
Stacked on PR-A (#2313). The platform-side rewrite that actually calls
this endpoint lands in PR-C; this PR adds the workspace-side consumer
+ hardening so PR-C is a small Go-only diff.

What this adds:

  * platform_inbound_auth.py — auth gate mirroring transcript_auth.py.
    Reads /configs/.platform_inbound_secret (delivered by the PR-A
    provisioner). Fail-closed when the file is missing/empty/unreadable.
    Constant-time compare via hmac.compare_digest.

  * internal_chat_uploads.py — POST /internal/chat/uploads/ingest.
    Multipart parse → sanitize each filename → write to
    /workspace/.molecule/chat-uploads/<random>-<name> with
    O_CREAT|O_EXCL|O_NOFOLLOW. Same response shape (uri/name/mimeType/
    size + workspace: URI scheme) as the legacy Go handler — canvas /
    agent code that resolves "workspace:..." paths keeps working.

  * Wired into workspace/main.py via starlette_app.add_route alongside
    the existing /transcript route.

  * python-multipart>=0.0.18 added to requirements.txt (Starlette's
    Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981).

Test coverage (36 tests, all green; full workspace suite 1266 passed):

  * test_platform_inbound_auth.py — 14 tests:
      happy path, fail-closed on missing file, empty file, whitespace-
      only file, missing/case-wrong/empty Bearer prefix, in-process
      cache, default CONFIGS_DIR fallback, end-to-end file → authorized.

  * test_internal_chat_uploads.py — 22 tests:
      sanitize_filename matrix (incl. ../traversal, CJK chars, length
      truncation), 401 on missing/wrong/no-secret-file bearer, single +
      batch upload happy paths, unique random prefix on duplicate names,
      mimetype guess fallback, 400 on missing files field, 413 on per-
      file + total-body oversize, symlink-at-target refusal (with
      sentinel-content unchanged assertion).

Why this is safe to ship before PR-C:

  * No platform-side caller yet → no behavior change visible to users.
  * Auth fails closed; nothing on the network can hit a write path
    until the platform forwards with the matching bearer.
  * Workspace's existing routes (/health, /transcript, /handle/*) are
    unchanged.

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:16:32 -07:00