fix/s8-bind-loopback-dev
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1161b97faf |
feat(mcp): cross-workspace delegation routing (multi-ws PR-2)
PR-2 of the multi-workspace external-agent stack. PR-1 (#2739) landed per-workspace auth + heartbeat + inbox. This PR threads ``source_workspace_id`` through the A2A client + tool surface so an agent registered against multiple workspaces can list peers across all of them and delegate from a specific source. Changes ------- * ``a2a_client``: ``discover_peer``, ``send_a2a_message``, ``get_peers_with_diagnostic``, and ``enrich_peer_metadata`` now accept ``source_workspace_id``. Routing uses it for both the X-Workspace-ID header and (transitively, via ``auth_headers(src)``) the bearer token. Defaults to module-level WORKSPACE_ID for back-compat. * ``a2a_client._peer_to_source``: a new lock-free cache mapping each discovered peer back to the source workspace whose registry surfaced it. ``tool_list_peers`` populates the cache on every call; ``tool_delegate_task`` consults it for auto-routing. * ``a2a_tools.tool_list_peers(source_workspace_id=None)``: when multiple workspaces are registered (MOLECULE_WORKSPACES) and no explicit source is passed, aggregates peers across every registered workspace and tags each entry with ``via: <src[:8]>``. Single-workspace mode is unchanged — no ``via:`` annotation, same output shape. * ``a2a_tools.tool_delegate_task`` and ``tool_delegate_task_async`` resolve source via ``source_workspace_id arg → _peer_to_source[target] → WORKSPACE_ID``. Agents almost never need to specify ``source_*`` explicitly — call ``list_peers`` first and the cache handles the rest. * ``tool_delegate_task_async`` idempotency key now includes the source workspace, so the same task delegated from two registered workspaces produces two distinct delegations (the right behavior — one per tenant audit trail). * ``platform_auth.list_registered_workspaces()``: new helper for the tool layer to enumerate the multi-ws registry. Lock-free reads matched by the existing single-writer-per-workspace contract from PR-1. * ``platform_auth.self_source_headers``: now passes ``workspace_id`` through to ``auth_headers`` — without this, a multi-workspace POST source-tagged with ``X-Workspace-ID=ws_b`` was authenticating with ws_a's token (or no token if MOLECULE_WORKSPACE_TOKEN unset). Latent PR-1 bug exposed by the new tool surface. * ``a2a_mcp_server`` tool dispatch passes ``source_workspace_id`` from the tool call arguments. * ``platform_tools.registry``: add ``source_workspace_id`` to the delegate_task, delegate_task_async, check_task_status, list_peers input schemas with copy explaining when to use it (rarely — the cache handles it). Tests (15 new, all passing) --------------------------- ``test_a2a_multi_workspace.py``: * TestDiscoverPeerSourceRouting (3): src arg drives header+token, fallback to module ws when omitted, invalid target short-circuits before any HTTP attempt. * TestSendA2AMessageSourceRouting (1): X-Workspace-ID source header + Authorization bearer both come from the source arg via the patched self_source_headers chain. * TestGetPeersSourceRouting (1): URL path AND headers use the source workspace id. * TestToolListPeersAggregation (4): aggregates across multiple registered workspaces, tags origin, leaves single-workspace path unchanged, explicit src arg overrides aggregation, diagnostic joining when every workspace returns empty. * TestToolDelegateTaskAutoRouting (3): cache-driven auto-route, explicit override beats cache, single-workspace fallback to module WORKSPACE_ID. * TestListRegisteredWorkspaces (3): registry enumeration helper. Plus ``tests/snapshots/a2a_instructions_mcp.txt`` regenerated to absorb the new ``source_workspace_id`` schema entries. Back-compat ----------- Every change defaults ``source_workspace_id=None``; legacy single-workspace operators (no MOLECULE_WORKSPACES) see identical behavior — same URLs, same headers, same tool output. The 24 PR-1 tests + 125 existing A2A tests all still pass. Out of scope (PR-3) ------------------- Memory namespacing per registered workspace lands after the new memory system v2 PR (#2740) settles in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6fb9bc9bcd |
mcp: regenerate platform_auth signature snapshot for auth_headers(workspace_id=...)
PR-1's auth_headers added an optional workspace_id parameter for multi-workspace token routing; the signature drift gate (test_platform_auth_signature_matches_snapshot) caught the change as expected. Snapshot regenerated to capture the new shape — diff is visible in the PR for reviewers + template repos that depend on this surface. Behavior unchanged: auth_headers() with no arg still routes through the legacy resolution path (back-compat exact); the workspace_id arg is opt-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
09e99a09c6 |
feat(a2a-mcp): add chat_history tool for prior turns with a peer
When a peer_agent push lands and the agent needs context from prior
turns with that workspace ("what task did this peer assign me last
hour?", "what did I tell them?"), the only options today are
re-deriving from memory (lossy) or scrolling activity_logs in the
canvas (no agent-facing tool). Surface the platform's existing
audit log directly via a new MCP tool so agents can read both sides
of an A2A conversation in chronological order.
Implementation:
- a2a_tools.py: new tool_chat_history(peer_id, limit=20, before_ts="")
hits /workspaces/<self>/activity?peer_id=X&limit=N (the new server
filter from molecule-core#2472). Reverses the DESC response into
chronological order so the agent reads top-down. Graceful error
envelope on validation/network/non-200 — never crashes the MCP
server, agent can branch on Error: prefix.
- platform_tools/registry.py: ToolSpec wired into the A2A section so
the rendered system-prompt block automatically includes it. Same
pattern as the existing inbox_peek/inbox_pop/wait_for_message.
- a2a_mcp_server.py: dispatch in handle_tool_call.
- executor_helpers.py: _CLI_A2A_COMMAND_KEYWORDS gets a None entry
(CLI runtimes don't expose chat history today; flip to a keyword
when a2a_cli grows a `history` subcommand).
- snapshots/a2a_instructions_mcp.txt regenerated.
Tests: 10 new branches in TestChatHistory (validation / param
forwarding / limit cap / before_ts pass-through / DESC→chronological
reorder / 400 verbatim / 500 generic / network exc / non-list resp).
Mutation-verified: reverting a2a_tools.py fails 10/10. Full test
suite remains green at 1516 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b47d4ceb00 |
feat(workspace-runtime): add inbox polling for standalone molecule-mcp path
The universal MCP server (a2a_mcp_server.py) was outbound-only — agents
in standalone runtimes (Claude Code, hermes, codex, etc.) could
delegate, list peers, and write memories, but never observed the
canvas-user or peer-agent messages addressed to them. This blocked
"constantly responding" loops without forcing operators back onto a
runtime-specific channel plugin.
This PR closes the inbound gap with a poller-fed in-memory queue and
three new MCP tools:
- wait_for_message(timeout_secs?) — block until next message arrives
- inbox_peek(limit?) — list pending messages (non-destructive)
- inbox_pop(activity_id) — drop a handled message
A daemon thread polls /workspaces/:id/activity?type=a2a_receive every
5s, fills the queue from the cursor (since_id), and persists the cursor
to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog.
On 410 (cursor pruned) we fall back to since_secs=600 for a bounded
recovery window. Activity-row → InboxMessage extraction mirrors the
molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3
+ summary fallback).
mcp_cli.main starts the poller alongside the existing register +
heartbeat threads. In-container runtimes (which have push delivery via
canvas WebSocket) skip activation, so inbox tools return an
informational "(inbox not enabled)" message instead of double-delivery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
899a2231d6 |
test(platform_auth): module-functions signature snapshot drift gate
Pin the 5 public functions adapters and the runtime hot-path import through ``from platform_auth import``: - ``auth_headers`` — every outbound httpx call merges this in - ``self_source_headers`` — A2A peer + self-message header builder - ``get_token`` — main.py reads on boot to decide register-vs-resume - ``save_token`` — main.py persists the platform-issued token - ``refresh_cache`` — 401-retry path drops in-process cache (#1877) A grep across workspace/ shows 14+ runtime modules import these: main.py, heartbeat.py, a2a_client.py, a2a_tools.py, consolidation.py, events.py, executor_helpers.py (3 sites), molecule_ai_status.py, builtin_tools/memory.py (3 sites), builtin_tools/temporal_workflow.py (2 sites). Renaming any of the five (e.g. ``auth_headers`` → ``bearer_headers``) makes every one of those imports raise ImportError at workspace boot — the failure surface is deep in heartbeat init, nowhere near the rename site. Same drift class as the BaseAdapter signature snapshot (#2378, #2380), skill_loader gate (#2381), runtime_wedge gate (#2383). Reuses the ``_signature_snapshot.py`` helpers shipped in #2381. Defense-in-depth: ``test_snapshot_has_required_functions`` asserts the five names are still present, so removing one even with a synchronized snapshot edit forces an explicit edit here with a justification. ``clear_cache`` is intentionally NOT in the snapshot — it's a test-only helper. Production code MUST NOT depend on it. Verified red on deliberate rename: ``auth_headers`` → ``bearer_headers`` produces a clean diff of the missing function in the failure message. Restored before commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
70176e6c8f |
test(runtime_wedge): module-functions signature snapshot drift gate
BaseAdapter docstring tells adapter authors: > ``runtime_wedge.mark_wedged()`` / ``clear_wedge()`` — flip the > workspace to ``degraded`` + auto-recover when your SDK hits a > non-recoverable error class. Import directly from ``runtime_wedge``; > the heartbeat forwards the state to the platform automatically. That's a contract — adapter templates depend on the four module-level functions (``is_wedged``, ``wedge_reason``, ``mark_wedged``, ``clear_wedge``) being importable by those exact names with those exact signatures. Renaming any silently breaks every adapter that calls them: the import resolves the module fine, the ``AttributeError`` only surfaces when the adapter actually hits its first SDK error — long after the rename merges. Same drift class as #2378 / #2380 / #2381 (BaseAdapter, skill_loader) applied to the module-level function surface. Changes: - tests/_signature_snapshot.py gains build_module_functions_record. Walks a module's public top-level functions, optionally filtered to a specific name list (used here — runtime_wedge has internal helpers like reset_for_test that intentionally aren't part of the contract). Skips re-exports via __module__ check so a `from foo import bar` doesn't pollute the snapshot. - tests/test_runtime_wedge_signature.py snapshots the four contract functions. Plus a defense-in-depth required-functions test that catches removal even when source + snapshot are updated together. Verified: deliberately renaming `mark_wedged` → `mark_wedged_RENAMED` trips the gate with full snapshot diff in the failure message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e336688278 |
test: extract shared signature-snapshot helpers + skill_loader gate
Two changes in one PR (tightly coupled — the second wouldn't make
sense without the first):
1. Hoist the inspect-based snapshot helpers out of
test_adapter_base_signature.py into tests/_signature_snapshot.py
so future surfaces don't copy-paste introspection logic.
- build_class_signature_record(cls): walks public methods,
unwraps static/class/abstract methods, returns a stable
{class, methods: [...]} dict.
- build_dataclass_record(cls): walks dataclass fields via
dataclasses.fields(), returns {name, frozen, fields: [...]}.
- compare_against_snapshot(actual, path): writes-on-first-run +
diff-on-drift, with both expected and actual JSON in failure
message.
test_adapter_base_signature.py is rewritten to use the helpers;
the existing snapshot file is byte-identical (no behavior change).
2. New gate: tests/test_skill_loader_signature.py covers the
public dataclasses exported from skill_loader/loader.py:
- SkillMetadata: every adapter pattern-matches on .runtime for
skill-compat filtering. Renaming this field would silently
break per-adapter skill loading — the loader still returns
objects, but adapters' `if "*" in skill.metadata.runtime`
raises AttributeError at workspace boot.
- LoadedSkill: returned in SetupResult.loaded_skills.
Includes test_snapshot_has_required_skill_metadata_fields
defense-in-depth: ensures the runtime / id / name / description
fields stay even if both source and snapshot are updated together.
Verified: deliberately renaming SkillMetadata.runtime trips the
gate with full snapshot diff in the failure message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
12e39c7311 |
test(adapter_base): extend signature snapshot to public dataclasses (#2364 item 2 followup)
Follows up #2378. The BaseAdapter snapshot covers method signatures but `adapter_base.py` also exports three public dataclasses that form the call/return contract between the platform and every adapter: - SetupResult — returned by adapter._common_setup() - AdapterConfig — passed into adapter setup hooks - RuntimeCapabilities — returned by adapter.capabilities(); drives platform-side dispatch routing (#117) Renaming a RuntimeCapabilities flag silently disables every adapter's capability declaration (the platform fallback runs) without an AttributeError to surface the breakage. That's exactly the drift class the snapshot pattern is meant to catch. Changes: - _build_dataclass_snapshot walks SetupResult, AdapterConfig, RuntimeCapabilities via dataclasses.fields(), capturing field name + type annotation + has_default per field, plus the @dataclass(frozen=...) flag. - _build_full_snapshot composes method + dataclass records into one stable JSON snapshot. - test_snapshot_has_required_dataclass_fields — defense-in-depth test parallel to test_snapshot_has_required_methods. Catches field removal even when both source AND snapshot are updated together. Required field set is intentionally short (the flags that drive platform dispatch + the adapter-level config knobs). Verified: deliberately renaming `provides_native_heartbeat` → `provides_native_heartbeat_RENAMED` trips test_base_adapter_signature_matches_snapshot with a full diff in the failure message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8488a188c2 |
test(adapter_base): signature snapshot — drift gate for adapter public surface (#2364 item 2)
Every workspace template (langgraph, claude-code, hermes, etc.) subclasses BaseAdapter. Renaming, removing, or re-typing a method on the base class silently breaks templates: the override stops being recognized as an override; the old method-name's caller silently invokes the default no-op; the new method-name is unimplemented in templates that haven't migrated. Recent #87 universal-runtime + #1957 recordResource refactor both renamed/added methods. Without a frozen snapshot, the next rename ships quietly and surfaces only when a template's CI catches the AttributeError days later — long after the merge window for an easy revert. This snapshot pins BaseAdapter's public method surface against a checked-in JSON file. Same-shape pattern as PR #2363's A2A protocol-compat replay gate, applied to a Python public-API surface instead of JSON message shapes. Both close drift classes by snapshotting the structural surface that consumers depend on. Two tests: 1. test_base_adapter_signature_matches_snapshot — full introspection diff against tests/snapshots/adapter_base_signature.json. Drift = test failure with both expected + actual JSON in the message so the reviewer sees what changed. 2. test_snapshot_has_required_methods — defense-in-depth: even if both the source AND snapshot are updated together (intentional API removal), this catches removal of the short list of methods that EVERY template depends on (name, display_name, description, capabilities, memory_filename). Removing one of these requires explicit edit to the `required` set with a justification. Verified the gate fires red on a deliberate rename (memory_filename → memory_filename_RENAMED) — failure message shows the full snapshot diff including parameter shapes and return annotations. Updating the snapshot is the explicit acknowledgment that a template-affecting API change is intentional. Reviewer of the introducing PR sees the snapshot diff and decides whether template repos need coordinated updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ddf6720498 |
chore(registry): snapshot tests + CLI-block alignment for #2240
Two follow-ups from the #2240 code review: 1. Snapshot tests for the rendered tool-instruction blocks. The structural tests added in #2240 guarantee tool NAMES are present; these new tests pin the SHAPE — bullet ordering, heading style, footer placement — so a future contributor who reorders fields in `_render_section` or rewrites a `when_to_use` paragraph sees the diff in CI rather than shipping a silently-different system prompt. Golden files live under workspace/tests/snapshots/. 2. CLI-block alignment test + corrected source-of-truth comment. `_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for ollama and other non-MCP runtimes — the registry can't auto-generate it because the CLI subprocess interface uses different command shapes (`peers` vs `list_peers`, etc.). A new `_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool → CLI-keyword correspondence (or explicit `None` for tools not exposed via subprocess). Two tests enforce coverage: - every a2a tool in the registry is keyed in the mapping - every non-None subcommand keyword literally appears in `_A2A_INSTRUCTIONS_CLI` Caught one real gap: `send_message_to_user` is in the registry but has no CLI subcommand. Mapped to `None` with an explanatory comment. The "no other source of truth" claim in registry.py's docstring was wrong post-#2240 (the CLI block survived) — corrected to describe the two surfaces explicitly and point at the alignment tests as the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |