molecule-ai-workspace-runtime/molecule_runtime/a2a_client.py
rabbitblood 18d904cfc1 fix: MCP server path resolution + absolute imports (2nd half of #507)
The a2a MCP subprocess was launched with a hard-coded /app/a2a_mcp_server.py
path that only existed in the legacy workspace-template layout. Current
templates copy adapter.py into /app but not the MCP server script, so
claude-code's mcp_servers={"a2a": ...} config spawned a non-existent file,
the server never registered any tools, and every agent reported that
search_memory / commit_memory / list_peers / delegate_task / send_message_to_user
were unavailable in the tool registry.

This surfaced this cycle once the CRLF hook fix (PR molecule-core#508 plus
the plugin repo's .gitattributes) cleared the primary "(no response
generated)" symptom. Until then, agents crashed before the missing-MCP
issue could be observed: the two bugs stacked.

Changes
-------
* executor_helpers._default_mcp_server_path: resolves the installed
  molecule_runtime.a2a_mcp_server module's __file__ so the path is
  always correct regardless of template layout. Legacy /app path kept
  as last-resort fallback for any old images still in rotation.
* a2a_mcp_server.py, a2a_tools.py, a2a_client.py: convert bare module
  imports (from a2a_tools import ...) to absolute
  (from molecule_runtime.a2a_tools import ...). Previously this worked
  only when main.py injected the package dir onto sys.path; the MCP
  subprocess doesn't go through main.py, so the bare imports would fail.
  Added a sys.path shim at the top of a2a_mcp_server.py so running as a
  standalone script (python path/to/a2a_mcp_server.py) still works —
  the subprocess can now locate the package root automatically.
* consolidation.py, heartbeat.py, main.py: same bare-to-absolute
  conversion for platform_auth imports (unblocks the same class of
  failure if any of these modules are imported from a non-main.py
  entrypoint in the future).
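
The resolution order and the sys.path shim described in the list above can
be sketched roughly as follows. This is an illustration, not the actual
executor_helpers or a2a_mcp_server code: resolve_module_path and
ensure_package_root_on_path are hypothetical names, and the sketch assumes
the package root is the parent of the directory containing the script.

```python
import importlib.util
import os
import sys


def resolve_module_path(module_name: str, fallback: str) -> str:
    """Return the on-disk path of an importable module, else `fallback`.

    Hypothetical helper: prefers the installed module's file location and
    only falls back to a fixed legacy path when the module is not found.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        spec = None
    if spec is not None and spec.origin and os.path.isfile(spec.origin):
        return spec.origin
    return fallback


def ensure_package_root_on_path(script_path: str) -> str:
    """Put the directory that contains the package onto sys.path.

    Hypothetical shim: for .../site-packages/molecule_runtime/script.py the
    package root is .../site-packages, so absolute imports of the package
    resolve even when the script is launched directly.
    """
    package_dir = os.path.dirname(os.path.abspath(script_path))
    package_root = os.path.dirname(package_dir)
    if package_root not in sys.path:
        sys.path.insert(0, package_root)
    return package_root
```

A caller could then use something like
resolve_module_path("molecule_runtime.a2a_mcp_server", "/app/a2a_mcp_server.py"),
which keeps the legacy path only as the last resort.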

Verification
------------
Deployed the updated files into ws-8010dbd0 (PM) and ran an isolated
sdk.query() as the agent user. SystemMessage.init.mcp_servers now reports
[{'name': 'a2a', 'status': 'connected'}], and the tools list includes
all 8 mcp__a2a__* entries:
  mcp__a2a__check_task_status, mcp__a2a__commit_memory,
  mcp__a2a__delegate_task, mcp__a2a__delegate_task_async,
  mcp__a2a__get_workspace_info, mcp__a2a__list_peers,
  mcp__a2a__recall_memory, mcp__a2a__send_message_to_user

Rolled the in-container hotfix across all 22 workspaces pending release
(docker cp the 4 changed files into each site-packages/molecule_runtime/).

Fixes Molecule-AI/molecule-core#507 (secondary)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:28:57 -07:00

112 lines
4.0 KiB
Python

"""A2A protocol client — peer discovery, messaging, and workspace info.
Shared constants (WORKSPACE_ID, PLATFORM_URL) live here so that
a2a_tools and a2a_mcp_server can import them from a single place.
"""
import logging
import os
import uuid

import httpx

from molecule_runtime.platform_auth import auth_headers

logger = logging.getLogger(__name__)

WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://platform:8080")

# Cache workspace ID → name mappings (populated by list_peers calls)
_peer_names: dict[str, str] = {}

# Sentinel prefix for errors originating from send_a2a_message / child agents.
# Used by delegate_task to distinguish real errors from normal response text.
_A2A_ERROR_PREFIX = "[A2A_ERROR] "


async def discover_peer(target_id: str) -> dict | None:
    """Discover a peer workspace's URL via the platform registry."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            resp = await client.get(
                f"{PLATFORM_URL}/registry/discover/{target_id}",
                headers={"X-Workspace-ID": WORKSPACE_ID, **auth_headers()},
            )
            if resp.status_code == 200:
                return resp.json()
            return None
        except Exception as e:
            logger.error(f"Discovery failed for {target_id}: {e}")
            return None


async def send_a2a_message(target_url: str, message: str) -> str:
    """Send an A2A message/send to a target workspace."""
    # Fix F (Cycle 5 / H2, flagged 5 consecutive audits): timeout=None allowed
    # a hung upstream to block the agent indefinitely. Use a generous but bounded
    # timeout: 30s connect + 300s read (long enough for slow LLM responses).
    async with httpx.AsyncClient(
        timeout=httpx.Timeout(connect=30.0, read=300.0, write=30.0, pool=30.0)
    ) as client:
        try:
            resp = await client.post(
                target_url,
                headers=auth_headers(),
                json={
                    "jsonrpc": "2.0",
                    "id": str(uuid.uuid4()),
                    "method": "message/send",
                    "params": {
                        "message": {
                            "role": "user",
                            "messageId": str(uuid.uuid4()),
                            "parts": [{"kind": "text", "text": message}],
                        }
                    },
                },
            )
            data = resp.json()
            if "result" in data:
                parts = data["result"].get("parts", [])
                text = parts[0].get("text", "") if parts else "(no response)"
                # Tag child-reported errors so the caller can detect them reliably
                if text.startswith("Agent error:"):
                    return f"{_A2A_ERROR_PREFIX}{text}"
                return text
            elif "error" in data:
                return f"{_A2A_ERROR_PREFIX}{data['error'].get('message', 'unknown')}"
            return str(data)
        except Exception as e:
            return f"{_A2A_ERROR_PREFIX}{e}"


async def get_peers() -> list[dict]:
    """Get this workspace's peers from the platform registry."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            resp = await client.get(
                f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers",
                headers={"X-Workspace-ID": WORKSPACE_ID, **auth_headers()},
            )
            if resp.status_code == 200:
                return resp.json()
            return []
        except Exception:
            return []


async def get_workspace_info() -> dict:
    """Get this workspace's info from the platform."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            resp = await client.get(
                f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}",
                headers=auth_headers(),
            )
            if resp.status_code == 200:
                return resp.json()
            return {"error": "not found"}
        except Exception as e:
            return {"error": str(e)}
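
Because send_a2a_message reports failures in-band as a string that starts
with the sentinel prefix, a caller has to inspect the returned text rather
than catch an exception. A minimal sketch of that check follows; it is not
the actual delegate_task code, and split_a2a_result is a hypothetical name.
The prefix constant is redefined here only to keep the sketch self-contained.

```python
# Same value as _A2A_ERROR_PREFIX in a2a_client.py, repeated so this
# sketch runs on its own.
A2A_ERROR_PREFIX = "[A2A_ERROR] "


def split_a2a_result(text: str) -> tuple[bool, str]:
    """Return (is_error, payload) for a string returned by send_a2a_message.

    Hypothetical helper: strips the sentinel prefix when present so the
    caller can branch on the flag and still show the underlying message.
    """
    if text.startswith(A2A_ERROR_PREFIX):
        return True, text[len(A2A_ERROR_PREFIX):]
    return False, text
```

One consequence of this in-band design is that a normal response beginning
with the literal prefix would be treated as an error, which is presumably
why the prefix was chosen to be unlikely in ordinary agent output.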