593f2bd2be (9 Commits)

---

5fe52b08e7
feat(harness): coordinator phase-boundary instrumentation for RFC #2251
Adds structured `rfc2251_phase=...` log lines at the deterministic phase
boundaries inside route_task_to_team and check_task_status, so an
operator running scripts/measure-coordinator-task-bounds.sh against
staging can correlate the harness's external timing trace with what
phase the coordinator was in at any given second.
The harness already exists in staging and measures end-to-end response
time + heartbeat trace. What it CAN'T do without this PR is answer
"the coordinator response took 7 minutes — was it stuck delegating, or
stuck polling children, or stuck synthesizing after all children
returned?" The phase logs answer that question.
Phases instrumented (deterministic Python boundaries, no agent prompt
involvement):
route_start → enter route_task_to_team
children_fetched → after get_children() returns
routing_decided → after build_team_routing_payload
delegate_invoked → just before delegate_task_async.ainvoke
delegate_returned → after delegate_task_async returns
check_status → every check_task_status poll (per-poll)
route_returning_decision_only → fall-through path
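As a sketch, a boundary log call can be as small as this (the helper name `_log_phase` and exact field set are illustrative, not this PR's actual code; `route_start` is captured via time.monotonic() at entry to route_task_to_team):

    import logging
    import time

    logger = logging.getLogger(__name__)

    def _log_phase(phase: str, route_start: float) -> None:
        # One structured, greppable line per deterministic boundary.
        elapsed_ms = int((time.monotonic() - route_start) * 1000)
        logger.info("rfc2251_phase=%s elapsed_ms=%d", phase, elapsed_ms)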
Each line includes elapsed_ms from route_start, so per-phase durations
(deltas between consecutive phases) are extractable via:

    grep rfc2251_phase= <container.log> \
      | awk '{...}'
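Equivalently, a small Python filter for the same job (assuming each line carries `rfc2251_phase=<name>` and `elapsed_ms=<int>` as in the sketch above; the real awk body is left to the operator):

    import re
    import sys

    prev = 0
    for line in sys.stdin:
        m = re.search(r"rfc2251_phase=(\S+).*?elapsed_ms=(\d+)", line)
        if not m:
            continue
        phase, elapsed = m.group(1), int(m.group(2))
        print(f"{phase}: +{elapsed - prev} ms (cumulative {elapsed} ms)")
        prev = elapsed

Pipe the grep output through it to get per-phase deltas.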
The synthesis phase (after all children return, before agent emits
final A2A response) is NOT instrumented here because it's
agent-driven (no deterministic Python boundary). The harness operator
infers synthesis_secs = total_response_secs − max(check_status_ts).
This is reproduction-harness scaffolding; it changes no behavior. Strip
the rfc2251_phase log lines when V1.0 ships and the phase data lands
in the structured heartbeat payload instead.
Refs:
- RFC: molecule-core#2251
- Harness: scripts/measure-coordinator-task-bounds.sh (shipped earlier)
- V1.0 gate: this is deliverable #2 of the four pre-V1.0 gates

---

e9a59cda3b
feat(platform): single-source-of-truth tool registry — adapters consume, no drift
Establishes workspace/platform_tools/registry.py as THE place tool
naming and docs live. Every consumer reads from it; nothing duplicates
the source. Closes the architectural gap behind the doc/tool drift
discussion 2026-04-28 — adding hundreds of future runtime SDK adapters
should not require touching tool names anywhere except the registry.
What the registry owns
ToolSpec dataclass with: name, short (one-line description), when_to_use
(multi-paragraph agent-facing usage guidance), input_schema (JSON Schema),
impl (the actual coroutine in a2a_tools.py), section ('a2a' | 'memory').
TOOLS list with 8 entries — delegate_task, delegate_task_async,
check_task_status, list_peers, get_workspace_info, send_message_to_user,
commit_memory, recall_memory.
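Sketched as a dataclass (field names are from this message; the types, frozen-ness, and imports are assumptions):

    from dataclasses import dataclass
    from typing import Any, Awaitable, Callable, Literal

    @dataclass(frozen=True)
    class ToolSpec:
        name: str                     # canonical tool name (single source of truth)
        short: str                    # one-line description, becomes the MCP description
        when_to_use: str              # multi-paragraph agent-facing usage guidance
        input_schema: dict[str, Any]  # JSON Schema for the tool's arguments
        impl: Callable[..., Awaitable[Any]]  # the actual coroutine in a2a_tools.py
        section: Literal["a2a", "memory"]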
What now reads from the registry
- workspace/a2a_mcp_server.py
  The hardcoded TOOLS list (167 lines of hand-maintained dicts) is
  gone. Replaced with a 6-line list comprehension over the registry
  (see the sketch after this list). MCP description = spec.short.
  inputSchema = spec.input_schema.
- workspace/executor_helpers.py
get_a2a_instructions(mcp=True) and get_hma_instructions() now
GENERATE the agent-facing system-prompt text from the registry.
Heading + per-tool bullet (spec.short) + per-tool when_to_use +
a section-specific footer. No more hand-maintained instruction
blocks that drift from reality.
- workspace/builtin_tools/delegation.py
Renamed delegate_to_workspace -> delegate_task_async to match
registry. check_delegation_status -> check_task_status. Added
sync delegate_task @tool wrapping a2a_tools.tool_delegate_task
(was missing for LangChain runtimes — CP review Issue 3).
- workspace/builtin_tools/memory.py
Renamed search_memory -> recall_memory to match registry.
- workspace/adapter_base.py, workspace/main.py
Bundle all 7 core tools (was 6) into all_tools / base_tools.
- workspace/coordinator.py, shared_runtime.py, policies/routing.py
Updated system-prompt-text references to use the registry names.
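The registry-driven MCP tool list from the first bullet reduces to something like this (import path and dict keys are illustrative, not the file's actual code):

    from platform_tools import registry  # hypothetical import path

    TOOLS = [
        {
            "name": spec.name,
            "description": spec.short,       # MCP description = spec.short
            "inputSchema": spec.input_schema,
        }
        for spec in registry.TOOLS
    ]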
Structural alignment tests
workspace/tests/test_platform_tools.py — 9 tests pin every
registry-to-adapter mapping:
- registry names are unique
- a2a + memory partition is complete (no orphans)
- by_name lookup works
- MCP server registers exactly the registry's tool set
- MCP description equals registry.short for every tool
- MCP inputSchema equals registry.input_schema for every tool
- get_a2a_instructions text contains every a2a tool name
- get_hma_instructions text contains every memory tool name
- pre-rename names (delegate_to_workspace, search_memory,
check_delegation_status) cannot leak back
Adding a future tool means adding one ToolSpec; the test failure
list tells the author exactly which adapter to update.
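Two of those pins, sketched pytest-style (the actual test bodies in test_platform_tools.py may differ):

    from platform_tools import registry  # hypothetical import path

    def test_registry_names_are_unique():
        names = [spec.name for spec in registry.TOOLS]
        assert len(names) == len(set(names))

    def test_pre_rename_names_cannot_leak_back():
        banned = {"delegate_to_workspace", "search_memory",
                  "check_delegation_status"}
        assert banned.isdisjoint(spec.name for spec in registry.TOOLS)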
Adapter pattern for future SDK support
When (e.g.) AutoGen or Pydantic AI gets adapters, the only work
needed for tool surfacing is "wrap registry.TOOLS in your SDK's
tool format." Names, descriptions, schemas, impl come from the
registry — adapter author writes zero strings.
Why this needed to ship now
PR #2237 (already in staging) injected MCP-world docs as the
default system-prompt content. Without the registry, those docs
said "delegate_task" while LangChain runtimes only had
"delegate_to_workspace" — workers see docs for tools that don't
exist (CP review Issue 1+3). PR #2239 was a tactical rename;
this PR is the structural fix that prevents the same class of
drift from recurring as new adapters ship.
PR #2239 was closed in favor of this — same renames, plus the
registry, plus structural tests. Single coherent change.
Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in
test_platform_tools.py; 4 alignment tests in test_prompt.py from
#2237 still pass; original test_executor_helpers tests adapted to
the registry-driven world.
Refs: CP review Issues 1, 2, 3, 5; project memory
project_runtime_native_pluggable.md (platform owns A2A);
project memory feedback_doc_tool_alignment.md (this is the structural
fix for the tactical lesson).

---

81c4c1321c
fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer
A manual-test failure surfaced what had been hidden behind the MCP-path bug:
once delegate_task could actually fire, every cross-workspace call
came back as JSON-RPC -32600 "Invalid Request" with the underlying
pydantic ValidationError:
params.message.role
Input should be 'agent' or 'user' [type=enum,
input_value='ROLE_USER', input_type=str]
PR #2184's a2a-sdk 1.x migration sweep over-corrected: it changed
every `"role": "user"` literal in JSON-RPC payload construction to
`"role": "ROLE_USER"` to match the protobuf enum names of the 1.x
native types (a2a.types.Role.ROLE_USER / ROLE_AGENT). That was
correct for in-process Message construction (which the SDK
serialises before wire transmission) but WRONG for the 8 sites that
hand-build JSON-RPC payloads. The workspace's own a2a-sdk runs
inbound requests through the v0.3 compat adapter
(/usr/local/lib/python3.11/site-packages/a2a/compat/v0_3/) because
main.py sets enable_v0_3_compat=True for backwards compatibility,
and that adapter validates against the v0.3 Pydantic Role enum
(`agent` | `user` lowercase). The protobuf-style names blow it up.
Reverted the 8 wire-payload sites to lowercase:
- workspace/a2a_client.py:74
- workspace/a2a_cli.py:74, 111
- workspace/heartbeat.py:378
- workspace/main.py:464, 563
- workspace/builtin_tools/a2a_tools.py:60
- workspace/builtin_tools/delegation.py:272
Native-type usage at workspace/a2a_executor.py:471 (`Role.ROLE_AGENT`)
stays — that's an in-process Message construction; the SDK handles
wire serialisation correctly.
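The rule in miniature (shapes abbreviated; the JSON-RPC method name and part shape here are illustrative):

    # Hand-built JSON-RPC wire payload: the v0.3 compat adapter validates
    # params.message.role against the lowercase Pydantic enum.
    wire_payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "message/send",  # method name illustrative
        "params": {"message": {"role": "user", "parts": [{"text": "ping"}]}},
    }

    # In-process construction keeps the native 1.x enum; the SDK handles
    # wire serialisation (cf. a2a_executor.py:471):
    #     message = Message(role=Role.ROLE_AGENT, ...)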
Updated the misleading comment at main.py:255-257 (which said
"outbound payloads are now 1.x-shaped (ROLE_USER)") to spell out
the actual rule: outbound JSON-RPC wire payloads MUST use v0.3
shape, native types are only for in-process construction.
New regression test test_jsonrpc_wire_role_format.py greps the 6
wire-payload-emitting files for any "ROLE_USER" / "ROLE_AGENT"
string literal and fails loud — cheapest possible drift detector.
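That detector could be as small as this sketch (file list per the sites above; the exact check in test_jsonrpc_wire_role_format.py may differ):

    from pathlib import Path

    WIRE_FILES = [
        "workspace/a2a_client.py",
        "workspace/a2a_cli.py",
        "workspace/heartbeat.py",
        "workspace/main.py",
        "workspace/builtin_tools/a2a_tools.py",
        "workspace/builtin_tools/delegation.py",
    ]

    def test_no_protobuf_role_names_in_wire_files():
        for path in WIRE_FILES:
            text = Path(path).read_text()
            assert "ROLE_USER" not in text, path
            assert "ROLE_AGENT" not in text, path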
Why E2E missed it: the priority-runtimes harness sends a single
message canvas → workspace, but the canvas already used lowercase
"user" (it never went through the migration sweep). The bug only
surfaces on workspace → workspace delegation, which the harness
doesn't exercise. Same gap as #131 (extend smoke to call main()
against a stub).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

dd57a840b6
fix: comprehensive a2a-sdk 1.x migration sweep across workspace/
Audited every a2a-sdk surface in workspace/ against the installed
1.0.2 wheel. Found and fixed:
main.py (the live workspace startup path):
• create_jsonrpc_routes(rpc_url='/', enable_v0_3_compat=True) —
rpc_url required in 1.x; v0.3 compat enables inbound legacy
clients (`"role": "user"` lowercase) without forcing them to
upgrade. Pairs with the outbound rename below.
a2a_executor.py:
• TextPart/FilePart/FileWithUri removed in 1.x. Part is now a
  flat proto message: Part(text=…) / Part(url=…, filename=…,
  media_type=…); see the sketch after this change list. Updated the
  file-attachment branch (only reachable when an agent emits files;
  the harness's PONG path didn't exercise this, but it was a latent
  crash).
• Message field names: messageId/taskId/contextId →
message_id/task_id/context_id (proto3 snake_case).
• Role enum: Role.agent → Role.ROLE_AGENT (proto enum).
Outbound JSON-RPC payloads (8 files):
• "role": "user" → "role": "ROLE_USER" — proto3 JSON serialization
is strict about enum values. Sites: a2a_client, a2a_cli, main
(initial+idle prompts), heartbeat, builtin_tools/a2a_tools,
builtin_tools/delegation. Wire JSON keys stay camelCase
(proto3 default), only the role enum value changed.
google-adk/adapter.py:
• new_agent_text_message → new_text_message (4 sites). This
adapter's directory has a hyphen, so it can't be imported as a
Python module — effectively dead code, but the wheel ships the
file and a future fix should keep it correct against 1.x.
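Condensed, the 1.x constructions this sweep standardises on (signatures as this message describes them; the `parts` field name is an assumption, so treat this as a sketch against a2a-sdk 1.0.2, not SDK documentation):

    from a2a.types import Message, Part, Role

    # 1.x: Part is a flat proto message; no TextPart/FilePart/FileWithUri.
    text_part = Part(text="hello")
    file_part = Part(url="https://example.com/report.pdf",
                     filename="report.pdf",
                     media_type="application/pdf")

    # 1.x: proto3 snake_case field names and the proto-style Role enum.
    message = Message(message_id="m-1", task_id="t-1", context_id="c-1",
                      role=Role.ROLE_AGENT, parts=[text_part, file_part])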
Why one PR instead of seven: every previous a2a-sdk migration finding
landed as its own publish → cascade → harness → next-bug cycle.
Today's audit ran every a2a-sdk symbol/type/method in workspace/
against the installed 1.0.2 wheel in a single sweep and tested the
critical paths (Message construction, Part construction, Role enum
parsing) against the actual SDK. This should be the last migration PR.
Verified locally:
python3 scripts/build_runtime_package.py --version 0.1.99 \
--out /tmp/build-final
pip install /tmp/build-final
python -c "import molecule_runtime.main; \
from molecule_runtime.a2a_executor import LangGraphA2AExecutor"
→ ✓ all imports clean against a2a-sdk 1.0.2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

5071454074
fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events
Critical follow-up to PR #2126's review. Two real bugs:

1. **Runtime QUEUED never resolved.** Platform's drain stitch updates the
   platform's delegate_result row when a queued delegation finally
   completes, but never pushes back to the runtime. The LLM polling
   check_delegation_status saw status="queued" forever — combined with the
   new docstring guidance ("queued → wait, peer will reply"), the model
   would wait indefinitely on a state that never resolves. Strictly worse
   than pre-PR behavior, where it would have at least bypassed.

2. **Live updates dead code.** delegation.go writes activity rows by
   direct INSERT INTO activity_logs, bypassing the LogActivity helper that
   fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's
   ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — the initial GET
   worked, live updates did not.

Fix:

(1) Runtime side, workspace/builtin_tools/delegation.py:
- New `_refresh_queued_from_platform(task_id)` async helper that pulls
  /workspaces/<self>/delegations and finds the platform-side
  delegate_result row for our task_id.
- check_delegation_status calls _refresh when local status is QUEUED, so
  the LLM's poll itself drives state convergence.
- Best-effort: a GET failure leaves local state untouched; the next poll
  retries.
- Docstring updated to reflect the actual behavior ("polls transparently —
  keep polling and you'll see the flip").
- 4 new tests cover: QUEUED → completed via refresh; QUEUED → failed via
  refresh; refresh keeps QUEUED when the platform hasn't resolved; refresh
  swallows network errors safely.

(2) Canvas side, AgentCommsPanel.tsx WS push handler:
- Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE /
  DELEGATION_FAILED in addition to ACTIVITY_LOGGED.
- Each event's payload is synthesized into an ActivityEntry shape so
  toCommMessage's existing delegation branch maps it. Status derived:
  STATUS uses payload.status, COMPLETE → "completed", FAILED → "failed",
  SENT → "pending".
- The ACTIVITY_LOGGED branch keeps the "delegation" type accepted as a
  no-op-today / future-proof path: if delegation handlers are ever
  refactored to call LogActivity, this lights up automatically without
  another canvas change.

Doesn't change: the docstring guidance ("queued → wait, don't bypass") is
now actually load-bearing because the refresh path will deliver the
eventual outcome. Without the refresh, the guidance was a trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
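For reference, the lazy-refresh helper from (1) reduced to a sketch (endpoint path and row fields follow this message; the httpx client and the PLATFORM_URL / WORKSPACE_ID globals are assumptions):

    import httpx

    async def _refresh_queued_from_platform(task_id: str) -> dict | None:
        # Best-effort: fetch this workspace's delegation rows from the
        # platform and return the delegate_result row for task_id, if any.
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.get(
                    f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations"
                )
                resp.raise_for_status()
        except httpx.HTTPError:
            return None  # leave local state untouched; the next poll retries
        for row in resp.json():
            if row.get("task_id") == task_id:
                return row
        return None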

---

057876cb0c
fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows
Two bugs that compounded into the "Director does the work itself" UX:
1. workspace/builtin_tools/delegation.py: _execute_delegation only
handled HTTP 200 in the response branch. When the peer's a2a-proxy
returned HTTP 202 + {queued: true} (single-SDK-session bottleneck
on the peer), the loop fell through. Two iterations later the
`if "error" in result` check tried to access an unbound `result`,
the coroutine ended quietly, and the delegation stayed at FAILED
with error="None". The LLM checking status saw "failed" + the
platform's "Delegation queued — target at capacity" log line in
chat context, concluded the peer was permanently unavailable, and
bypassed delegation to do the work itself.
Fix: explicit 202+queued branch (sketched below, after item 2). Adds
DelegationStatus.QUEUED, marks the local delegation as QUEUED, mirrors
to the platform, and returns cleanly without retrying. The retry loop
is for transient transport errors — queueing is a real ack, not a
failure to retry against (retrying would just re-queue the same task).
check_delegation_status docstring extended with explicit per-status
guidance: pending/in_progress → wait, queued → wait (peer busy on
prior task, reply WILL arrive), completed → use result, failed →
real error in error field; only fall back on failed, never queued.
2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped
every delegation row because it whitelisted only a2a_send /
a2a_receive. activity_type='delegation' rows (written by the
platform's /delegate handler with method='delegate' or
'delegate_result') never reached toCommMessage. User saw "No
agent-to-agent communications yet" while 6+ delegations existed
in the DB.
Fix: include "delegation" in the both the initial filter and the
WS push filter, plus a delegation branch in toCommMessage that
maps the row as outbound (always — platform proxies on our behalf)
and uses summary as the primary text source.
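The branch ordering from item 1, as a sketch (function name and return convention invented for illustration; `resp` is the httpx response inside the retry loop):

    def _classify_response(resp) -> str | None:
        # Branch order matters: a 202 {queued: true} is a terminal ack for
        # this attempt, handled BEFORE the generic success/error handling,
        # so `result` is never read unbound and queued work is never retried.
        if resp.status_code == 202 and resp.json().get("queued"):
            return "queued"      # mark QUEUED locally, mirror to platform
        if resp.status_code == 200:
            result = resp.json()
            return "failed" if "error" in result else "completed"
        return None              # transient transport error: let the loop retry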
Tests:
- 3 new Python tests cover the 202+queued path: status becomes
QUEUED not FAILED; no retry on queued (counted by URL match
against the A2A target since the mock is shared across all
AsyncClient calls); bare 202 without {queued:true} still
falls through to the existing retry-then-FAILED path.
- 3 new TS tests cover the delegation mapper: 'delegate' row
maps as outbound to target with summary text; queued
'delegate_result' preserves status='queued' (load-bearing for
the LLM's wait-vs-bypass decision); missing target_id returns
null instead of rendering a ghost.
Does NOT solve: the underlying single-SDK-session bottleneck that
causes peers to queue in the first place. Tracked as task #102
(parallel SDK sessions per workspace) — real architectural work.
This PR makes the runtime handle the queueing correctly so the LLM
doesn't bail out, and makes the delegations visible in Agent Comms
so operators can see what's happening.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

65b531acf6
fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID
Workspace runtime fired four classes of A2A request to the platform
without the X-Workspace-ID header that identifies the source
workspace: heartbeat self-messages, initial_prompt, idle-loop fires,
and peer-to-peer A2A from runtime tools. The platform's a2a_receive
logger keys source_id off that header — without it, every such row
was written with source_id=NULL, which the canvas's My Chat tab
filter (?source=canvas, i.e. "user typed this") matched, so the
internal triggers rendered as if the human user had sent them. The
"Delegation results are ready..." heartbeat trigger was visible to
end users in the chat history; delegate_task A2A calls between agents
were misclassified the same way.
Centralise the header construction in a new platform_auth helper
self_source_headers(workspace_id) that returns auth_headers() PLUS
{X-Workspace-ID: <id>}. Apply it to:
- heartbeat.py self-message (refactored from inline header dict)
- main.py initial_prompt POST
- main.py idle_prompt POST
- a2a_client.py send_a2a_message (peer A2A from runtime)
- builtin_tools/a2a_tools.py delegate_task (was missing ALL headers)
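A minimal sketch of the helper (assuming auth_headers() returns a plain dict, as the description implies):

    def self_source_headers(workspace_id: str) -> dict[str, str]:
        # auth_headers() is the existing platform_auth helper; add the
        # header the platform's a2a_receive logger keys source_id off.
        headers = dict(auth_headers())
        headers["X-Workspace-ID"] = workspace_id
        return headers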
Tests:
- test_heartbeat.py asserts the X-Workspace-ID header is set on
the self-message POST.
- test_a2a_tools_module.py asserts the same on delegate_task POSTs;
FakeClient.post mocks updated to accept the headers kwarg.
Production effect lands the moment workspace containers are rebuilt
with this code; existing rows in activity_logs keep their NULL
source_id (legacy data). The canvas-side filter (#follow-up)
covers the historical-rows case until backfill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---

3bef6af241
fix: apply #1124 env-var defaults + scrub F1088 credentials from INCIDENT_LOG.md (#1347)
- PLATFORM_URL: replace unreachable http://platform:8080 mesh-only
  default with Docker-aware detection (host.docker.internal in
  containers, localhost for local dev) across all workspace Python
  modules and the git-token-helper shell script.
- WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if
  empty) consistent with coordinator.py / a2a_cli.py patterns already
  in place.
- INCIDENT_LOG.md: replace all 3 F1088 credential types with
  ***REDACTED*** (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x).

Fixes #1124, #1333.

Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
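One plausible shape for the Docker-aware default described above (the /.dockerenv check is an assumption; the commit's actual detection mechanism may differ):

    import os

    def default_platform_url() -> str:
        # Containers reach the host via host.docker.internal;
        # local dev falls back to localhost.
        in_container = os.path.exists("/.dockerenv")
        host = "host.docker.internal" if in_container else "localhost"
        return os.environ.get("PLATFORM_URL") or f"http://{host}:8080"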

---

d8026347e5
chore: open-source restructure — rename dirs, remove internal files, scrub secrets

Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |