molecule-core

Author	SHA1	Message	Date
Hongming Wang	d1de330152	feat(workspace): /internal/chat/uploads/ingest endpoint (RFC #2312, PR-B) Stacked on PR-A (#2313). The platform-side rewrite that actually calls this endpoint lands in PR-C; this PR adds the workspace-side consumer + hardening so PR-C is a small Go-only diff. What this adds: * platform_inbound_auth.py — auth gate mirroring transcript_auth.py. Reads /configs/.platform_inbound_secret (delivered by the PR-A provisioner). Fail-closed when the file is missing/empty/unreadable. Constant-time compare via hmac.compare_digest. * internal_chat_uploads.py — POST /internal/chat/uploads/ingest. Multipart parse → sanitize each filename → write to /workspace/.molecule/chat-uploads/<random>-<name> with O_CREAT|O_EXCL|O_NOFOLLOW. Same response shape (uri/name/mimeType/size + workspace: URI scheme) as the legacy Go handler — canvas / agent code that resolves "workspace:..." paths keeps working. * Wired into workspace/main.py via starlette_app.add_route alongside the existing /transcript route. * python-multipart>=0.0.18 added to requirements.txt (Starlette's Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981). Test coverage (36 tests, all green; full workspace suite 1266 passed): * test_platform_inbound_auth.py — 14 tests: happy path, fail-closed on missing file, empty file, whitespace-only file, missing/case-wrong/empty Bearer prefix, in-process cache, default CONFIGS_DIR fallback, end-to-end file → authorized. * test_internal_chat_uploads.py — 22 tests: sanitize_filename matrix (incl. ../ traversal, CJK chars, length truncation), 401 on missing/wrong/no-secret-file bearer, single + batch upload happy paths, unique random prefix on duplicate names, mimetype guess fallback, 400 on missing files field, 413 on per-file + total-body oversize, symlink-at-target refusal (with sentinel-content unchanged assertion). Why this is safe to ship before PR-C: * No platform-side caller yet → no behavior change visible to users.
* Auth fails closed; nothing on the network can hit a write path until the platform forwards with the matching bearer. * Workspace's existing routes (/health, /transcript, /handle/*) are unchanged. Refs #2312 (parent RFC), #2308 (chat upload 503 incident). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:16:32 -07:00
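A minimal sketch of the two hardening ideas in this commit, using only the Python standard library: a fail-closed bearer check built on `hmac.compare_digest`, and an exclusive, no-follow file create. The names `load_secret`, `is_authorized`, `sanitize_filename` and `write_upload`, and the exact sanitizer rules, are hypothetical; this is not the code in platform_inbound_auth.py or internal_chat_uploads.py.

```python
import hmac
import os
import secrets
from pathlib import Path
from typing import Optional

# Hypothetical sketch: function names and sanitizer rules are assumptions,
# not the code in platform_inbound_auth.py / internal_chat_uploads.py.


def load_secret(path: Path) -> Optional[str]:
    """Return the shared secret, or None if the file is missing, empty or unreadable."""
    try:
        value = path.read_text().strip()
    except OSError:
        return None
    return value or None


def is_authorized(auth_header: Optional[str], secret: Optional[str]) -> bool:
    """Fail closed: no secret, or no well-formed 'Bearer <token>' header, means deny."""
    prefix = "Bearer "
    if not secret or not auth_header or not auth_header.startswith(prefix):
        return False
    token = auth_header[len(prefix):]
    # Constant-time comparison (on bytes, so non-ASCII is fine) to avoid timing leaks.
    return bool(token) and hmac.compare_digest(token.encode(), secret.encode())


def sanitize_filename(name: str, max_len: int = 100) -> str:
    """Strip directory parts, replace unsafe characters, and truncate."""
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = "".join(c if (c.isalnum() or c in "._-") else "_" for c in base)
    cleaned = cleaned.lstrip(".")  # no hidden files and no bare '..'
    return (cleaned or "upload")[:max_len]


def write_upload(dest_dir: Path, name: str, data: bytes) -> Path:
    """Create a brand-new file; never overwrite or follow a symlink at the target."""
    target = dest_dir / f"{secrets.token_hex(8)}-{sanitize_filename(name)}"
    # O_EXCL fails if anything (including a symlink) already exists at the path;
    # O_NOFOLLOW is POSIX-only, hence the getattr fallback.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(target, flags, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return target
```

Combining `O_CREAT` with `O_EXCL` already makes the open fail with `EEXIST` when anything, including a symlink, sits at the target path; `O_NOFOLLOW` is a second guard on platforms that define it.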
Hongming Wang	a57382e918	feat(runtime): add new_response_message helper for adapter A2A responses Surfaced via cross-template review of the a2a-sdk v0→v1 migration: every adapter executor (claude-code, gemini-cli, crewai, openclaw, autogen) builds A2A response Messages independently using `new_text_message(text)` from the SDK, which omits `task_id` and `context_id`. The runtime's own canonical pattern in `workspace/a2a_executor.py:466-475` correctly threads both:
    Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=_parts,
        task_id=task_id,  # ← canonical
        context_id=context_id,  # ← canonical
    )
Because adapters skip these correlation fields, the platform's a2a proxy can't reliably tie the response back to the originating task. This is a divergence from canonical, not necessarily a strict bug (task_id may be optional with a default) — but it's enough of a correlation/observability gap that the canonical pattern bothers to thread it. Add `new_response_message(context, text, files=None)` to executor_helpers.py — single home for response Message construction. Templates can migrate from `new_text_message(text)` to this helper in stacked PRs once the runtime publishes to PyPI. The helper: - Reads `context.task_id`/`context.context_id` from the inbound RequestContext, falling back to fresh UUIDs (RequestContextBuilder always sets them in production; fallback is for unit tests). - Sets `role=Role.ROLE_AGENT` (the v1 enum value). - Builds text Parts via `Part(text=...)` and file Parts via `Part(url="workspace:<path>", filename=..., media_type=...)`. - Returns a v1 protobuf Message ready for `event_queue.enqueue_event(...)`. Why `files=None` plus the workspace: URI scheme as the file Part shape: it matches the canonical pattern in a2a_executor.py exactly, so the platform's chat-attachment download path (executor_helpers.py `resolve_attachment_uri`) interprets responses uniformly across all adapters.
Tests (5, all pass with --no-cov against the live runtime image): - test_new_response_message_text_only - test_new_response_message_with_files - test_new_response_message_files_only_no_text - test_new_response_message_falls_back_when_context_ids_unset - test_new_response_message_handles_missing_attrs The conftest's a2a stubs needed an extension for Message + Role + Part with kwargs preservation. Strictly additive — no existing tests affected. (The 19 pre-existing failures in test_executor_helpers.py are unrelated debt from the commit_memory/recall_memory rewrite, visible on staging baseline before this change.) Per-template migration is the follow-up: claude-code, gemini-cli, crewai, openclaw, autogen all call `new_text_message(text)` today; each gets a per-repo PR replacing it with `new_response_message(context, text)`. This PR ships the helper first so the templates have something to import. Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix), gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:13:34 -07:00
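The helper's ID-threading behavior can be sketched as follows. To keep the example runnable without a2a-sdk, `Role`, `Part` and `Message` are local stand-in dataclasses, not the SDK's protobuf types, and the file-dict keys (`path`, `name`, `mimeType`) are an assumption.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional

# Stand-ins for the a2a-sdk v1 types so the sketch runs without the SDK.
# The real helper builds the SDK's protobuf Message/Part/Role instead.


class Role(Enum):
    ROLE_AGENT = "ROLE_AGENT"


@dataclass
class Part:
    text: Optional[str] = None
    url: Optional[str] = None
    filename: Optional[str] = None
    media_type: Optional[str] = None


@dataclass
class Message:
    message_id: str
    role: Role
    parts: List[Part]
    task_id: str
    context_id: str


def new_response_message(
    context: Any, text: Optional[str], files: Optional[List[Dict[str, str]]] = None
) -> Message:
    # Thread the inbound correlation IDs; fall back to fresh UUIDs (unit tests).
    task_id = getattr(context, "task_id", None) or uuid.uuid4().hex
    context_id = getattr(context, "context_id", None) or uuid.uuid4().hex
    parts: List[Part] = []
    if text:
        parts.append(Part(text=text))
    for f in files or []:
        # Assumed file-dict keys: "path", "name", "mimeType".
        parts.append(
            Part(url=f"workspace:{f['path']}", filename=f.get("name"), media_type=f.get("mimeType"))
        )
    return Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=parts,
        task_id=task_id,
        context_id=context_id,
    )
```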
Hongming Wang	ddf6720498	chore(registry): snapshot tests + CLI-block alignment for #2240 Two follow-ups from the #2240 code review: 1. Snapshot tests for the rendered tool-instruction blocks. The structural tests added in #2240 guarantee tool NAMES are present; these new tests pin the SHAPE — bullet ordering, heading style, footer placement — so a future contributor who reorders fields in `_render_section` or rewrites a `when_to_use` paragraph sees the diff in CI rather than shipping a silently-different system prompt. Golden files live under workspace/tests/snapshots/. 2. CLI-block alignment test + corrected source-of-truth comment. `_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for ollama and other non-MCP runtimes — the registry can't auto-generate it because the CLI subprocess interface uses different command shapes (`peers` vs `list_peers`, etc.). A new `_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool → CLI-keyword correspondence (or explicit `None` for tools not exposed via subprocess). Two tests enforce coverage: - every a2a tool in the registry is keyed in the mapping - every non-None subcommand keyword literally appears in `_A2A_INSTRUCTIONS_CLI` Caught one real gap: `send_message_to_user` is in the registry but has no CLI subcommand. Mapped to `None` with an explanatory comment. The "no other source of truth" claim in registry.py's docstring was wrong post-#2240 (the CLI block survived) — corrected to describe the two surfaces explicitly and point at the alignment tests as the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 20:42:15 -07:00
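A rough sketch of the two alignment checks: every registry a2a tool is keyed in the mapping, and every non-`None` keyword literally appears in the CLI block. The tool list, the CLI text and the `delegate` keyword are hypothetical placeholders, and `alignment_errors` is an illustrative name rather than the test in the repo.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical data: the real registry names, mapping and CLI block live in
# registry.py / executor_helpers.py; the subcommand keywords here are assumptions.
REGISTRY_A2A_TOOLS = ["delegate_task", "list_peers", "send_message_to_user"]

A2A_INSTRUCTIONS_CLI = """\
python3 -m molecule_runtime.a2a_cli peers
python3 -m molecule_runtime.a2a_cli delegate <peer_id> "<task>"
"""

CLI_A2A_COMMAND_KEYWORDS: Dict[str, Optional[str]] = {
    "delegate_task": "delegate",
    "list_peers": "peers",
    "send_message_to_user": None,  # registry tool with no CLI subcommand
}


def alignment_errors(
    tools: Iterable[str], mapping: Dict[str, Optional[str]], cli_text: str
) -> List[str]:
    """Return one message per coverage gap; an empty list means aligned."""
    errors = []
    for name in tools:
        if name not in mapping:
            errors.append(f"{name}: not keyed in the CLI mapping")
        elif mapping[name] is not None and mapping[name] not in cli_text:
            errors.append(f"{name}: keyword {mapping[name]!r} missing from the CLI block")
    return errors
```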
Hongming Wang	e9a59cda3b	feat(platform): single-source-of-truth tool registry — adapters consume, no drift Establishes workspace/platform_tools/registry.py as THE place where tool naming and docs live. Every consumer reads from it; nothing duplicates the source. Closes the architectural gap behind the doc/tool drift discussion 2026-04-28 — adding hundreds of future runtime SDK adapters should not require touching tool names anywhere except the registry. What the registry owns: a ToolSpec dataclass with name, short (one-line description), when_to_use (multi-paragraph agent-facing usage guidance), input_schema (JSON Schema), impl (the actual coroutine in a2a_tools.py), section ('a2a' | 'memory'); and a TOOLS list with 8 entries — delegate_task, delegate_task_async, check_task_status, list_peers, get_workspace_info, send_message_to_user, commit_memory, recall_memory. What now reads from the registry: - workspace/a2a_mcp_server.py: the hardcoded TOOLS list (167 lines of hand-maintained dicts) is gone, replaced with a 6-line list comprehension over the registry. MCP description = spec.short. inputSchema = spec.input_schema. - workspace/executor_helpers.py: get_a2a_instructions(mcp=True) and get_hma_instructions() now GENERATE the agent-facing system-prompt text from the registry. Heading + per-tool bullet (spec.short) + per-tool when_to_use + a section-specific footer. No more hand-maintained instruction blocks that drift from reality. - workspace/builtin_tools/delegation.py: renamed delegate_to_workspace -> delegate_task_async and check_delegation_status -> check_task_status to match the registry. Added sync delegate_task @tool wrapping a2a_tools.tool_delegate_task (was missing for LangChain runtimes — CP review Issue 3). - workspace/builtin_tools/memory.py: renamed search_memory -> recall_memory to match the registry. - workspace/adapter_base.py, workspace/main.py: bundle all 7 core tools (was 6) into all_tools / base_tools.
- workspace/coordinator.py, shared_runtime.py, policies/routing.py: updated system-prompt-text references to use the registry names. Structural alignment tests: workspace/tests/test_platform_tools.py — 9 tests pin every registry-to-adapter mapping: - registry names are unique - a2a + memory partition is complete (no orphans) - by_name lookup works - MCP server registers exactly the registry's tool set - MCP description equals registry.short for every tool - MCP inputSchema equals registry.input_schema for every tool - get_a2a_instructions text contains every a2a tool name - get_hma_instructions text contains every memory tool name - pre-rename names (delegate_to_workspace, search_memory, check_delegation_status) cannot leak back. Adding a future tool means adding one ToolSpec; the test failure list tells the author exactly which adapter to update. Adapter pattern for future SDK support: when (e.g.) AutoGen or Pydantic AI gets adapters, the only work needed for tool surfacing is "wrap registry.TOOLS in your SDK's tool format." Names, descriptions, schemas, and impl come from the registry — the adapter author writes zero strings. Why this needed to ship now: PR #2237 (already in staging) injected MCP-world docs as the default system-prompt content. Without the registry, those docs said "delegate_task" while LangChain runtimes only had "delegate_to_workspace" — workers saw docs for tools that don't exist (CP review Issues 1+3). PR #2239 was a tactical rename; this PR is the structural fix that prevents the same class of drift from recurring as new adapters ship. PR #2239 was closed in favor of this — same renames, plus the registry, plus structural tests. Single coherent change. Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in test_platform_tools.py; 4 alignment tests in test_prompt.py from #2237 still pass; original test_executor_helpers tests adapted to the registry-driven world.
Refs: CP review Issues 1, 2, 3, 5; project memory project_runtime_native_pluggable.md (platform owns A2A); project memory feedback_doc_tool_alignment.md (this is the structural fix for the tactical lesson).	2026-04-28 17:11:36 -07:00
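A minimal sketch of the registry shape this commit describes: a `ToolSpec` dataclass, a `TOOLS` list, a `by_name` lookup, and consumers that derive MCP entries and instruction text from it. The two entries' descriptions and schemas are placeholder text, and `mcp_tools` / `render_instructions` are hypothetical names for the list comprehension and the prompt generator.

```python
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    short: str  # one-line description
    when_to_use: str  # agent-facing usage guidance
    input_schema: Dict[str, Any]  # JSON Schema
    impl: Callable[..., Awaitable[Any]]
    section: str  # "a2a" or "memory"


async def _placeholder_impl(**kwargs: Any) -> str:
    return "ok"


# Placeholder entries; the real registry holds 8 tools with real docs.
TOOLS: List[ToolSpec] = [
    ToolSpec(
        name="list_peers",
        short="List the workspaces you can delegate to.",
        when_to_use="Call before delegating so you pick a real peer.",
        input_schema={"type": "object", "properties": {}},
        impl=_placeholder_impl,
        section="a2a",
    ),
    ToolSpec(
        name="commit_memory",
        short="Persist a fact for later recall.",
        when_to_use="Call when you learn something worth keeping across sessions.",
        input_schema={"type": "object", "properties": {"content": {"type": "string"}}, "required": ["content"]},
        impl=_placeholder_impl,
        section="memory",
    ),
]


def by_name(name: str) -> ToolSpec:
    for spec in TOOLS:
        if spec.name == name:
            return spec
    raise KeyError(name)


def mcp_tools() -> List[Dict[str, Any]]:
    # MCP description = spec.short; inputSchema = spec.input_schema.
    return [{"name": s.name, "description": s.short, "inputSchema": s.input_schema} for s in TOOLS]


def render_instructions(section: str, heading: str) -> str:
    # Heading, then a bullet plus usage guidance for each tool in the section.
    lines = [heading]
    for s in TOOLS:
        if s.section == section:
            lines.append(f"- {s.name}: {s.short}")
            lines.append(f"  {s.when_to_use}")
    return "\n".join(lines)
```

Because both the MCP list and the prompt text are computed from `TOOLS`, adding one `ToolSpec` updates every consumer; the structural tests then only need to compare derived output against the registry.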
Hongming Wang	448709f4b4	fix(prompt): inject A2A and HMA tool instructions into system prompt Workers were registering platform tools (delegate_task, delegate_task_async, list_peers, check_task_status, send_message_to_user, commit_memory, recall_memory) but the build_system_prompt assembly never included documentation for any of them. The instruction-text functions get_a2a_instructions() and get_hma_instructions() exist in executor_helpers.py and have unit tests, but were not called from any production code path — workers received system-prompt.md content only and saw the tools as bare names with no usage guidance. Symptom: agents called commit_memory and delegate_task without knowing they were platform tools. They worked when the agent guessed the API correctly and silently failed when the agent didn't. Fix: build_system_prompt() now appends both instruction sets between the Skills section and the Peers section. The placement is intentional — A2A docs explain how to call delegate_task; the peer list is the data that delegate_task operates over, so the docs precede the peer table. New parameter `a2a_mcp: bool = True` lets adapters opt into the CLI subprocess variant of the A2A instructions for runtimes without MCP support (ollama, custom CLI runtimes). Default True covers the MCP-capable majority (claude-code, hermes, langchain, crewai). Adapter callers don't need to change unless they specifically need CLI mode. Tests: 4 new regression tests in test_prompt.py pin - A2A MCP variant injection (default) - A2A CLI variant injection (a2a_mcp=False, with MCP-only fields absent) - HMA instruction injection - A2A docs precede peer list ordering Full suite green: 1223 passed, 2 xfailed.	2026-04-28 16:43:36 -07:00
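The ordering rule (Skills, then A2A docs, then HMA docs, then Peers) and the `a2a_mcp` switch can be illustrated with a simplified, hypothetical signature; the real `build_system_prompt()` takes different inputs and calls `get_a2a_instructions()` / `get_hma_instructions()` itself.

```python
# Hypothetical signature: the real build_system_prompt() takes different inputs;
# this only illustrates the section order and the a2a_mcp switch.


def build_system_prompt(
    base: str,
    skills: str,
    a2a_mcp_docs: str,
    a2a_cli_docs: str,
    hma_docs: str,
    peers: str,
    a2a_mcp: bool = True,
) -> str:
    a2a_docs = a2a_mcp_docs if a2a_mcp else a2a_cli_docs
    # Tool docs go after Skills and before Peers: the docs explain
    # delegate_task, and the peer list is the data it operates over.
    sections = [base, skills, a2a_docs, hma_docs, peers]
    return "\n\n".join(s for s in sections if s)
```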
Hongming Wang	96acbd719b	test: update test_peer_capabilities_format for fallback behavior The previous assertion `'Silent Agent' not in result` was pinning the buggy behavior — peers without an agent_card were silently dropped from the prompt. With the fallback to DB name+role those peers are correctly visible. Flip the assertion so the test pins the new (correct) rendering and would catch a regression to the silent-drop behavior.	2026-04-28 14:15:42 -07:00
Hongming Wang	8ff0748ab9	fix(workspace): keep peers visible in coordinator prompt when agent_card is null Bug: a Design Director coordinator with 6 freshly-created worker peers rendered an empty `## Your Peers` section in its system prompt — the hosting registry endpoint correctly returned all 6 peers, but `summarize_peer_cards()` silently dropped every entry whose `agent_card` column was null (the default until A2A discovery has run end-to-end against the worker). The coordinator then refused to delegate any task because "no peers exist". Fix: fall back to the registry row's `name` and `role` columns when `agent_card` is missing, malformed, or wrong-typed, instead of skipping the peer. The registry endpoint (`workspace-server/internal/handlers/discovery.go:queryPeerMaps`) has always returned both fields — they were just being thrown away on the consumer side. `build_peer_section()` now renders `Role: …` when the agent_card-derived skill list is empty so the coordinator's prompt still has something concrete to delegate against. Also hoists `import json` out of the per-peer loop body to module level (was previously imported once per iteration). Tests: new `test_shared_runtime_peer_summary.py` pins all four fallback cases (null / malformed string / wrong type / null + no DB name) plus the agent-card-present happy path and the mixed-list case the coordinator actually consumes. First peer-summary test coverage `shared_runtime.py` has had — no prior tests existed. Refs: 2026-04-27 Design Director discovery report from infra team.	2026-04-28 14:10:29 -07:00
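One way to express the fallback rules, assuming registry rows arrive as dicts with `id`, `name`, `role` and `agent_card` keys. The function bodies, the skill-list extraction and the final `id`-based name fallback are assumptions for illustration, not the code in shared_runtime.py.

```python
import json
from typing import Any, Dict, List


def summarize_peer(row: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one registry row; never drop the peer when agent_card is unusable."""
    card = row.get("agent_card")
    if isinstance(card, str):
        try:
            card = json.loads(card)
        except json.JSONDecodeError:
            card = None  # malformed JSON string
    if isinstance(card, dict):
        skills = [s["name"] for s in card.get("skills") or [] if isinstance(s, dict) and "name" in s]
        return {"name": card.get("name") or row.get("name") or row.get("id"), "role": row.get("role"), "skills": skills}
    # Null or wrong-typed card: fall back to the registry's own name/role columns.
    return {"name": row.get("name") or row.get("id") or "unnamed peer", "role": row.get("role"), "skills": []}


def build_peer_section(peers: List[Dict[str, Any]]) -> str:
    lines = ["## Your Peers"]
    for p in peers:
        # Render "Role: ..." when there is no skill list to delegate against.
        detail = ", ".join(p["skills"]) if p["skills"] else f"Role: {p['role'] or 'unspecified'}"
        lines.append(f"- {p['name']}: {detail}")
    return "\n".join(lines)
```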
Hongming Wang	3eb599bbb6	fix(workspace): use SDK constant for agent-card readiness probe The initial-prompt readiness probe in workspace/main.py hardcoded the pre-1.x well-known path. After the a2a-sdk 1.x bump the SDK started mounting the agent card at the new canonical path (the value of `a2a.utils.constants.AGENT_CARD_WELL_KNOWN_PATH`), so the probe returned 404 every attempt and silently fell through to "server not ready after 30s, skipping". Net effect: every workspace silently dropped its `initial_prompt` from config.yaml — the agent never sent the kickoff self-message, and users hit a fresh chat with no context. Reported by an external user as "/.well-known/agent.json 404 — the a2a-sdk agent card route was not being mounted at the expected path". The route IS mounted; the probe was looking at the wrong place. The fix imports `AGENT_CARD_WELL_KNOWN_PATH` from `a2a.utils.constants` and uses it directly in the probe URL — the SDK constant is now the single source of truth, so any future rename travels through automatically. Adds two static regression tests pinning the invariant: 1. No hardcoded `/.well-known/agent.json` literal anywhere in main.py. 2. The probe URL f-string interpolates AGENT_CARD_WELL_KNOWN_PATH (catches a "fix" that imports the constant for show but reverts to a literal in the actual GET). Verified manually inside ghcr.io/molecule-ai/workspace-template-langgraph that AGENT_CARD_WELL_KNOWN_PATH == '/.well-known/agent-card.json' and that `create_agent_card_routes(card)` mounts at exactly that path — constant + mount are aligned in the runtime image, so the probe will now find the server. Full workspace test suite: 1209 passed, 2 xfailed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:43:32 -07:00
Hongming Wang	e87a9c3858	fix(a2a): auto-retry transient transport errors in send_a2a_message Three different intermittent failures observed during a single manual-test session — RemoteProtocolError, ReadTimeout, ConnectError — each surfaced as a "Failed to deliver to <peer>" error chip in the canvas Agent Comms panel even though the next attempt would have succeeded (verified by direct probes from the same source workspace to the same peer). The error message even told the user "Usually a transient network blip — retry once," but it left the retry to a human reading the error message. Auto-retry inside send_a2a_message itself: up to 5 attempts (1 initial + 4 retries) with exponential backoff (1s, 2s, 4s, 8s, 16s-capped), each backoff jittered ±25% to break sync across siblings. Cumulative wall-clock capped at 600s by _DELEGATE_TOTAL_BUDGET_S so a string of 5×300s ReadTimeouts can't make the caller wait 25 minutes — once the deadline elapses, retries stop even if attempts remain. Retry only on transport-layer transients: - ConnectError / ConnectTimeout (peer's listening socket not ready) - RemoteProtocolError (peer closed TCP without writing — observed when a peer's prior in-flight Claude SDK session aborted) - ReadError / WriteError (network blip on Docker bridge) - ReadTimeout (peer wrote no response in 300s) Application-level errors are NOT retried — they're deterministic and retrying just wastes wall-clock: - HTTP 4xx (peer rejected the request format) - JSON parse failures (peer returned garbage) - JSON-RPC error in response body (peer's runtime errored cleanly) - Programmer-bug exceptions (ValueError, etc.) 
8 new tests pin the contract: - retry succeeds after 2 RemoteProtocolErrors - retry succeeds after 1 ConnectError - all 5 attempts fail → returns formatted last-error - capped at exactly _DELEGATE_MAX_ATTEMPTS (regression cover for "did someone bump the constant accidentally?") - JSON-RPC error response NOT retried (1 attempt only) - non-httpx exception NOT retried (programmer bugs stay loud) - total budget caps the loop even if attempts remain - backoff schedule grows exponentially with ±25% jitter. Refactor: extracted _format_a2a_error() so the success and exhausted paths share one error-formatting routine. _delegate_backoff_seconds() is a pure function, so the schedule is unit-testable without monkey-patching asyncio.sleep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:52:01 -07:00
Hongming Wang	81c4c1321c	fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer Manual-test failure surfaced what was hidden behind the MCP-path bug: once delegate_task could actually fire, every cross-workspace call came back as JSON-RPC -32600 "Invalid Request" with the underlying pydantic ValidationError: params.message.role Input should be 'agent' or 'user' [type=enum, input_value='ROLE_USER', input_type=str] PR #2184's a2a-sdk 1.x migration sweep over-corrected: it changed every `"role": "user"` literal in JSON-RPC payload construction to `"role": "ROLE_USER"` to match the protobuf enum names of the 1.x native types (a2a.types.Role.ROLE_USER / ROLE_AGENT). That was correct for in-process Message construction (which the SDK serialises before wire transmission) but WRONG for the 8 sites that hand-build JSON-RPC payloads. The workspace's own a2a-sdk runs inbound requests through the v0.3 compat adapter (/usr/local/lib/python3.11/site-packages/a2a/compat/v0_3/) because main.py sets enable_v0_3_compat=True for backwards compatibility, and that adapter validates against the v0.3 Pydantic Role enum (`agent` | `user` lowercase). The protobuf-style names blow it up. Reverted the 8 wire-payload sites to lowercase: - workspace/a2a_client.py:74 - workspace/a2a_cli.py:74, 111 - workspace/heartbeat.py:378 - workspace/main.py:464, 563 - workspace/builtin_tools/a2a_tools.py:60 - workspace/builtin_tools/delegation.py:272 Native-type usage at workspace/a2a_executor.py:471 (`Role.ROLE_AGENT`) stays — that's an in-process Message construction; the SDK handles wire serialisation correctly. Updated the misleading comment at main.py:255-257 (which said "outbound payloads are now 1.x-shaped (ROLE_USER)") to spell out the actual rule: outbound JSON-RPC wire payloads MUST use v0.3 shape, native types are only for in-process construction.
New regression test test_jsonrpc_wire_role_format.py greps the 6 wire-payload-emitting files for any "ROLE_USER" / "ROLE_AGENT" string literal and fails loud — cheapest possible drift detector. Why E2E missed it: the priority-runtimes harness sends a single message canvas → workspace, but the canvas already used lowercase "user" (it never went through the migration sweep). The bug only surfaces on workspace → workspace delegation, which the harness doesn't exercise. Same gap as #131 (extend smoke to call main() against a stub). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:40:11 -07:00
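The commit's drift detector greps for the literals. As a swapped-in alternative, walking the Python AST flags only genuine string constants, so attribute access such as `Role.ROLE_AGENT` in in-process code can never trip it. `find_protobuf_role_literals` is a hypothetical name.

```python
import ast
from typing import List, Tuple

# Swapped-in technique: the commit's test greps the files; walking the AST
# instead only flags real string literals, so attribute access such as
# Role.ROLE_AGENT (in-process construction) is never a false positive.
FORBIDDEN_WIRE_ROLES = {"ROLE_USER", "ROLE_AGENT"}


def find_protobuf_role_literals(source: str) -> List[Tuple[int, str]]:
    """Return (line, value) for every string literal equal to a protobuf role name."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str) and node.value in FORBIDDEN_WIRE_ROLES:
            hits.append((node.lineno, node.value))
    return sorted(hits)
```

It matches whole literals only, so a string such as `"role=ROLE_USER"` would slip through; a plain substring grep is the cruder but broader check.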
Hongming Wang	9c3695df6d	test(runtime): update molecule_ai_status test for renamed error prefix Pre-existing test_set_status_exception_prints_to_stderr asserted on the legacy "molecule-monorepo-status: failed to update" prefix string. The prior commit renamed it to "molecule_ai_status: failed to update" so the printed label matches the canonical module-form invocation (`python3 -m molecule_runtime.molecule_ai_status`) instead of a shell alias that only ever existed in the dev-only base image. Updating the expected substring in lockstep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:48:05 -07:00
Hongming Wang	28fc7a8cbd	fix(runtime): replace remaining /app/ legacy paths in agent prompts + docstrings Comprehensive sweep follow-up to the MCP server path fix. Audited every /app/ reference in the runtime source against the live claude-code template image and confirmed the actual /app/ contents post-#87 are ONLY: __init__.py, adapter.py, claude_sdk_executor.py, requirements.txt — every other workspace module ships in the wheel under site-packages/molecule_runtime/. Two more leaks found: 1. executor_helpers.py:_A2A_INSTRUCTIONS_CLI — inter-agent system prompt for non-MCP runtimes (Ollama, custom) had 5 lines telling the model `python3 /app/a2a_cli.py X`. Models copy these examples verbatim, so every CLI-runtime delegation would fail at the shell layer (no such file). Replaced with `python3 -m molecule_runtime.a2a_cli` form, which works regardless of where the wheel is installed. 2. molecule_ai_status.py docstring — usage examples invoked `python3 /app/molecule_ai_status.py` and claimed a `molecule-monorepo-status` shell alias. Both broken in current templates: the file's at site-packages, and `which molecule-monorepo-status` errors (the legacy symlink only existed in the dev-only workspace/Dockerfile base image, not in the standalone template Dockerfiles that ship to production). Updated docstring + the __main__ usage banner + the stderr error prefix to use the same `python3 -m molecule_runtime.X` form. Plugins audited and clean: WORKSPACE_PLUGINS_DIR=/configs/plugins, SHARED_PLUGINS_DIR=$PLUGINS_DIR fallback /plugins. No /app/ assumptions. Regression test: `test_a2a_cli_instructions_use_module_invocation_not_legacy_app_path` asserts the legacy /app/a2a_cli.py path can't drift back into the CLI system prompt and that the canonical module form is present. 
The legacy workspace/Dockerfile + workspace/entrypoint.sh + workspace/scripts/ still contain /app/-shaped paths but are dev-only base-image scaffolding (per workspace/build-all.sh's own header comment) — not shipped to the standalone template images. Out of scope here; can be cleaned up in a separate dead-code pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:22:00 -07:00
Hongming Wang	203a4f0f91	fix(runtime): resolve a2a_mcp_server.py path from wheel install location DEFAULT_MCP_SERVER_PATH was hardcoded to /app/a2a_mcp_server.py, which was correct under the pre-#87 monolithic-template Docker layout where the workspace/ tree was COPY'd into /app/. After the universal-runtime refactor (#87, #117), workspace modules ship inside the molecule-ai-workspace-runtime wheel under site-packages/molecule_runtime/, while /app/ now holds only template-specific files (adapter.py + the runtime-native executor for that template). Net effect: in every workspace built since the wheel cutover, Claude Code SDK's mcp_servers={"a2a": {"command": python, "args": ["/app/a2a_mcp_server.py"]}} pointed at a missing file. The subprocess launch failed silently, the SDK registered zero MCP tools, and the agent's list_peers / delegate_task / a2a_send_message / a2a_send_signal all disappeared. Symptom observed today: Design Director said "I tried to reach the perf auditor via the inter-agent MCP tools (list_peers, delegate_task) but those tools didn't resolve in this environment" and fell back to running the audit itself with WebFetch. Why this slipped through E2E: the priority-runtimes harness sends a single message and verifies a reply — it does not exercise inter-agent delegation, so the missing MCP tools are invisible at that layer. Fix: resolve the path relative to executor_helpers.py via __file__, which tracks wherever the wheel is installed (site-packages today, anywhere else tomorrow). The A2A_MCP_SERVER_PATH env override is preserved for tests / non-default layouts. Regression test: assert os.path.exists(DEFAULT_MCP_SERVER_PATH) so any future move of a2a_mcp_server.py out of the package directory fails at unit-test time instead of silently disabling delegation in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:15:06 -07:00
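A minimal sketch of the path resolution, assuming it lives in a helper that receives the calling module's `__file__`; inside the module itself the default would be `os.path.join(os.path.dirname(os.path.abspath(__file__)), "a2a_mcp_server.py")`. `resolve_mcp_server_path` is a hypothetical name; the `A2A_MCP_SERVER_PATH` override comes from the commit.

```python
import os
from typing import Mapping


def resolve_mcp_server_path(helpers_file: str, environ: Mapping[str, str] = os.environ) -> str:
    """Env override first, else the a2a_mcp_server.py that sits next to helpers_file."""
    override = environ.get("A2A_MCP_SERVER_PATH")
    if override:
        return override
    # Tracks wherever the package is installed (site-packages or elsewhere).
    package_dir = os.path.dirname(os.path.abspath(helpers_file))
    return os.path.join(package_dir, "a2a_mcp_server.py")
```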
Hongming Wang	5b05d663ee	test: update a2a.helpers mock to export new_text_message The conftest mock only exposed `new_agent_text_message`, the pre-v1 name. After fixing a2a_executor.py to use the v1 name `new_text_message`, the mock didn't satisfy the import → CI red. Mock both names (aliased to the same lambda) so any in-flight test that still references the old name keeps working until the next sweep removes those references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:34:28 -07:00
Hongming Wang	d19d35f6b3	test(skills): make watcher test fakes accept current_runtime kwarg The runtime-compat change in this branch added a `current_runtime` kwarg to load_skills(); the watcher passes it through. Test mocks that pre-date the kwarg signature broke with TypeError, which the watcher's reload-error try/except swallowed — the symptom was empty callback lists, not a clear failure. Switching the fakes to accept **kwargs keeps them forward-compat for future load_skills additions without another test churn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:04:26 -07:00
Hongming Wang	d0057912d2	feat(skills): per-skill runtime compatibility (#119, hermes pattern) SKILL.md frontmatter can now declare `runtime: [claude-code]` or `runtime: [hermes, claude-code]` to opt out of incompatible adapters instead of failing at first invocation. Default `[""]` means universal — existing skill libraries need zero migration. Borrowed from hermes' declarative skill-compat pattern surfaced in the hermes architecture survey. The remaining two patterns (event-log layer, observability config block) stay open under #119. Wiring: - SkillMetadata.runtime: list[str] = [""] - _normalize_runtime_field accepts list, string-sugar, missing -> [""]; malformed warns and falls back to universal so a typo never silently drops a skill. - load_skills(..., current_runtime=...) filters out skills whose runtime list contains neither "" nor current_runtime, with an INFO log line. - BaseAdapter.start passes type(self).name() so the live adapter drives the filter; SkillsWatcher takes the same kwarg so hot-reload honors it. 8 new tests cover default universal, no-field universal, explicit match/mismatch, string sugar, wildcard short-circuit, current_runtime=None (preserves old behavior), and malformed-warns-not-drops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:57:43 -07:00
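A sketch of the normalization and filtering rules, with hypothetical names standing in for `_normalize_runtime_field` and the filter inside `load_skills()`. Treating an empty list as malformed (and therefore universal) is an assumption the commit does not spell out.

```python
import logging
from typing import Any, Dict, List, Optional, Tuple

log = logging.getLogger("skills")
UNIVERSAL = ""  # the default [""] means "any runtime"

# Hypothetical helpers mirroring _normalize_runtime_field / load_skills filtering.


def normalize_runtime_field(value: Any, skill: str = "?") -> List[str]:
    if value is None:
        return [UNIVERSAL]  # field missing
    if isinstance(value, str):
        return [value.strip()]  # string sugar: runtime: claude-code
    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):
        return [v.strip() for v in value]
    # Malformed: warn and treat as universal so a typo never silently drops a skill.
    log.warning("skill %s: malformed runtime field %r, treating as universal", skill, value)
    return [UNIVERSAL]


def filter_skills(skills: List[Tuple[str, Dict[str, Any]]], current_runtime: Optional[str]) -> List[str]:
    kept = []
    for name, frontmatter in skills:
        runtimes = normalize_runtime_field(frontmatter.get("runtime"), name)
        # current_runtime=None preserves the old (unfiltered) behavior.
        if current_runtime is None or UNIVERSAL in runtimes or current_runtime in runtimes:
            kept.append(name)
        else:
            log.info("skipping skill %s: runtime %s not in %s", name, current_runtime, runtimes)
    return kept
```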
Hongming Wang	e99f937630	Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]	2026-04-27 08:24:30 +00:00
Hongming Wang	98ca5c50fa	chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild) DRAFT — do NOT merge until gemini-cli template image rebuilds with its local cli_executor.py copy (template PR #9 just merged at 07:59 UTC; image build kicks off now). Final adapter-specific deletion from molecule-runtime, completing #87 for the priority adapters (claude-code via PR #2156, plus gemini-cli via this PR + template #9). Deletes: - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged). - workspace/tests/test_agent_base_urls.py — only consumer of CLIAgentExecutor in the test suite. Tests for the executor behavior live in the template repo now. Updates: - workspace/tests/test_executor_helpers.py — docstring refresh: executor_helpers.py is the runtime-agnostic shared helpers; the executor classes themselves live in template repos post-#87. Codex / ollama presets disappear naturally with the file. They never had template repos, so no production path could invoke them anyway — this is dead-code removal as a side effect of the move. Verified-safe-to-delete: - heartbeat.py: doesn't import cli_executor - claude_sdk_executor.py: deleted by PR #2156 (in flight) - preflight.py: only references runtime names by string; no import - main.py: doesn't import cli_executor (uses adapter discovery via ADAPTER_MODULE; the template's adapter constructs the executor) - Only test_agent_base_urls.py + test_executor_helpers.py docstring referenced cli_executor Verification: - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py cases — exact match) - No live import of cli_executor anywhere in molecule-core after deletion (grep verified) Sequencing: 1. ✅ Template PR #9 (gemini-cli local copy) — MERGED 2. ⏳ Template image rebuild — running 3. 
THIS PR — wait until image is published, then mark ready-for-review. Closes #87 for the priority adapters: workspace/ is now adapter-agnostic except for adapter discovery (ADAPTER_MODULE) + the runtime_wedge primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:22:39 -07:00
Hongming Wang	7504aba934	feat(tools): tighten send_message_to_user description to forbid pasting URLs in body Root-cause fix for #118 (chat attachments rendering as plain text links instead of download chips). User flagged with screenshot 2026-04-26 showing the Design Director agent pasting https://files.catbox.moe/… in the message body — chat rendered the URL as plain markdown text, unclickable in the canvas's bubble layout, and unreachable in any SaaS deployment where the user's browser can't egress to catbox. The structured `attachments` field already exists, the canvas's AttachmentChip already renders well, the WebSocket broadcast already carries attachments verbatim — the missing piece was the LLM choosing the body over the structured field. Tighten the tool description so it trains the right behavior. Three targeted strengthenings: 1. Top-level tool description: enumerated use case (4) now reads "via the `attachments` field (NEVER paste file URLs in `message`)". The all-caps NEVER + the explicit field name move the LLM toward the structured path on first read. 2. `message` param: adds an explicit DO NOT rule with rationale. Includes the SaaS-reachability reason so operators can grep for "SaaS" and find this design constraint instead of re-discovering it after a tenant complaint. Calls out catbox.moe + file:// by name as concrete examples of forbidden hosts (those are the two we've seen in production). 3. `attachments` param: leads with REQUIRED, lists the bad alternatives explicitly (pasting URLs, base64-encoding, telling user to look at a path). LLMs handle "use X, NOT Y" framings better than "use X" alone — observed during prompt-engineering iteration on hermes' tool descriptions. Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py) so a future doc edit that softens or drops them fails CI. Brittle by design — these are prompt-engineering invariants, not implementation details. This is the root-cause fix. 
A defensive canvas-side backstop (auto-detect download-shaped URLs in body and convert to chips) is a follow-up that could land separately if the steering proves insufficient in practice. Verification: - 1190/1190 workspace pytest pass - 4 new test_a2a_mcp_server.py cases all green. Closes the steering half of #118. The structured-attachments-only contract was already enforced server-side (PR #2130 added per-attachment validation); this PR closes the prompt-side gap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:13:11 -07:00
Hongming Wang	4e6030d783	Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime chore(workspace): drop claude_sdk_executor — Phase 2 of #87	2026-04-27 08:02:51 +00:00
Hongming Wang	4b5ac2ebc2	chore(workspace): drop claude_sdk_executor — Phase 2 of #87 Phase 2 of the universal-runtime refactor (task #87). Now that the claude-code template repo ships its own claude_sdk_executor.py (template PR #13 merged + image rebuilt at 07:36 UTC) the molecule-runtime no longer needs to ship the file. Deletes: - workspace/claude_sdk_executor.py (704 LOC) - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC) Updates: - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring section. The shim was time-bounded ("removed once #87 Phase 2 lands"); this is that PR. - workspace/tests/test_runtime_wedge.py — drops the TestClaudeSdkExecutorReExportShim test class (the shim doesn't exist anymore so the identity assertions would fail at import). - workspace/tests/conftest.py — drops the claude_agent_sdk stub. Its only consumer was test_claude_sdk_executor.py which is gone; no other test imports the SDK. - workspace/cli_executor.py — comment refresh: claude-code template repo (not workspace/) is now the home for ClaudeSDKExecutor. Verified-safe-to-delete: - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer imports from claude_sdk_executor) - cli_executor.py: only comments referenced claude_sdk_executor; its line-117 ValueError defends against accidental routing - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's shim class consumed the deleted module; both removed in this PR Verification: - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the deleted test cases — zero unexpected regressions) - No live import of claude_sdk_executor anywhere in molecule-core after deletion (grep verified) Closes #87 for the claude-code adapter. Hermes is already template-only. The remaining adapter-specific code in workspace/ is cli_executor.py (codex/ollama/gemini-cli) tracked by task #122. preflight.py's SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in flight). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:52:55 -07:00
Hongming Wang	7dba700ac3	feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery Closes task #123 — last piece of #87 cleanup. Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported" runtime names (claude-code, codex, ollama, langgraph, etc.). Every new template repo required a code change in molecule-runtime to be recognized — direct violation of the universal-runtime principle (#87) where adapters declare themselves and the runtime stays generic. Post-fix: discovery-based validation via the same ADAPTER_MODULE env var that production load paths already consult (workspace/adapters/__init__.py:get_adapter). Distinguished failure modes so operator messages are concrete: - ADAPTER_MODULE unset → "no adapter installed; set the env var" - ADAPTER_MODULE set but module won't import → import error type + message - module imports but no Adapter class → "convention violation, add `Adapter = YourClass`" - Adapter.name() raises → caught with operator message - Adapter.name() returns non-string → contract violation message - Adapter.name() doesn't match config.runtime → drift WARNING (not fatal; the adapter wins in production, config.yaml is just documentation) The drift case is the one behavioral change worth calling out: the prior static-list path would have hard-failed config.runtime values not in the allowlist. With discovery, an unknown runtime in config.yaml is just a documentation drift — the adapter that's actually installed runs regardless. Operator gets a warning naming both the configured and installed names so they can fix whichever is stale. 
Tests: - Replaces the obsolete "static list pass/fail" tests with 6 new cases covering each distinguished failure mode, plus a positive test for the adapter-matches-config happy path - Adds an autouse `_default_langgraph_adapter` fixture that pre-installs a fake adapter via sys.modules monkey-patching, so existing tests building default WorkspaceConfig (runtime="langgraph") inherit a valid adapter without each test setting ADAPTER_MODULE - Failure-mode tests opt out of the default fixture via @pytest.mark.no_default_adapter (registered in pytest.ini) - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None is a passable test value (otherwise `is not None` would skip the None branch — exact bug the sentinel avoids) Verification: - 22/22 preflight tests pass (was 16; +6 new failure-path tests) - 1256/1256 workspace pytest pass (was 1251; +5 net) - No production code path other than preflight changed Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction). This change is independent of the cli_executor.py template moves (task #122) — completes one of the two remaining cleanup items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:44:51 -07:00
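As a rough illustration of the distinguished failure modes listed above, here is a minimal sketch of discovery-based validation. It assumes `Adapter.name()` is callable on the class and that `ADAPTER_MODULE` is read from a plain mapping; `check_adapter` and `PreflightResult` are hypothetical names, not the real preflight.py API.

```python
import importlib
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class PreflightResult:
    ok: bool
    message: str
    warning: bool = False  # True for config/adapter drift (non-fatal)


def check_adapter(configured_runtime: str, env: Optional[Mapping[str, str]] = None) -> PreflightResult:
    """Hypothetical discovery-based check: trust whatever adapter ADAPTER_MODULE names."""
    env = os.environ if env is None else env
    module_name = env.get("ADAPTER_MODULE")
    if not module_name:
        return PreflightResult(False, "no adapter installed; set ADAPTER_MODULE")
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # surface the import error type + message
        return PreflightResult(
            False, f"ADAPTER_MODULE={module_name!r} failed to import: {type(exc).__name__}: {exc}"
        )
    adapter_cls = getattr(module, "Adapter", None)
    if adapter_cls is None:
        return PreflightResult(False, f"{module_name} defines no Adapter; add `Adapter = YourClass`")
    try:
        name = adapter_cls.name()
    except Exception as exc:
        return PreflightResult(False, f"Adapter.name() raised {type(exc).__name__}: {exc}")
    if not isinstance(name, str):
        return PreflightResult(False, f"Adapter.name() must return str, got {type(name).__name__}")
    if name != configured_runtime:
        # Drift is documentation-level: the installed adapter runs regardless.
        return PreflightResult(
            True,
            f"config.runtime={configured_runtime!r} but installed adapter is {name!r}",
            warning=True,
        )
    return PreflightResult(True, f"adapter {name!r} matches config.runtime")
```

Keeping drift as a warning rather than a failure mirrors the behavioral change the commit calls out: config.yaml documents the runtime, while the installed adapter decides it.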
Hongming Wang	1d231ed295	refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module Prerequisite for the universal-runtime refactor (task #87) to move claude_sdk_executor.py out of molecule-runtime into the claude-code template repo. heartbeat.py had a hard import: from claude_sdk_executor import is_wedged, wedge_reason which would break the moment the executor moves out of the runtime package — the heartbeat would lose access to the wedge state used to flip workspace status to degraded. Extract the wedge state to a runtime-side module that the heartbeat can keep importing regardless of which adapter executor is wedged: - workspace/runtime_wedge.py — single-flag state + mark_wedged / clear_wedge / is_wedged / wedge_reason / reset_for_test. Same semantics as the original claude_sdk_executor implementation (sticky first-write-wins, auto-clear on observed success). 100 LOC of pure stateless helpers; lock-free ok because there's one executor per workspace process today. - workspace/claude_sdk_executor.py — drops the in-file definitions; re-exports the same names from runtime_wedge as a backwards-compat shim. Any third-party adapter that imported is_wedged / wedge_reason / _mark_sdk_wedged from claude_sdk_executor keeps working for one release cycle while they migrate to runtime_wedge. - workspace/heartbeat.py — _runtime_state_payload() now imports from runtime_wedge instead of claude_sdk_executor. Lazy-import pattern preserved; the docstring updated to explain the new cross-cutting source-of-truth. 
Tests (10 new in test_runtime_wedge.py): - Default state (unwedged), mark sets flag, first-write-wins, clear restores healthy, clear-when-not-wedged is no-op, re-marking after clear is allowed - Re-export shim: each old name in claude_sdk_executor IS the runtime_wedge function (identity check), state is shared (marking via the executor shim is observable via runtime_wedge and vice versa) Verification: - 1251/1251 workspace pytest pass (was 1241 after orphan deletion; +10 = exactly the new test_runtime_wedge.py cases) - All existing test_claude_sdk_executor.py cases (which call _mark_sdk_wedged via the shim) still pass. After this lands + the claude-code template image rebuilds with the local claude_sdk_executor.py copy (template PR #13), the molecule-core deletion of workspace/claude_sdk_executor.py becomes safe (the shim deletion comes alongside the file deletion, since runtime_wedge is the new public API). See project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:08:53 -07:00
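A minimal sketch of the sticky, first-write-wins wedge flag described above, assuming module-level state and a single executor per process (so no lock). The function names follow the commit message; the bodies are illustrative, not the actual runtime_wedge.py.

```python
from typing import Optional

# Module-level state; lock-free on the assumption of one executor per process.
_wedged_reason: Optional[str] = None


def mark_wedged(reason: str) -> None:
    """Sticky, first-write-wins: a later mark does not overwrite the first reason."""
    global _wedged_reason
    if _wedged_reason is None:
        _wedged_reason = reason


def clear_wedge() -> None:
    """Called on observed success; a no-op when not wedged."""
    global _wedged_reason
    _wedged_reason = None


def is_wedged() -> bool:
    return _wedged_reason is not None


def wedge_reason() -> Optional[str]:
    return _wedged_reason


def reset_for_test() -> None:
    clear_wedge()
```

Keeping the first reason means the heartbeat reports the original cause of the wedge rather than a later, possibly derivative error.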
Hongming Wang	fa8deb9d16	chore(workspace): delete orphan HermesA2AExecutor (dead code, 1.8K LOC) Removes: - workspace/hermes_executor.py (545 LOC) — HermesA2AExecutor, an OpenAI-compat direct-call executor that was the original hermes integration before the template was rewritten to bridge to hermes-agent's sidecar API server. - workspace/tests/test_hermes_executor.py (1307 LOC) — its test file. Verified-dead-code analysis: - Zero `from hermes_executor` / `import hermes_executor` imports anywhere in workspace/, workspace-server/, or workspace-configs-templates/ (excluding the file itself + its test). - The hermes template (workspace-configs-templates/hermes/executor.py) uses HermesAgentProxyExecutor, NOT HermesA2AExecutor — they're independent implementations. The executor.py file imports from `executor` (local), not from molecule_runtime. - Last touched in PR #1974 (2026 a2a-sdk migration to 1.0.0) for SDK compatibility — kept compiling but never wired into any code path. - Older than that, only the 2026 open-source restructure rename. Why now: starting task #87 (universal-runtime violation, move adapter- specific code out of workspace/). Dead-code deletion is the safest first step and motivates the broader refactor by clearing the landscape — no risk of someone defending HermesA2AExecutor as "actually used somewhere." Verification: - 1241/1241 workspace pytest pass (was 1312; the 71 dropped tests are exactly test_hermes_executor.py's coverage) - No new failures, no broken imports anywhere The remaining adapter-specific executors in workspace/ that #87 will eventually relocate (per the user's scope: claude-code + hermes priority, others later): - workspace/claude_sdk_executor.py (757 LOC) → claude-code template repo - workspace/cli_executor.py (461 LOC) → defer (codex/ollama/etc still use the runtime presets here; comes back later when those bump versions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:52:10 -07:00
Hongming Wang	af664e3e87	feat(tools): borrow hermes-style discipline — error/summary caps + sharper MCP descriptions Three small wins from the hermes-agent design survey, bundled because each is too small for its own PR but they all improve the priority adapters (claude-code + hermes) immediately. 1. Hermes-style cap on telemetry fields, applied INSIDE report_activity so every caller benefits without remembering. error_detail capped at 4096 (hermes' value); summary capped at 256 (one-liner ceiling). The existing call site in tool_delegate_task already truncated error_detail at 4096, but moving the cap into the helper closes the door on a future caller pasting a giant traceback. response_text is NOT capped (it's the agent's user-visible reply; truncating would silently drop content). Pinned by 4 new tests including a negative-pin that response_text MUST stay untruncated. 2. Sharper MCP tool descriptions for commit_memory + recall_memory — hermes' delegate_task description literally says "WAIT for the response" and delegate_task_async says "Returns immediately." LLMs pick the right tool variant from descriptions; ambiguity costs accuracy. - commit_memory now states it APPENDS (each call creates a row, no overwrite) and that GLOBAL requires tier 0. - recall_memory now states it's case-insensitive substring search with no pagination, returns all matches, and that empty-query is cheap and safer than a narrow keyword. 3. (no code change) Filed task #120 for the bigger user-flow win — a per-workspace tool enable/disable menu in Canvas Config — and task #121 for model-string passthrough (depends on #87 universal-runtime refactor). Verification: - 1312/1312 Python pytest pass (was 1308, +4 new) See task #119 for the architectural follow-ups (event-log layer, declarative skill compat, observability config block) and project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:25:54 -07:00
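The caps can be pictured as truncation applied inside the helper itself. This sketch uses a hypothetical `build_activity_fields` in place of the real `report_activity`, which also performs the HTTP call; the limits 4096 and 256 come from the commit message.

```python
from typing import Optional

ERROR_DETAIL_CAP = 4096  # hermes' value, per the commit message
SUMMARY_CAP = 256        # one-liner ceiling


def _cap(value: Optional[str], limit: int) -> Optional[str]:
    """Truncate to at most `limit` characters; None passes through."""
    if value is None:
        return None
    return value[:limit]


def build_activity_fields(summary=None, error_detail=None, response_text=None) -> dict:
    """Apply the caps inside the helper so every caller gets them for free."""
    return {
        "summary": _cap(summary, SUMMARY_CAP),
        "error_detail": _cap(error_detail, ERROR_DETAIL_CAP),
        # Deliberately uncapped: this is the agent's user-visible reply.
        "response_text": response_text,
    }
```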
Hongming Wang	aa70727ab9	fix(test): drop unused MagicMock import in test_heartbeat_runtime_metadata Reviewer bot flagged: import was leftover from earlier scaffolding — all test fixtures use sys.modules monkey-patching with SimpleNamespace instead. Drop to unblock merge. Tests still 5/5 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:58:21 -07:00
Hongming Wang	0d3058585b	feat(runtime): adapter-declared idle_timeout_override end-to-end Capability primitive #2 (task #117). The first cross-cutting capability where the adapter actually displaces platform behavior — claude-code's streaming session can legitimately go silent for 8+ minutes during synthesis + slow tool calls; the platform's hardcoded 5min idle timer in a2a_proxy.go cancels it mid-flight (the bug PR #2128 patched at the env-var layer). This PR fixes it at the right layer: the adapter declares "I need 600s" and the platform's dispatch path honors it. Wire shape (Python → Go): POST /registry/heartbeat { "workspace_id": "...", ... "runtime_metadata": { "capabilities": {"heartbeat": false, "scheduler": false, ...}, "idle_timeout_seconds": 600 // optional, omitted = use default } } Default behavior preserved: any adapter that doesn't override BaseAdapter.idle_timeout_override() (returns None by default) sends no idle_timeout_seconds field; the Go side falls through to idleTimeoutDuration (env A2A_IDLE_TIMEOUT_SECONDS, default 5min). Existing langgraph / crewai / deepagents workspaces are unaffected. Components: Python: - adapter_base.py: idle_timeout_override() method on BaseAdapter returning None (the platform-default sentinel). - heartbeat.py: _runtime_metadata_payload() lazy-imports the active adapter and assembles the capability + override block. Try/except swallows ANY error so heartbeat never breaks because of capability discovery — observability outranks capability accuracy. Go: - models.HeartbeatPayload.RuntimeMetadata (pointer so absent = "old runtime, didn't say"; explicit zero-cap = "new runtime, declared no native ownership"). - handlers.runtimeOverrides: in-memory sync.Map cache keyed by workspaceID. Populated by the heartbeat handler, consulted on every dispatchA2A. Reset on platform restart (worst-case 30s of platform-default behavior — acceptable; nothing about overrides is correctness-critical). 
- a2a_proxy.dispatchA2A: looks up the override before applyIdleTimeout; falls through to the global default when absent. Tests: Python (17, all new): - RuntimeCapabilities dataclass shape (frozen, defaults, wire keys) - BaseAdapter.capabilities() default + override + sibling isolation - idle_timeout_override default, positive override, dropped-override - Heartbeat metadata producer: default adapter emits all-False, native adapter emits flag + override, missing ADAPTER_MODULE returns {} (graceful), zero/negative override is omitted from wire, exception inside adapter swallowed. Go (6, all new): - SetIdleTimeout + IdleTimeout round-trip - Zero/negative duration clears the override - Empty workspace_id ignored - Replacement (heartbeat overwrites prior value) - Reset clears entire cache - Concurrent reads + writes (sync.Map invariant) Verification: - 1308/1308 workspace pytest pass (was 1300, +8) - All Go handlers tests pass (6 new + existing) - go vet clean See project memory `project_runtime_native_pluggable.md` for the architecture principle this implements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:38:01 -07:00
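On the Python side, the producer might look roughly like this. It is a sketch assuming `capabilities()` and `idle_timeout_override()` are callable on the `Adapter` class exported by `ADAPTER_MODULE`; `runtime_metadata_payload` is a hypothetical stand-in for `_runtime_metadata_payload()`.

```python
import importlib
import os
from typing import Mapping, Optional


def runtime_metadata_payload(env: Optional[Mapping[str, str]] = None) -> dict:
    """Assemble the heartbeat's runtime_metadata block (sketch)."""
    env = os.environ if env is None else env
    try:
        module_name = env.get("ADAPTER_MODULE")
        if not module_name:
            return {}  # no adapter: graceful empty block
        adapter = importlib.import_module(module_name).Adapter  # lazy import
        payload = {"capabilities": adapter.capabilities().to_dict()}
        override = adapter.idle_timeout_override()
        # None, zero, or negative: omit the field so Go uses its default.
        if override is not None and override > 0:
            payload["idle_timeout_seconds"] = int(override)
        return payload
    except Exception:
        # Observability outranks capability accuracy: never break the heartbeat.
        return {}
```

Omitting the field (rather than sending 0) keeps "adapter did not ask" distinguishable on the Go side, which then falls through to `idleTimeoutDuration`.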
Hongming Wang	205a454c09	feat(runtime): RuntimeCapabilities dataclass + BaseAdapter.capabilities() Foundation primitive for the native+pluggable runtime principle (task #117, blocks #87). Lets each adapter declare which cross-cutting capabilities it owns natively (heartbeat, scheduler, durable session, status mgmt, retry, activity decoration, channel dispatch) versus delegating to the platform's fallback implementation. Pure additive: every existing adapter inherits BaseAdapter.capabilities(), which returns RuntimeCapabilities() with every flag False, so today's "platform owns everything" behavior is preserved exactly. Subsequent PRs land platform-side consumers (idle-timeout override, scheduler skip, status-transition hook, etc.) one capability at a time. Why a frozen dataclass instead of class attributes: capabilities are declared at class-load time and read by the platform on every heartbeat. A mutable value would let a runtime change capabilities mid-flight, creating impossible-to-debug state where the platform's idea of who-owns-heartbeat drifts from the adapter's actual code. Why a `to_dict()` with explicit short keys: the Go side will read these from the heartbeat payload by string key. The dict's wire names are pinned independently of Python field names so a Python-side rename doesn't silently break the Go consumer (test pins this). Tests (9 new): - is a frozen dataclass (mutation rejected) - all 7 default flags are False (load-bearing: flipping any default silently moves ownership for langgraph/crewai/deepagents) - to_dict() keys are stable wire names (Go contract) - BaseAdapter.capabilities() default returns all-False - subclass override mechanism works - sibling adapters' defaults aren't affected by an override Verification: - 1300/1300 workspace pytest pass (was 1291, +9) - Zero behavior change for any existing code path See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:17:49 -07:00
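A sketch of the frozen dataclass with explicitly spelled wire keys. Only the `heartbeat` and `scheduler` keys appear in the related heartbeat payload; the field names and wire keys for the other five flags are hypothetical placeholders, chosen partly to show a Python name (`status_management`) diverging from its pinned wire key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeCapabilities:
    """All flags default to False: the platform owns everything unless declared."""
    heartbeat: bool = False
    scheduler: bool = False
    durable_session: bool = False      # hypothetical name
    status_management: bool = False    # hypothetical name
    retry: bool = False                # hypothetical name
    activity_decoration: bool = False  # hypothetical name
    channel_dispatch: bool = False     # hypothetical name

    def to_dict(self) -> dict:
        # Wire keys are spelled out explicitly so a Python field rename
        # cannot silently change what the Go consumer reads.
        return {
            "heartbeat": self.heartbeat,
            "scheduler": self.scheduler,
            "durable_session": self.durable_session,
            "status_mgmt": self.status_management,
            "retry": self.retry,
            "activity_decoration": self.activity_decoration,
            "channel_dispatch": self.channel_dispatch,
        }


class BaseAdapter:
    @classmethod
    def capabilities(cls) -> RuntimeCapabilities:
        return RuntimeCapabilities()


class NativeHeartbeatAdapter(BaseAdapter):
    """Hypothetical adapter that owns its heartbeat natively."""

    @classmethod
    def capabilities(cls) -> RuntimeCapabilities:
        return RuntimeCapabilities(heartbeat=True)
```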
Hongming Wang	6eaacf175b	fix(notify): review-flagged Critical + Required findings on PR #2130 Two Critical bugs caught in code review of the agent→user attachments PR: 1. Empty-URI attachments slipped past validation. Gin's go-playground/validator does NOT iterate slice elements without `dive` — verified zero `dive` usage anywhere in workspace-server — so the inner `binding:"required"` tags on NotifyAttachment.URI/Name were never enforced. `attachments: [{"uri":"","name":""}]` would pass validation, broadcast empty-URI chips that render blank in canvas, AND persist them in activity_logs for every page reload to re-render. Added explicit per-element validation in Notify (returns 400 with `attachment[i]: uri and name are required`) plus defence-in-depth in the canvas filter (rejects empty strings, not just non-strings). 3-case regression test pins the rejection. 2. Hardcoded application/octet-stream stripped real mime types. `_upload_chat_files` always passed octet-stream as the multipart Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")` FIRST and only falls back to extension-sniffing when the header is empty, so every agent-attached file lost its real type forever — broke the canvas's MIME-based icon/preview logic. Now sniff via `mimetypes.guess_type(path)` and only fall back to octet-stream when sniffing returns None. Plus three Required nits: - `sqlmockArgMatcher` was misleading — the closure always returned true after capture, identical to `sqlmock.AnyArg()` semantics, but named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)` so the intent (capture for post-call inspection, not validate via driver-callback) is unambiguous. - Test asserted notify call by `await_args_list[1]` index — fragile to any future _upload_chat_files refactor that adds a pre-flight POST. Now filter call list by URL suffix `/notify` and assert exactly one match. 
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases) covering empty-uri, empty-name, both-empty so the Critical fix stays defended. Deferred to follow-up: - ORDER BY tiebreaker for same-millisecond notifies — pre-existing risk, not regression. - Streaming multipart upload — bounded by the platform's 50MB total cap so RAM ceiling is fixed; switch to streaming if cap rises. - Symlink rejection — agent UID can already read whatever its filesystem perms allow via the shell tool; rejecting symlinks doesn't materially shrink the attack surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:47:31 -07:00
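The two Critical fixes reduce to a pair of small rules, sketched here in Python. `guess_content_type` uses the standard-library `mimetypes.guess_type`, as the commit describes; `validate_attachments` is a hypothetical Python mirror of the Go-side per-element check (the real check lives in `Notify`) and reuses its error text.

```python
import mimetypes
from typing import Optional

FALLBACK_MIME = "application/octet-stream"


def guess_content_type(path: str) -> str:
    """Sniff from the file extension; fall back only when sniffing returns None."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or FALLBACK_MIME


def validate_attachments(attachments: list) -> Optional[str]:
    """Return an error for the first bad entry, or None if all are valid.

    Empty strings are rejected, not just missing keys.
    """
    for i, att in enumerate(attachments):
        if not att.get("uri") or not att.get("name"):
            return f"attachment[{i}]: uri and name are required"
    return None
```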
Hongming Wang	d028fe19ff	feat(notify): agent → user file attachments via send_message_to_user Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip" in plain text instead of attaching a download chip — the runtime literally had no API for outbound file attachments. The canvas + platform's chat-uploads infrastructure already supported the inbound (user → agent) direction (commit `94d9331c`); this PR wires the outbound side. End-to-end shape: agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"]) ↓ runtime POST /workspaces/<self>/chat/uploads (multipart) ↓ platform /workspace/.molecule/chat-uploads/<uuid>-build.zip → returns {uri: workspace:/...build.zip, name, mimeType, size} ↓ runtime POST /workspaces/<self>/notify {message: "Done!", attachments: [{uri, name, mimeType, size}]} ↓ platform Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]} ↓ canvas WS push: canvas-events.ts adds attachments to agentMessages queue Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[] Either path → ChatTab renders download chip via existing path Files changed: workspace-server/internal/handlers/activity.go - NotifyAttachment struct {URI, Name, MimeType, Size} - Notify body accepts attachments[], broadcasts in payload, persists as response_body.parts[].kind="file" canvas/src/store/canvas-events.ts - AGENT_MESSAGE handler reads payload.attachments, type-validates each entry, attaches to agentMessages queue - Skips empty events (was: skipped only when content empty) workspace/a2a_tools.py - tool_send_message_to_user(message, attachments=[paths]) - New _upload_chat_files helper: opens each path, multipart POSTs to /chat/uploads, returns the platform's metadata - Fail-fast on missing file / upload error — never sends a notify with a half-rendered attachment chip workspace/a2a_mcp_server.py - inputSchema declares attachments param so claude-code SDK 
surfaces it to the model - Defensive filter on the dispatch path (drops non-string entries if the model sends a malformed payload) Tests: - 4 new Python: success path, missing file, upload 5xx, no-attach backwards compat - 1 new Go: Notify-with-attachments persists parts[] in response_body so chat reload reconstructs the chip Why /tmp paths work even though they're outside the canvas's allowed roots: the runtime tool reads the bytes locally and re-uploads through /chat/uploads, which lands the file under /workspace (an allowed root). The agent can specify any readable path. Does NOT include: agent → agent file transfer. Different design problem (cross-workspace download auth: peer would need a credential to call sender's /chat/download). Tracked as a follow-up under task #114. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:35:58 -07:00
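The fail-fast ordering (upload everything, then notify) can be sketched with injected callables. `send_message_with_attachments`, `upload`, and `notify` are hypothetical names; the real tool performs multipart POSTs to `/chat/uploads` and `/notify` instead of calling functions.

```python
import os
from typing import Callable, List


def send_message_with_attachments(
    message: str,
    paths: List[str],
    upload: Callable[[str], dict],
    notify: Callable[[dict], None],
) -> None:
    """Upload every file first; notify only once all uploads have succeeded."""
    attachments = []
    for path in paths:
        if not os.path.isfile(path):
            # Fail fast: never notify with a half-rendered attachment list.
            raise FileNotFoundError(f"attachment not found: {path}")
        attachments.append(upload(path))  # an upload error propagates here too
    notify({"message": message, "attachments": attachments})
```

With an empty `paths` list this degrades to a plain notify, matching the no-attach backwards-compatibility case.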
Hongming Wang	5071454074	fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events Critical follow-up to PR #2126's review. Two real bugs: 1. Runtime QUEUED never resolved. Platform's drain stitch updates the platform's delegate_result row when a queued delegation finally completes, but never pushes back to the runtime. The LLM polling check_delegation_status saw status="queued" forever — combined with the new docstring guidance ("queued → wait, peer will reply"), the model would wait indefinitely on a state that never resolves. Strictly worse than pre-PR behavior where it would have at least bypassed. 2. Live updates dead code. delegation.go writes activity rows by direct INSERT INTO activity_logs, bypassing the LogActivity helper that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial GET worked, live updates did not. Fix: (1) Runtime side, workspace/builtin_tools/delegation.py: - New `_refresh_queued_from_platform(task_id)` async helper that pulls /workspaces/<self>/delegations and finds the platform-side delegate_result row for our task_id. - check_delegation_status calls _refresh when local status is QUEUED, so the LLM's poll itself drives state convergence. - Best-effort: GET failure leaves local state untouched, next poll retries. - Docstring updated to reflect the actual behavior ("polls transparently — keep polling and you'll see the flip"). - 4 new tests cover: QUEUED → completed via refresh; QUEUED → failed via refresh; refresh keeps QUEUED when platform hasn't resolved; refresh swallows network errors safely. (2) Canvas side, AgentCommsPanel.tsx WS push handler: - Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE / DELEGATION_FAILED in addition to ACTIVITY_LOGGED. - Each event's payload synthesized into an ActivityEntry shape so toCommMessage's existing delegation branch maps it. 
Status derived: STATUS uses payload.status, COMPLETE → "completed", FAILED → "failed", SENT → "pending". - The ACTIVITY_LOGGED branch keeps the "delegation" type accepted as a no-op-today / future-proof path: if delegation handlers are ever refactored to call LogActivity, this lights up automatically without another canvas change. Doesn't change: the docstring guidance ("queued → wait, don't bypass") is now actually load-bearing because the refresh path will deliver the eventual outcome. Without the refresh, the guidance was a trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:05:04 -07:00
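A synchronous sketch of the lazy refresh. The real helper is async and queries `/workspaces/<self>/delegations`; here the fetch is an injected callable, and the row keys (`method`, `task_id`, `status`, `result`) are assumptions for illustration.

```python
from typing import Callable, List


def refresh_queued(local: dict, fetch_rows: Callable[[], List[dict]]) -> dict:
    """Converge a QUEUED delegation with the platform's delegate_result row."""
    if local.get("status") != "queued":
        return local
    try:
        rows = fetch_rows()
    except Exception:
        return local  # best-effort: the next poll retries
    for row in rows:
        if (
            row.get("method") == "delegate_result"
            and row.get("task_id") == local.get("task_id")
            and row.get("status") in ("completed", "failed")
        ):
            return {**local, "status": row["status"], "result": row.get("result")}
    return local  # platform hasn't resolved it yet
```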
Hongming Wang	057876cb0c	fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows Two bugs that compounded into the "Director does the work itself" UX: 1. workspace/builtin_tools/delegation.py: _execute_delegation only handled HTTP 200 in the response branch. When the peer's a2a-proxy returned HTTP 202 + {queued: true} (single-SDK-session bottleneck on the peer), the loop fell through. Two iterations later the `if "error" in result` check tried to access an unbound `result`, the goroutine ended quietly, and the delegation stayed at FAILED with error="None". The LLM checking status saw "failed" + the platform's "Delegation queued — target at capacity" log line in chat context, concluded the peer was permanently unavailable, and bypassed delegation to do the work itself. Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED, marks the local delegation as QUEUED, mirrors to the platform, and returns cleanly without retrying. The retry loop is for transient transport errors — queueing is a real ack, not a failure to retry against (retrying would just re-queue the same task). check_delegation_status docstring extended with explicit per-status guidance: pending/in_progress → wait, queued → wait (peer busy on prior task, reply WILL arrive), completed → use result, failed → real error in error field; only fall back on failed, never queued. 2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped every delegation row because it whitelisted only a2a_send / a2a_receive. activity_type='delegation' rows (written by the platform's /delegate handler with method='delegate' or 'delegate_result') never reached toCommMessage. User saw "No agent-to-agent communications yet" while 6+ delegations existed in the DB. 
Fix: include "delegation" in both the initial filter and the WS push filter, plus a delegation branch in toCommMessage that maps the row as outbound (always; the platform proxies on our behalf) and uses summary as the primary text source. Tests: - 3 new Python tests cover the 202+queued path: status becomes QUEUED not FAILED; no retry on queued (counted by URL match against the A2A target since the mock is shared across all AsyncClient calls); bare 202 without {queued:true} still falls through to the existing retry-then-FAILED path. - 3 new TS tests cover the delegation mapper: 'delegate' row maps as outbound to target with summary text; queued 'delegate_result' preserves status='queued' (load-bearing for the LLM's wait-vs-bypass decision); missing target_id returns null instead of rendering a ghost. Does NOT solve: the underlying single-SDK-session bottleneck that causes peers to queue in the first place. Tracked as task #102 (parallel SDK sessions per workspace), which is real architectural work. This PR makes the runtime handle the queueing correctly so the LLM doesn't bail out, and makes the delegations visible in Agent Comms so operators can see what's happening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:01:50 -07:00
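The response branching described above amounts to a three-way classification. This is a hypothetical sketch (`classify_a2a_response` and `Outcome` are not names from the codebase) of how a 202 acknowledgement is separated from a retryable failure.

```python
from enum import Enum
from typing import Any


class Outcome(Enum):
    DONE = "done"      # HTTP 200: the peer answered
    QUEUED = "queued"  # HTTP 202 + {"queued": true}: a real ack, do not retry
    RETRY = "retry"    # anything else: candidate for the transient-error retry loop


def classify_a2a_response(status_code: int, body: Any) -> Outcome:
    if status_code == 200:
        return Outcome.DONE
    if status_code == 202 and isinstance(body, dict) and body.get("queued") is True:
        return Outcome.QUEUED
    # A bare 202 without {"queued": true} falls through to the retry path.
    return Outcome.RETRY
```

Retrying on QUEUED would only re-queue the same task, which is why it is kept out of the retry bucket.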
Hongming Wang	09bfd9bdce	fix(tests): hoist _executor_mod alias so async wedge tests pass under --cov The Copilot Auto-fix in `5a8f42b4` addressed the duplicate-import lint by removing 'import claude_sdk_executor as _executor_mod' entirely, but the async wedge tests (test_execute_marks_wedge_, test_execute_clears_wedge_) still call _executor_mod._reset_sdk_wedge_for_test() etc. — so they failed with NameError once that line was removed. Restore the alias, but at the top of the file (alongside the other module- level imports) rather than at line 1248. The late-file binding was the proximate cause of the original CI failure: with --cov enabled (#1817), sys.settrace + the @pytest.mark.asyncio wrapper combination caused the late module-level binding to not be visible from inside the async test bodies, even though the binding existed at module-load time. Hoisting fixes that scope-resolution issue. Verified locally with the exact CI config (--cov-fail-under=86): 1280 passed, 2 xfailed — total coverage 90.25% 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:57:21 -07:00
Hongming Wang	5a8f42b405	Potential fix for pull request finding 'Module is imported with 'import' and 'import from'' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-26 10:45:37 -07:00
Hongming Wang	d0f198b24f	merge: resolve staging conflicts (a2a_proxy + workspace_crud) Three files conflicted with staging changes that landed while this PR sat open. Resolved each by combining both intents (not picking one side): - a2a_proxy.go: keep the branch's idle-timeout signature (workspaceID parameter + comment) AND apply staging's #1483 SSRF defense-in-depth check at the top of dispatchA2A. Type-assert h.broadcaster (now an EventEmitter interface per staging) back to Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through to no-op when the assertion fails (test-mock case). - a2a_proxy_test.go: keep both new test suites — branch's TestApplyIdleTimeout_ (3 cases for the idle-timeout helper) AND staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated the staging test's dispatchA2A call to pass the workspaceID arg introduced by the branch's signature change. - workspace_crud.go: combine both Delete-cleanup intents: * Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas hang-up doesn't cancel mid-Docker-call (the container-leak fix) * Branch's stopAndRemove helper that skips RemoveVolume when Stop fails (orphan sweeper handles) * Staging's #1843 stopErrs aggregation so Stop failures bubble up as 500 to the client (the EC2 orphan-instance prevention) Both concerns satisfied: cleanup runs to completion past canvas hangup AND failed Stop calls surface to caller. Build clean, all platform tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:43:22 -07:00
rabbitblood	4a4a740804	refactor(test_config): parametrize the 3 yaml-default cases (simplify on #2085) Collapses test_compliance_default_when_yaml_omits_block, _when_yaml_block_is_empty, _explicit_optout_still_works into one parametrized test_compliance_default_via_load_config with three ids (yaml_omits_block, yaml_block_empty, yaml_explicit_optout). The dataclass-default test stays separate (no tmp_path needed). Coverage and assertions identical; net -19 lines, same 4 logical cases. The prompt_injection check moves out of the per-case asserts into a single tail assert, since no payload overrode it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 02:03:59 -07:00
rabbitblood	577294b8f4	test(config): lock ComplianceConfig default to owasp_agentic (#2059 ) PR #2056 flipped ComplianceConfig.mode default from "" to "owasp_agentic" so every shipped template gets prompt-injection detection + PII redaction by default. The flip is correct + already shipping, but no test asserts the new default — a silent revert (or a refactor that reintroduces the old "" default) would pass workspace/tests/ and ship a workspace with compliance silently off. Add 4 regression tests: - test_compliance_dataclass_default — ComplianceConfig() with no args returns mode='owasp_agentic' + prompt_injection='detect' - test_compliance_default_when_yaml_omits_block — load_config on a yaml without `compliance:` key still produces owasp_agentic - test_compliance_default_when_yaml_block_is_empty — load_config on `compliance: {}` (a common shape during template editing) still produces owasp_agentic; covers the load_config() `.get("mode", "owasp_agentic")` default-fill path - test_compliance_explicit_optout_still_works — `mode: ""` in yaml must disable compliance (the documented opt-out path) 23/23 tests pass locally (4 new + 19 existing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 02:01:57 -07:00
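The defaulting rules these tests lock down can be sketched as a small loader. This is a minimal illustration, not the workspace's `load_config`. `compliance_from_yaml_dict` is a hypothetical helper that takes an already-parsed config.yaml mapping. Treating a YAML-null `compliance:` block like an omitted one is an assumption, and so is the `prompt_injection` fallback inside the loader (the commit only states the dataclass default for that field).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ComplianceConfig:
    # Defaults named in the commit message: compliance is on unless opted out.
    mode: str = "owasp_agentic"
    prompt_injection: str = "detect"


def compliance_from_yaml_dict(raw: dict[str, Any]) -> ComplianceConfig:
    """Hypothetical loader for the `compliance:` block of a parsed config.yaml.

    - block omitted, or present with a null value -> defaults (assumption for null)
    - `compliance: {}`                             -> defaults via .get(key, default)
    - `mode: ""`                                   -> explicit opt-out is kept
    """
    block = raw.get("compliance") or {}
    return ComplianceConfig(
        mode=block.get("mode", "owasp_agentic"),
        # Assumed fallback; only the dataclass default is documented.
        prompt_injection=block.get("prompt_injection", "detect"),
    )
```

The opt-out survives because `dict.get` falls back only when the key is absent. An explicit empty string is returned unchanged.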
Hongming Wang	2ee4b67cab	chore: third-pass review polish — empty-stream gate test + Callable type Pass 3 review came back Approve with two optional polish items. Both taken to fully converge the loop: 1. Regression test for the empty-stream wedge-clear gate (added in `3c4eef49`). A degenerate stream that iterates without raising but emits NEITHER an AssistantMessage NOR a ResultMessage must NOT clear the wedge flag — pre-set wedge persists, the next heartbeat still reports runtime_state="wedged". Pins the gate against future regression. 2. Replaced the type annotation `"dict[str, callable[[dict], str]]"` (lowercase `callable`, string-quoted) with the proper `dict[str, Callable[[dict], str]]` using `Callable` from `collections.abc`. Benign before (`from __future__ import annotations` makes the annotation a string Python never evaluates), but pyright/mypy may flag the lowercase form. 65 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 08:52:32 -07:00
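The empty-stream gate can be sketched as follows. Only a stream that actually yields an assistant or result message counts as proof of recovery. `AssistantMessage` and `ResultMessage` here are hypothetical stand-ins for the SDK's types. `mark_wedged` and `run_query` are illustrative names, and only `wedge_reason()` appears in the commit history.

```python
from collections.abc import Iterable
from typing import Optional


class AssistantMessage:
    """Hypothetical stand-in for the SDK's assistant-message type."""


class ResultMessage:
    """Hypothetical stand-in for the SDK's final result message."""


_sdk_wedged_reason: Optional[str] = None


def mark_wedged(reason: str) -> None:
    """Record a wedge; the first recorded reason wins."""
    global _sdk_wedged_reason
    if _sdk_wedged_reason is None:
        _sdk_wedged_reason = reason


def wedge_reason() -> Optional[str]:
    return _sdk_wedged_reason


def run_query(stream: Iterable[object]) -> int:
    """Drain a message stream and clear the wedge flag only on real output.

    A stream that iterates without raising but yields neither message type
    is inconclusive, so a previously recorded wedge persists.
    """
    global _sdk_wedged_reason
    produced = 0
    for msg in stream:
        if isinstance(msg, (AssistantMessage, ResultMessage)):
            produced += 1
    if produced:
        _sdk_wedged_reason = None  # evidence the SDK is healthy again
    return produced
```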
Hongming Wang	892de784b3	fix: review-driven hardening of wedge detector + idle timeout + progress feed Bundle review of pieces 1/2/3 surfaced two critical issues plus a handful of required + optional fixes. All addressed. Critical: 1. Migration 043 was missing 'paused' and 'hibernated' from the workspace_status enum. Both are real production statuses written by workspace_restart.go (lines 283 and 406), introduced by migration 029_workspace_hibernation. The original `USING status::workspace_status` cast would have errored mid-transaction on any production DB containing those values. Added both. Also added `SET LOCAL lock_timeout = '5s'` so the migration aborts instead of stalling the workspace fleet behind a slow SELECT. 2. The chat activity-feed window kept only 8 lines, and a single multi-tool turn (Read 5 files + Grep + Bash + Edit + delegate) easily flushed older context before the user could read it. Extracted appendActivityLine to chat/activityLog.ts with a 20-line window AND consecutive-duplicate collapse (same tool on the same target twice in a row is noise, not new progress). 5 unit tests pin the behavior. Required: 3. The SDK wedge flag was sticky-only — a single transient Control-request-timeout from a flaky network blip locked the workspace into degraded for the whole process lifetime, even when the next query() would have succeeded. Added _clear_sdk_wedge_on_success(), called from _run_query's success path. The next heartbeat after a working query reports runtime_state empty and the platform recovers the workspace to online without a manual restart. New regression test. 4. _report_tool_use now sets target_id = WORKSPACE_ID for self- actions, matching the convention other self-logged activity rows use. DB consumers joining on target_id see a well-defined value instead of NULL. Optional taken: 5. 
Tightened _WEDGE_ERROR_PATTERNS from "control request timeout" to "control request timeout: initialize" — suffix-anchored so a future SDK error on an in-flight tool-call control message doesn't get misclassified as the unrecoverable post-init wedge. 6. Dropped the redundant "context canceled" substring fallback in isUpstreamBusyError. errors.Is(err, context.Canceled) is the typed check; the substring would also match healthy client-side aborts, which we don't want classified as upstream-busy. Verified: 1010 canvas tests + 64 Python tests + full Go suite pass; migration applies cleanly on dev DB with all 8 enum values; reverse migration restores TEXT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 08:43:10 -07:00
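The real `appendActivityLine` lives in TypeScript (`chat/activityLog.ts`). Below is a Python rendering of the behavior this commit describes: a 20-line window plus collapse of consecutive duplicates. The sketch assumes that "same tool on the same target twice in a row" can be approximated by comparing the rendered lines for equality.

```python
WINDOW = 20  # window size stated in the commit message


def append_activity_line(lines: list[str], line: str, window: int = WINDOW) -> list[str]:
    """Return a new list with `line` appended, keeping at most `window` lines.

    A line equal to the most recent one is dropped as noise. Non-adjacent
    repeats are kept because they represent new progress.
    """
    if lines and lines[-1] == line:
        return list(lines)
    return (list(lines) + [line])[-window:]
```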
Hongming Wang	4eb09e2146	feat(platform,workspace): SDK-wedge detection + workspace_status ENUM Heartbeat lies. The asyncio task that POSTs /registry/heartbeat lives in its own process slot, so a workspace whose claude_agent_sdk has wedged on `Control request timeout: initialize` keeps reporting "online" — every chat send hangs the full 5-min platform deadline even though the runtime is dead in the water. This commit teaches the workspace to admit it's wedged and the platform to honor that admission by flipping status → degraded. Five layers, all in one commit because they share a contract: 1. Migration 043 — convert workspaces.status from free-form TEXT to a real `workspace_status` Postgres ENUM with the 6 values production code actually writes (provisioning, online, offline, degraded, failed, removed). Locks the value set; future typo writes error at the DB instead of silently storing rogue strings. Down migration reverts to TEXT and drops the type. 2. workspace-server/internal/models — `HeartbeatPayload` gains a `runtime_state string` field. Empty = healthy. Currently the only non-empty value the handler honors is "wedged"; future symptoms can extend without another migration. 3. workspace-server/internal/handlers/registry.go — `evaluateStatus` gains a wedge branch BEFORE the existing error_rate >= 0.5 path: if `RuntimeState=="wedged"` and currently online, flip to degraded and broadcast WORKSPACE_DEGRADED with the wedge sample error. Recovery (`degraded → online`) now requires BOTH error_rate < 0.1 AND runtime_state cleared, so a workspace still reporting wedged stays degraded even when its error count happens to be 0 (the wedge captures a runtime state, not an error count). 4. workspace/claude_sdk_executor.py — module-level `_sdk_wedged_reason` flag set when execute()'s catch block sees an error matching `_WEDGE_ERROR_PATTERNS` (currently just "control request timeout"). 
Sticky for the process lifetime; the SDK's internal client-process state is corrupted on this error and only a workspace restart (= new Python process = fresh module state) clears it. Helpers `is_wedged()` / `wedge_reason()` / `_reset_sdk_wedge_for_test()` exposed. 5. workspace/heartbeat.py — heartbeat body now layers on `_runtime_state_payload()` for both the happy path and the 401-retry path. Lazy-imports claude_sdk_executor so non-Claude runtimes (where the module may not even be importable) keep working unchanged. Canvas required no changes — `STATUS_CONFIG.degraded` was already defined in design-tokens.ts (amber dot, "Degraded" label) and WorkspaceNode.tsx already renders `lastSampleError` underneath the status pill when status === "degraded". The existing wiring just never fired because nothing was writing degraded in this code path. Tests: - 3 Go handler tests for the new transitions (online → degraded on wedged, degraded stays put while still wedged, degraded → online after wedge clears) - 5 Python wedge-detector tests (default clean, mark sets flag, sticky-first-wins, execute() flips on Control request timeout, execute() does NOT flip on unrelated errors) - Migration smoke-tested against the local dev DB (3 existing rows, all enum-compatible; migration applied cleanly, post-state has the column as workspace_status type and the index preserved) Verified: 79 Python tests pass; full Go test suite passes; migration applies clean on a real DB; reverse migration restores the column to TEXT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 00:59:15 -07:00
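The transition rules added to `evaluateStatus` can be summarized in a Python sketch (the real handler is Go, in registry.go). The sketch checks the wedge branch before the error-rate branch, and recovery requires both a low error rate and a cleared `runtime_state`. One point is an assumption: that the `error_rate >= 0.5` path applies only to workspaces that are currently online.

```python
from typing import Optional


def evaluate_status(current: str, error_rate: float, runtime_state: str) -> Optional[str]:
    """Return the new status, or None when no transition applies."""
    # Wedge branch runs first: a self-reported wedge degrades an online workspace.
    if runtime_state == "wedged" and current == "online":
        return "degraded"
    # Existing error-rate branch (assumed to apply only while online).
    if current == "online" and error_rate >= 0.5:
        return "degraded"
    # Recovery needs BOTH a low error rate AND a cleared runtime_state.
    if current == "degraded" and error_rate < 0.1 and runtime_state == "":
        return "online"
    return None
```

Requiring `runtime_state == ""` for recovery captures the point made in the commit message: a wedge is a runtime state, not an error count, so a zero error rate alone must not bring the workspace back online.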
Hongming Wang	c159d85eb5	fix(a2a): review-driven hardening — prefix-anchored type check, error_detail cap, shared hint module Three required fixes from the bundle review of `391e1872`: 1. workspace/a2a_client.py: substring `type_name in msg` could miss the diagnostic prefix when an exception's message embedded a different class name mid-string (e.g. `OSError("see ConnectionError below")` → printed as plain msg, type lost). Switched to a prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so the type label is always added when not already at the start of the message. 2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded TEXT on the platform (handlers/activity.go does not validate length). A buggy or hostile peer could stream arbitrarily large error messages into the caller's activity log. Cap at 4096 chars at the producer — comfortably above any real exception traceback, well below an obvious-DoS threshold. 3. New regression test for JSON-RPC `code=0` — pins the `code is not None` semantics so the code is preserved in the detail rather than collapsing into the no-code path. Code=0 is not valid per the spec, but a malformed peer can still emit it and we want it visible for diagnosis. Plus one optional taken: extracted the A2A-error → hint mapping into canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies (AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had already drifted — Activity tab gained `not found`/`offline` cases the chat panel never picked up, AgentCommsPanel handled empty-input explicitly while Activity didn't. The shared module is the merged superset, with 10 unit tests pinning each named pattern + the "most specific first" ordering (Claude SDK wedge wins over generic timeout). Skipped (per analysis): - Unicode-naive 120-char slice — Python str[:N] slices on code points, not bytes. Safe. - Nested [A2A_ERROR] confusion — non-issue per reviewer; outer prefix winning still produces a structured render. 
- MessagePreview + JsonBlock dual render on errors — intentional drilldown; raw JSON is below the fold for operators who need it. - console.warn dedup — refetches don't happen per-event so spam risk is low. - str(data)[:200] materialization — A2A response bodies aren't typically MB-sized. Verified: 1005 canvas tests pass (10 new hint tests); 10 Python send_a2a_message tests pass (1 new for code=0); tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 23:47:44 -07:00
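Fixes 1 and 2 can be sketched as two small helpers. `label_exception` and `cap_error_detail` are hypothetical names. The "(no message)" fallback wording is a shortened paraphrase of the fallback introduced in 391e187281, not its exact text.

```python
MAX_ERROR_DETAIL_CHARS = 4096  # producer-side cap from the commit message


def label_exception(e: BaseException) -> str:
    """Prefix the exception's type name unless the message already starts with it."""
    type_name = type(e).__name__
    msg = str(e)
    if not msg:
        # Some transport errors stringify to "", so keep at least the type.
        return f"{type_name} (no message)"
    # Prefix-anchored: a class name mentioned mid-message does not count.
    if msg.startswith(f"{type_name}:"):
        return msg
    return f"{type_name}: {msg}"


def cap_error_detail(detail: str, limit: int = MAX_ERROR_DETAIL_CHARS) -> str:
    """Truncate to `limit` code points (Python str slicing), never mid-byte."""
    return detail[:limit]
```

With the old substring check, `OSError("see ConnectionError below")` would have kept its message and dropped its type. The prefix check labels it `OSError: see ConnectionError below`.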
Hongming Wang	391e187281	fix(a2a,canvas): make delivery failures comprehensive instead of "[A2A_ERROR] " Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] " (prefix + nothing) for failed delegations. Operator had no signal to act on — no exception type, no target, no hint about what went wrong, no next step. Fix is in three layers. 1. workspace/a2a_client.py — every error path now produces an actionable detail string: - except branch: some httpx exceptions (RemoteProtocolError, ConnectionReset variants) stringify to "". Pre-fix the catch was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back to `<TypeName> (no message — likely connection reset or silent timeout)` and always appends `[target=<url>]` for traceability in chained delegations. - JSON-RPC error branch: previously dropped error.code on the floor and printed "unknown" when message was missing. Now surfaces both, including the well-defined "JSON-RPC error with no message (code=N)" path. - "neither result nor error" branch: pre-fix returned str(payload) which the canvas rendered as a successful response block. Now tagged as A2A_ERROR with a payload snippet so downstream UI routes through the error path. 2. workspace/a2a_tools.py — tool_delegate_task now passes error_detail (the stripped error message) through to the activity-log POST. The platform's activity_logs.error_detail column is the canvas's red error chip source; populating it makes the failure visible in the row header without the user having to expand into raw response_body JSON. The summary line also gets a 120-char prefix of the cause so the collapsed row reads "React Engineer failed: ConnectionResetError: ... [target=...]" instead of "React Engineer failed". 3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview now detects [A2A_ERROR]-prefixed bodies and renders a structured error block (red chip, stripped detail, cause hint) instead of the previous gray text-block that showed the literal "[A2A_ERROR]" string. 
inferA2AErrorHint mirrors the patterns from AgentCommsPanel.inferCauseHint so the same symptom reads the same way in both surfaces (Claude SDK init wedge → restart workspace; timeout → busy/stuck; connection-reset → transient blip then check logs). Tests: 9 send_a2a_message tests pass (including a new regression test for the empty-stringifying-exception case that the user reported); 995 canvas tests pass; tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 23:40:05 -07:00
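The JSON-RPC branches in layer 1 can be sketched as a single function. `interpret_jsonrpc` is a hypothetical name. The wording of the "unexpected response" detail is an assumption. The `code is not None` check matches the later code=0 regression test, and the 200-character payload snippet matches the `str(data)[:200]` mentioned in c159d85eb5.

```python
A2A_ERROR_PREFIX = "[A2A_ERROR] "


def interpret_jsonrpc(payload: dict) -> str:
    """Turn a JSON-RPC response body into either result text or a tagged error."""
    if "result" in payload:
        return str(payload["result"])
    error = payload.get("error")
    if isinstance(error, dict):
        code = error.get("code")
        message = error.get("message")
        # `is not None` keeps code=0 visible instead of treating it as "no code".
        if message:
            detail = f"{message} (code={code})" if code is not None else message
        elif code is not None:
            detail = f"JSON-RPC error with no message (code={code})"
        else:
            detail = "JSON-RPC error with no message"
        return A2A_ERROR_PREFIX + detail
    # Neither result nor error: tag it so the UI takes the error path.
    return A2A_ERROR_PREFIX + f"unexpected response: {str(payload)[:200]}"
```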
Hongming Wang	65b531acf6	fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID Workspace runtime fired four classes of A2A request to the platform without the X-Workspace-ID header that identifies the source workspace: heartbeat self-messages, initial_prompt, idle-loop fires, and peer-to-peer A2A from runtime tools. The platform's a2a_receive logger keys source_id off that header — without it, every such row was written with source_id=NULL, which the canvas's My Chat tab filters as ?source=canvas (i.e. "user typed this") and rendered the internal triggers as if the human user had sent them. The "Delegation results are ready..." heartbeat trigger was visible to end users in the chat history; delegate_task A2A calls between agents were misclassified the same way. Centralise the header construction in a new platform_auth helper self_source_headers(workspace_id) that returns auth_headers() PLUS {X-Workspace-ID: <id>}. Apply it to: - heartbeat.py self-message (refactored from inline header dict) - main.py initial_prompt POST - main.py idle_prompt POST - a2a_client.py send_a2a_message (peer A2A from runtime) - builtin_tools/a2a_tools.py delegate_task (was missing ALL headers) Tests: - test_heartbeat.py asserts the X-Workspace-ID header is set on the self-message POST. - test_a2a_tools_module.py asserts the same on delegate_task POSTs; FakeClient.post mocks updated to accept the headers kwarg. Production effect lands the moment workspace containers are rebuilt with this code; existing rows in activity_logs keep their NULL source_id (legacy data). The canvas-side filter (#follow-up) covers the historical-rows case until backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:54:43 -07:00
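The header helper can be sketched as follows. `auth_headers` here is a hypothetical stand-in for the real `platform_auth.auth_headers()`, and the `WORKSPACE_AUTH_TOKEN` variable it reads is an assumption. `classify_source` only illustrates how a missing header ends up classified as a canvas (human) message. The real receiver is the platform's Go logger, and the real filter is in the canvas.

```python
import os


def auth_headers() -> dict[str, str]:
    """Hypothetical stand-in for platform_auth.auth_headers()."""
    token = os.environ.get("WORKSPACE_AUTH_TOKEN", "")  # assumed variable name
    return {"Authorization": f"Bearer {token}"} if token else {}


def self_source_headers(workspace_id: str) -> dict[str, str]:
    """auth_headers() plus the X-Workspace-ID that identifies the sender."""
    headers = dict(auth_headers())
    headers["X-Workspace-ID"] = workspace_id
    return headers


def classify_source(headers: dict[str, str]) -> str:
    """Illustration only: a missing header reads as 'the user typed this'."""
    return headers.get("X-Workspace-ID") or "canvas"
```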
Hongming Wang	94d9331c76	feat(canvas+platform): chat attachments, model selection, deploy/delete UX Session's accumulated UX work across frontend and platform. Reviewable in four logical sections — diff is large but internally cohesive (each section fixes a gap the next one depends on). ## Chat attachments — user ↔ agent file round trip - New POST /workspaces/:id/chat/uploads (multipart, 50 MB total / 25 MB per file, UUID-prefixed storage under /workspace/.molecule/chat-uploads/). - New GET /workspaces/:id/chat/download with RFC 6266 filename escaping and binary-safe io.CopyN streaming. - Canvas: drag-and-drop onto chat pane, pending-file pills, per-message attachment chips with fetch+blob download (anchor navigation can't carry auth headers). - A2A flow carries FileParts end-to-end; hermes template executor now consumes attachments via platform helpers. ## Platform attachment helpers (workspace/executor_helpers.py) Every runtime's executor routes through the same helpers so future runtimes inherit attachment awareness for free: - extract_attached_files — resolve workspace:/file:///bare URIs, reject traversal, skip non-existent. - build_user_content_with_files — manifest for non-image files, multi-modal list (text + image_url) for images. Respects MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision adapter hangs on base64 payloads (MiniMax M2.7). - collect_outbound_files — scans agent reply for /workspace/... paths, stages each into chat-uploads/ (download endpoint whitelist), emits as FileParts in the A2A response. - ensure_workspace_writable — called at molecule-runtime startup so non-root agents can write /workspace without each template having to chmod in its Dockerfile. Hermes template executor + langgraph (a2a_executor.py) + claude-code (claude_sdk_executor.py) all adopt the helpers. ## Model selection & related platform fixes - PUT /workspaces/:id/model — was 404'ing, so canvas "Save" silently lost the model choice. 
Stores into workspace_secrets (MODEL_PROVIDER), auto-restarts via RestartByID. - applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"] so Restart propagates the stored model to HERMES_DEFAULT_MODEL without needing the caller to rehydrate payload.Model. - ConfigTab Tier dropdown now reads from workspaces row, not the (stale) config.yaml — fixes "badge shows T3, form shows T2". ## ChatTab & WebSocket UX fixes - Send button no longer locks after a dropped TASK_COMPLETE — `sending` no longer initializes from data.currentTask. - A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s; the previous default aborted fetches while the server was still replying, producing "agent may be unreachable" on success. - socket.ts: disposed flag + reconnectTimer cancellation + handler detachment fix zombie-WebSocket in React StrictMode. - Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' — the adaptor's purpose IS the form, banner was contradictory. - workspace_provision.go auto-recovery: try <runtime>-default AND bare <runtime> for template path (hermes lives at the bare name). ## Org deploy/delete animation (theme-ready CSS) - styles/theme-tokens.css — design tokens (durations, easings, colors). Light theme overrides by setting only the deltas. - styles/org-deploy.css — animation classes + keyframes, every value references a token. prefers-reduced-motion respected. - Canvas projects node.draggable=false onto locked workspaces (deploying children AND actively-deleting ids) — RF's authoritative drag lock; useDragHandlers retains a belt-and- braces check. - Organ cancel button (red pulse pill on root during deploy) cascades via existing DELETE /workspaces/:id?confirm=true. - Auto fit-view after each arrival, debounced 500 ms so rapid sibling arrivals coalesce into one fit (previous per-event fit made the viewport lurch continuously). 
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan timestamp only when event !== null (ignores programmatic fitView) so auto-fits don't self-cancel. - deletingIds store slice + useOrgDeployState merge gives the delete flow the same dim + non-draggable treatment as deploy. - Platform-level classNames.ts shared by canvas-events + useCanvasViewport (DRY'd 3 copies of split/filter/join). ## Server payload change - org_import.go WORKSPACE_PROVISIONING broadcast now includes parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas renders the child at the right parent-nested slot without doing any absolute-position walk. createWorkspaceTree signature gains relX, relY alongside absX, absY; both call sites updated. ## Tests - workspace/tests/test_executor_helpers.py — 11 new cases covering URI resolution (including traversal rejection), attached-file extraction (both Part shapes), manifest-only vs multi-modal content, large-image skip, outbound staging, dedup, and ensure_workspace_writable (chmod 777 + non-root tolerance). - workspace-server chat_files_test.go — upload validation, Content-Disposition escaping, filename sanitisation. - workspace-server secrets_test.go — SetModel upsert, empty clears, invalid UUID rejection. - tests/e2e/test_chat_attachments_e2e.sh — round-trip against a live hermes workspace. - tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static plumbing check + round-trip across hermes/langgraph/claude-code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:27:51 -07:00
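The URI handling described for `extract_attached_files` can be sketched as a single resolver. `resolve_attachment_uri` is a hypothetical name. The sketch assumes that whatever follows `workspace:` is a path, either absolute or relative to the workspace root. The real helper's exact URI grammar is not spelled out here.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse

WORKSPACE_ROOT = Path("/workspace")


def resolve_attachment_uri(uri: str, root: Path = WORKSPACE_ROOT) -> Optional[Path]:
    """Map a workspace:/file:///bare URI to a real file under `root`, or None."""
    if uri.startswith("file://"):
        raw = unquote(urlparse(uri).path)
    elif uri.startswith("workspace:"):
        raw = uri[len("workspace:"):]
    else:
        raw = uri  # bare path
    candidate = Path(raw)
    if not candidate.is_absolute():
        candidate = root / candidate
    # Resolve both sides so "..", and symlinked roots, are compared fairly.
    resolved = candidate.resolve()
    base = root.resolve()
    if resolved != base and base not in resolved.parents:
        return None  # traversal outside the workspace
    if not resolved.exists():
        return None  # skip non-existent files
    return resolved
```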
molecule-ai[bot]	35bcad9204	feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974 ) * feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the official migration guide from a2aproject/a2a-python. Breaking changes applied: - A2AStarletteApplication → Starlette route factory (create_agent_card_routes + create_jsonrpc_routes) - AgentCard.url removed; url+protocol now in supported_protocols[].url - AgentCapabilities fields renamed to snake_case (pushNotifications→push_notifications, stateTransitionHistory→state_transition_history) - AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes - TaskState.canceled → TaskState.TASK_STATE_CANCELED - a2a.utils → a2a.helpers - Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed) Files changed: - requirements.txt: pinned >=1.0.0,<2.0 - main.py: Starlette route factory + AgentCard restructure - a2a_executor.py: Part() + TaskState + helpers import - hermes_executor.py: TaskState + helpers import - google-adk/adapter.py: TaskState + helpers import - cli_executor.py: helpers import - claude_sdk_executor.py: helpers import - tests/conftest.py: a2a.helpers mock stub - tests/test_a2a_executor.py: TaskState enum key - adapters/google-adk/test_adapter.py: Part + helpers stub Refs: KI-009 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED) --------- Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-24 04:43:17 +00:00
Molecule AI Plugin-Dev	61c5f8ad9a	feat(plugin): implement MCPServerAdaptor (issue #847 ) Rule-of-three threshold met: 4 plugin proposals (molecule-firecrawl #512, molecule-github-mcp #520, molecule-browser-use #553, mcp-connector #573) all independently shipped the same mcpServers-adapter pattern. Adds MCPServerAdaptor to builtins.py — plugins wrapping an MCP server now declare `from plugins_registry.builtins import MCPServerAdaptor as Adaptor` in their per-runtime adapter file. The adaptor: - Merges mcpServers from settings-fragment.json into <configs>/.claude/settings.json (deep-merge so multiple plugins' servers coexist). - Optionally ships skills/rules/setup.sh via AgentskillsAdaptor delegation. - On uninstall: removes skills/rules but intentionally leaves mcpServers entries in settings.json (users may share configs with other tools or have manually curated entries). Also fixes _deep_merge_hooks: non-hook top-level keys that are dicts (e.g. mcpServers) are now deep-merged with existing values instead of being skipped via setdefault. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 01:42:13 +00:00
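The merge change can be sketched as a recursive merge in which nested dicts such as `mcpServers` combine key by key. `deep_merge` is a hypothetical name. One behavior is an assumption: for non-dict values the sketch keeps the previous `setdefault` semantics, so an existing (possibly hand-curated) value wins over the incoming fragment.

```python
from typing import Any


def deep_merge(existing: dict[str, Any], fragment: dict[str, Any]) -> dict[str, Any]:
    """Merge `fragment` into a copy of `existing` without mutating either input."""
    merged = dict(existing)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts (e.g. mcpServers): merge so entries coexist.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Assumed: keep the existing value when one is already present.
            merged.setdefault(key, value)
    return merged
```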
Molecule AI Marketing Lead	e00797ba35	fix(security): prevent cross-tenant memory contamination in commit_memory/recall_memory (GH#1610) Two critical gaps in a2a_tools.py let any tenant workspace poison org-wide (GLOBAL) memory and bypass all RBAC enforcement: 1. tool_commit_memory had no RBAC check — any agent could write any scope. 2. tool_commit_memory had no root-workspace enforcement for GLOBAL scope — Tenant A could POST scope=GLOBAL and pollute the shared memory store that Tenant B's agent reads as trusted context. Fix adds: - _ROLE_PERMISSIONS table (mirrors builtin_tools/audit.py) so a2a_tools has isolated RBAC logic without depending on memory.py. - _check_memory_write_permission() / _check_memory_read_permission() helpers: evaluate RBAC roles from WorkspaceConfig; fail closed (deny) on errors. - _is_root_workspace() / _get_workspace_tier(): read WorkspaceConfig.tier (0 = root/org, 1+ = tenant) from config.yaml; fall back to WORKSPACE_TIER env var. - tool_commit_memory now (a) checks memory.write RBAC, (b) rejects GLOBAL scope for non-root workspaces, (c) embeds workspace_id in the POST body so the platform can namespace-isolate and audit cross-workspace writes. - tool_recall_memory now checks memory.read RBAC before any HTTP call, and always sends workspace_id as a GET param for platform cross-validation. Security regression tests added: - GLOBAL scope denied for non-root (tier>0) workspaces. - RBAC denial blocks all scope levels (including LOCAL) on write. - RBAC denial blocks recall entirely. - workspace_id present in POST body and GET params. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 10:21:34 -07:00
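The order of checks in the new `tool_commit_memory` can be sketched as follows. The role names in `ROLE_PERMISSIONS` are hypothetical (the real table mirrors builtin_tools/audit.py), and `authorize_commit` is an illustrative name. Errors while evaluating permissions deny access, which matches the fail-closed rule in the commit.

```python
from typing import Iterable, Optional

# Hypothetical role table for illustration only.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "admin": {"memory.read", "memory.write"},
    "operator": {"memory.read", "memory.write"},
    "viewer": {"memory.read"},
}


def has_permission(roles: Optional[Iterable[str]], action: str) -> bool:
    try:
        return any(action in ROLE_PERMISSIONS.get(r, set()) for r in roles)
    except Exception:
        return False  # fail closed, e.g. roles is None or malformed


def authorize_commit(scope: str, roles: Optional[Iterable[str]], tier: int) -> Optional[str]:
    """Return a denial reason, or None if the write may proceed."""
    if not has_permission(roles, "memory.write"):
        return "denied: missing memory.write"
    # Tier 0 = root/org workspace; only it may write GLOBAL memory.
    if scope.upper() == "GLOBAL" and tier != 0:
        return "denied: GLOBAL scope requires a root workspace"
    return None
```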
Hongming Wang	1aea013e20	fix(ci): unblock main CI on ubuntu-latest — IPv6-safe addr + MagicMock seed Two latent bugs the self-hosted Mac mini had been hiding. Both caught by the newer toolchain on ubuntu-latest runners after PR #1626. 1. workspace-server/internal/handlers/terminal.go:442 `fmt.Sprintf("%s:%d", host, port)` flagged by go vet as unsafe for IPv6 (it omits the required [::] brackets). Replaced with `net.JoinHostPort(host, strconv.Itoa(port))` which handles both IPv4 and IPv6 correctly. No runtime behaviour change — the only call site passes "127.0.0.1", so the bug would never trigger in practice, but vet is right to flag it as a latent correctness issue. 2. workspace/tests/test_a2a_executor.py::test_set_current_task_updates_heartbeat `MagicMock()` auto-creates attributes on first access, so `getattr(heartbeat, "active_tasks", 0)` in shared_runtime.py returned a MagicMock rather than the default 0. Adding 1 to a MagicMock returns another MagicMock, so the assertion `heartbeat.active_tasks == 1` never held. Seeding `heartbeat.active_tasks = 0` before the first call makes getattr() return a real int, matching how the real HeartbeatLoop class initialises itself. Both pre-existed on main and were hidden by the older Python / Go toolchains on the Mac mini runner. Verified locally (venv pytest pass, `go vet ./...` + `go build ./...` clean on workspace-server). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:18:46 -07:00
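The MagicMock pitfall in item 2 can be reproduced in a few lines. `set_current_task` is a hypothetical stand-in for the `getattr(heartbeat, "active_tasks", 0) + 1` pattern in shared_runtime.py.

```python
from unittest.mock import MagicMock


def set_current_task(heartbeat) -> None:
    """Hypothetical stand-in for the shared_runtime.py increment."""
    heartbeat.active_tasks = getattr(heartbeat, "active_tasks", 0) + 1


# Unseeded: getattr() auto-creates a child MagicMock, and MagicMock + 1
# returns yet another MagicMock, so the counter never becomes an int.
unseeded = MagicMock()
set_current_task(unseeded)

# Seeded: a real int makes getattr() behave like it does on HeartbeatLoop.
seeded = MagicMock()
seeded.active_tasks = 0
set_current_task(seeded)
```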
molecule-ai[bot]	859d676f70	fix(CI): correct BASE in detect-changes (PR/push race); catch RuntimeError in conftest (#1473 ) - ci.yml: replace if/else BASE assignment with GITHUB_BASE_REF default + pull_request base.sha override pattern. Prevents push events from overwriting the correct PR base SHA when both events fire together. - conftest.py: catch RuntimeError in addition to ImportError when importing coordinator.py, which raises RuntimeError at import time when WORKSPACE_ID is not set (before the ImportError guard). Co-authored-by: Molecule AI Release Manager <release-manager@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:15:45 +00:00
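The conftest change can be sketched as an optional import that tolerates both failure modes. `optional_import` is a hypothetical helper. The point it illustrates: a module that raises `RuntimeError` at import time (as coordinator.py does without WORKSPACE_ID) would otherwise get past an `except ImportError` guard.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import `name`, or return None if it is missing or refuses to import."""
    try:
        return importlib.import_module(name)
    except (ImportError, RuntimeError):
        return None
```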
molecule-ai[bot]	4675402e58	feat(workspace): pre-stop serialization for pause/resume (closes #1386 ) Add a pre-stop hook that captures agent state before container exit and writes a scrubbed snapshot to /configs/.agent_snapshot.json. On restart, the snapshot is loaded and the adapter's restore_state() is called before the A2A server starts. - New lib/pre_stop.py: build_snapshot / write_snapshot / read_snapshot / delete_snapshot + _scrub_value deep-scrubber (uses lib.snapshot_scrub to redact API keys, tokens, and sandbox output before persisting) - BaseAdapter.pre_stop_state(): captures _executor._session_id and recent transcript_lines; overridden by adapters with richer in-memory state - BaseAdapter.restore_state(): stores snapshot fields as adapter attrs for create_executor() to pick up - main.py: calls pre_stop serialization in finally block (after server serves) and restore_state() after adapter setup, before server starts - Added 12 unit tests covering scrub, read/write, adapter integration Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 12:40:44 +00:00
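A deep scrubber in the spirit of `_scrub_value` walks dicts and lists and redacts secrets before the snapshot is persisted. The key and value patterns below are hypothetical examples. The real redaction rules live in lib.snapshot_scrub and are not shown in this commit.

```python
import re
from typing import Any

# Hypothetical patterns for illustration only.
_SECRET_KEY = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)
_SECRET_VALUE = re.compile(r"\b(sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]{20,})\b")
REDACTED = "[REDACTED]"


def scrub_value(value: Any) -> Any:
    """Return a scrubbed copy of a JSON-like value; inputs are not mutated."""
    if isinstance(value, dict):
        return {
            # Redact the whole value when the key itself looks secret.
            k: REDACTED if isinstance(k, str) and _SECRET_KEY.search(k) else scrub_value(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [scrub_value(v) for v in value]
    if isinstance(value, str):
        # Redact secret-looking tokens embedded in free text (e.g. transcripts).
        return _SECRET_VALUE.sub(REDACTED, value)
    return value
```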


54 Commits