molecule-core

Author	SHA1	Message	Date
Hongming Wang	2f7beb9bce	feat: drop shared_context — use memory v2 team namespace instead Parent → child knowledge sharing previously lived behind a `shared_context` list in config.yaml: at boot, every child workspace HTTP-fetched its parent's listed files via GET /workspaces/:id/shared-context and prepended them as a "## Parent Context" block. That design paid the full transfer cost on every boot whether or not the agent needed the content, made the parent a single point of failure, offered no team or org scope, and broke whenever the parent was unreachable. Replace with memory v2's team:<id> namespace: agents call recall_memory on demand. For large blob-shaped artefacts see RFC #2789 (platform-owned shared file storage). Removed: - workspace/coordinator.py: get_parent_context() - workspace/prompt.py: parent_context arg + injection block - workspace/adapter_base.py: import + call + arg pass - workspace/config.py: shared_context field + parser entry - workspace-server/internal/handlers/templates.go: SharedContext handler - workspace-server/internal/router/router.go: GET /shared-context route - canvas/src/components/tabs/ConfigTab.tsx: Shared Context tag input - canvas/src/components/tabs/config/form-inputs.tsx: schema field + default - canvas/src/components/tabs/config/yaml-utils.ts: serializer entry - 6 tests pinning the removed behavior; 5 doc references. Added regression gates so any reintroduction is loud: - workspace/tests/test_prompt.py: build_system_prompt must NOT emit "## Parent Context" - workspace/tests/test_config.py: legacy YAML key loads cleanly but shared_context attr must NOT exist on WorkspaceConfig - tests/e2e/test_staging_full_saas.sh §9d: GET /shared-context must NOT return 200 against a live tenant Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 16:30:26 -07:00
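The config regression gate (a legacy `shared_context` key loads cleanly, yet the attribute must not exist) can be pictured with a minimal sketch. This is not the real `workspace/config.py`: the `WorkspaceConfig` fields and the `parse_config` helper are hypothetical, and the YAML is assumed to be already parsed into a dict.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class WorkspaceConfig:
    # Hypothetical subset of fields; the real class has more.
    name: str = ""
    runtime: str = ""
    skills: list = field(default_factory=list)


def parse_config(raw: dict[str, Any]) -> WorkspaceConfig:
    """Build a WorkspaceConfig from parsed YAML, ignoring unknown keys.

    Keys that are no longer fields (such as the removed shared_context)
    are dropped instead of raising, so old config.yaml files still load.
    """
    known = {f.name for f in fields(WorkspaceConfig)}
    return WorkspaceConfig(**{k: v for k, v in raw.items() if k in known})


legacy = {"name": "worker-1", "runtime": "claude-code", "shared_context": ["NOTES.md"]}
cfg = parse_config(legacy)

# Regression gate: the legacy key parses, but the attribute must not come back.
assert cfg.name == "worker-1"
assert not hasattr(cfg, "shared_context")
```

Dropping unknown keys silently keeps old configs loading; a stricter variant could log them so operators notice stale entries.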
Hongming Wang	4f4b6c4f90	test(runtime): pin PR #2756's card-vs-setup decoupling with build_routes helper PR #2756's contract — card route always mounted regardless of adapter.setup() outcome — lived inline in main.py's `# pragma: no cover` boot sequence. A future refactor that re-coupled the two would have silently bypassed PR #2756 and shipped the original "stuck booting forever" UX again, with no pytest catching it. This change extracts route assembly into workspace/boot_routes.py's build_routes(card, executor, adapter_error) and pins the contract with 6 integration tests using Starlette's TestClient: - test_card_route_serves_200_when_adapter_ready: happy path - test_card_route_serves_200_when_adapter_failed: misconfigured boot, card still 200, skill stubs survive - test_jsonrpc_returns_503_when_no_executor: full -32603 envelope with the adapter_error in error.data - test_jsonrpc_returns_503_with_generic_when_no_error_string: fallback reason for the rare case main.py reaches this branch without one - test_card_route_does_not_depend_on_executor: direct PR #2756 regression guard — both branches MUST mount the card route - test_executor_present_does_not_mount_not_configured_handler: sanity that a healthy workspace doesn't return -32603 to every request. Conftest stubs extended with a2a.server.routes / request_handlers classes so the tests work under the existing a2a-mock infra (pattern matches the AgentCard/AgentSkill stubs added for PR #2765). main.py now calls build_routes; the inline if/else is gone. Same production behaviour, cleaner shape, regression-proof. Heavy a2a-sdk imports inside build_routes() are lazy (deferred to the executor-only branch) so tests that only exercise the not-configured path don't pull DefaultRequestHandler / InMemoryTaskStore. card_helpers + boot_routes registered in TOP_LEVEL_MODULES (build drift gate would have caught the missing entry on the wheel-publish smoke). 
All 18 related tests pass (test_boot_routes.py: 6, test_card_helpers.py: 6, test_not_configured_handler.py: 6). Closes #2761 Pairs with: PR #2756 (decouple agent-card from setup), PR #2765 (defensive isolation of enrichment + transcript) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:59:56 -07:00
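The route-assembly contract can be sketched without Starlette: the card route is mounted unconditionally, and only the JSON-RPC handler depends on whether an executor exists. Everything here is a hypothetical stand-in (plain dicts as requests and responses, the `/.well-known/agent-card.json` and `/` paths, the error message text), not the actual `workspace/boot_routes.py`.

```python
from typing import Callable, Optional

# A handler takes a JSON-RPC request dict and returns (status, body).
Handler = Callable[[dict], tuple]

CARD_PATH = "/.well-known/agent-card.json"  # assumed path
RPC_PATH = "/"  # assumed path


def build_routes(card: dict, executor: Optional[Handler], adapter_error: Optional[str]) -> dict:
    # The card route never depends on adapter setup succeeding.
    routes = {CARD_PATH: lambda _req: (200, card)}

    if executor is not None:
        routes[RPC_PATH] = executor
        return routes

    # Generic fallback for the rare case where no error string was captured.
    reason = adapter_error or "adapter setup did not complete"

    def not_configured(req: dict) -> tuple:
        # JSON-RPC internal-error envelope carrying the setup failure reason.
        return 503, {
            "jsonrpc": "2.0",
            "id": req.get("id"),
            "error": {"code": -32603, "message": "workspace not configured", "data": reason},
        }

    routes[RPC_PATH] = not_configured
    return routes


card = {"name": "demo", "skills": [{"id": "stub"}]}
broken = build_routes(card, executor=None, adapter_error="missing LLM key")
status, body = broken[RPC_PATH]({"id": "req-1"})
print(broken[CARD_PATH]({})[0], status, body["error"]["data"])
# prints: 200 503 missing LLM key
```

Keeping the card route outside the `if` is what makes the workspace reachable even when it is not yet configured.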
Hongming Wang	63ac99788b	fix(runtime): isolate card-skill enrichment + transcript handler from adapter shape mismatch PR #2756 added a try/except around adapter.setup() so a missing LLM key doesn't crash the workspace boot. Two paths that now run AFTER setup succeeds were not similarly isolated, leaving small but real coupling risks for future adapter authors. 1. Skill metadata enrichment swap (main.py:248-259). When adapter.setup() returns, main.py reads adapter.loaded_skills and replaces the static stubs in agent_card.skills with rich metadata (description, tags, examples). The list comprehension assumes each element exposes .metadata.{id,name,description,tags,examples}. A future adapter that returns a non-canonical shape would raise AttributeError, propagate to the outer except, capture as adapter_error, and silently degrade an OK boot to the not-configured state — even though setup() actually succeeded. Extract to card_helpers.enrich_card_skills(card, loaded_skills) → bool. Helper swallows enrichment failures, logs the cause, returns False, leaves the static stubs in place. setup() success path continues unchanged. 6 unit tests cover: None input, empty list, canonical happy path, missing .metadata attr, partial .metadata (missing one canonical field), atomic-failure-no-partial-swap. 2. /transcript handler (main.py:513). Calls await adapter.transcript_lines(...) without try/except. BaseAdapter's default returns {"supported": false} so today's 4 adapters never trigger this — but a future adapter override that assumes setup() ran would surface as a 500 from Starlette's default error handler instead of a useful 503 with the exception class + message. Inline try/except returns 503 with the reason, matching the not-configured JSON-RPC handler's pattern. Both changes match the architectural principle the PR #2756 chain established: availability (workspace reachable) is decoupled from configuration / adapter behavior. 
Operators see useful errors instead of silent degradation; future adapter authors can't accidentally break tenant readiness with a shape mismatch. Adds: - workspace/card_helpers.py (~50 lines, 100% covered) - workspace/tests/test_card_helpers.py (6 tests) - AgentCard/AgentSkill/AgentCapabilities/AgentInterface stubs to workspace/tests/conftest.py so future card-related tests work under the existing a2a-mock infrastructure - card_helpers in TOP_LEVEL_MODULES (drift gate would have caught it) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 14:15:27 -07:00
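The enrichment helper's contract (return a bool, log and swallow shape mismatches, never partially swap) can be sketched as below. This is an illustration, not the real `card_helpers.py`: dicts stand in for `AgentSkill` objects, `SimpleNamespace` stands in for the card and loaded skills, and the log wording is hypothetical.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def enrich_card_skills(card, loaded_skills) -> bool:
    """Replace card.skills with rich metadata; keep the stubs on any failure."""
    if not loaded_skills:
        return False
    try:
        # Build the full list first so a bad element cannot cause a partial swap.
        enriched = [
            {
                "id": s.metadata.id,
                "name": s.metadata.name,
                "description": s.metadata.description,
                "tags": list(s.metadata.tags),
                "examples": list(s.metadata.examples),
            }
            for s in loaded_skills
        ]
    except Exception as exc:  # shape mismatch from a non-canonical adapter
        logger.warning("skill enrichment skipped: %s: %s", type(exc).__name__, exc)
        return False
    card.skills = enriched
    return True


def skill(**meta):
    return SimpleNamespace(metadata=SimpleNamespace(**meta))


good = skill(id="a", name="A", description="d", tags=["t"], examples=["e"])
partial = skill(id="b", name="B", description="d", tags=[])  # no .examples

card = SimpleNamespace(skills=["stub"])
assert enrich_card_skills(card, [good, partial]) is False
assert card.skills == ["stub"]  # atomic: no partial swap
assert enrich_card_skills(card, [good]) is True
assert card.skills[0]["id"] == "a"
```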
Hongming Wang	e1628c4d56	fix(a2a): route terminal Message via TaskUpdater.complete/failed in task mode PR #2558 enqueued a Task at the start of new requests so the v1 SDK would accept TaskUpdater.start_work() — fix #1 of the v0→v1 migration gap (PR #2170). But after Task is enqueued, the executor enters "task mode" and the SDK rejects raw Message enqueues at the terminal step: {"code":-32603,"message":"Received Message object in task mode. Use TaskStatusUpdateEvent or TaskArtifactUpdateEvent instead."} Synth-E2E 2026-05-03T11:00:34Z surfaced this on the very first run after the prior fix cascaded. Validation site is the same a2a/server/agent_execution/active_task.py — the framework's job is to enforce the v1 invariant; we're catching up to it. The fix routes both terminal events through TaskUpdater helpers: - success: updater.complete(message=msg) wraps in TaskStatusUpdateEvent(state=COMPLETED, final=True) - error: updater.failed(message=...) wraps in TaskStatusUpdateEvent(state=FAILED, final=True) Both helpers exist in a2a-sdk ≥ 1.0; verified via TaskUpdater.complete signature. Tests: - conftest TaskUpdater stub now records complete/failed calls AND routes the message back through event_queue.enqueue_event so the ~20 legacy tests asserting on enqueue_event keep working - 2 new regression tests pin the contract: * test_terminal_success_routes_via_updater_complete * test_terminal_error_routes_via_updater_failed - Both NEW tests verified to FAIL on staging-baseline (without this fix) and PASS with it — they'd catch the regression before staging if the wheel-smoke gate covered task-mode terminal events too (separate yak-shave for #131 follow-up) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 04:06:45 -07:00
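The task-mode rule can be modeled with small stand-ins: once a `Task` has been enqueued, the queue rejects a raw `Message`, so terminal results go through `complete()` / `failed()`, which wrap the message in a final status-update event. Every class below is a hypothetical, synchronous stand-in for the a2a-sdk types (the real `TaskUpdater` methods are coroutines); only the shape of the rule comes from the commit.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Message:
    text: str


@dataclass
class Task:
    id: str
    state: TaskState = TaskState.SUBMITTED


@dataclass
class TaskStatusUpdateEvent:
    task_id: str
    state: TaskState
    message: Message
    final: bool


class EventQueue:
    def __init__(self):
        self.events = []
        self.task_mode = False

    def enqueue_event(self, event):
        if isinstance(event, Task):
            self.task_mode = True
        elif isinstance(event, Message) and self.task_mode:
            raise ValueError("Received Message object in task mode")
        self.events.append(event)


class TaskUpdater:
    def __init__(self, queue: EventQueue, task_id: str):
        self.queue, self.task_id = queue, task_id

    def complete(self, message: Message):
        self.queue.enqueue_event(TaskStatusUpdateEvent(self.task_id, TaskState.COMPLETED, message, True))

    def failed(self, message: Message):
        self.queue.enqueue_event(TaskStatusUpdateEvent(self.task_id, TaskState.FAILED, message, True))


def execute(queue: EventQueue, task_id: str, run):
    queue.enqueue_event(Task(task_id))  # enters task mode
    updater = TaskUpdater(queue, task_id)
    try:
        result = run()
    except Exception as exc:
        updater.failed(Message(f"error: {exc}"))
        return
    # Never enqueue Message(result) directly here: the queue would reject it.
    updater.complete(Message(result))


q = EventQueue()
execute(q, "t-1", lambda: "done")
final = q.events[-1]
print(type(final).__name__, final.state.name, final.final)
# prints: TaskStatusUpdateEvent COMPLETED True
```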
Hongming Wang	5c3b79a8ba	fix(a2a): enqueue Task before TaskStatusUpdateEvent for v1 SDK contract a2a-sdk ≥ 1.0 raises InvalidAgentResponseError when an executor publishes a TaskStatusUpdateEvent (e.g. via TaskUpdater.start_work) before any Task event for fresh requests. The framework only auto-creates the Task on continuation messages (existing task_id resolves via task_manager.get_task); new requests leave _task_created unset and the SDK validation at a2a/server/agent_execution/active_task.py rejects the first status update. PR #2170 migrated the executor surface to v1 but missed this contract. The synthetic E2E gate caught it on every staging run since (~1 week silent fail) with: {"jsonrpc":"2.0","id":"e2e-msg-1","error":{"code":-32603, "message":"Agent should enqueue Task before TaskStatusUpdateEvent event","data":null}} The fix enqueues a Task(state=SUBMITTED) before the TaskUpdater is constructed, gated on `context.current_task is None` so continuation messages don't double-enqueue (which the SDK logs about but doesn't reject). Tests: - test_first_event_is_task_for_new_request — pins the new-request path: first enqueue must be a Task with the expected id/context_id - test_no_task_enqueue_on_continuation — pins the continuation path: when context.current_task is set, the executor must NOT re-enqueue Task - conftest: stub Task / TaskStatus / TaskState in the mocked a2a.types module so the import inside the executor resolves under unit tests google-adk adapter does not have this bug — its execute() only emits Message events, not TaskStatusUpdateEvent. Its cancel() does emit one, but cancel is rarely-invoked and out of scope for this fix. Live verification path: this PR's merge → publish-runtime cascade → next synth-E2E firing should go green at step "8/11 Sending A2A message to parent — expecting agent response".	2026-05-03 03:15:54 -07:00
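The new-request gate reduces to one check on `context.current_task`. In this sketch the context and events are hypothetical plain objects (a `SimpleNamespace` context and tuple events), not the SDK's `RequestContext` or `Task` types; `"WORKING"` stands in for what `TaskUpdater.start_work()` would publish.

```python
from types import SimpleNamespace


def opening_events(context) -> list:
    """Events an executor emits before doing work for one request."""
    events = []
    if context.current_task is None:
        # Fresh request: the SDK will not auto-create the Task, so publish
        # one in SUBMITTED state before any status update.
        events.append(("Task", context.task_id, context.context_id, "SUBMITTED"))
    # Continuations already have a Task; go straight to the status update.
    events.append(("TaskStatusUpdateEvent", context.task_id, "WORKING"))
    return events


new_req = SimpleNamespace(current_task=None, task_id="t-1", context_id="c-1")
continuation = SimpleNamespace(current_task=object(), task_id="t-1", context_id="c-1")

assert opening_events(new_req)[0] == ("Task", "t-1", "c-1", "SUBMITTED")
assert opening_events(continuation)[0][0] == "TaskStatusUpdateEvent"
```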
Hongming Wang	46bc63e373	chore(smoke): runtime_wedge follow-ups from PR #2473 review Three review nits from PR #2473: 1. Narrow `_check_runtime_wedge` import catch to (ImportError, ModuleNotFoundError). The bare `except Exception:` would have masked an `AttributeError`/`TypeError` from a runtime_wedge API rename — silently degrading the smoke gate to "no wedge info" with no log line. The `runtime_wedge_signature.json` snapshot test (task #169) carries the API-drift load instead. 2. Drop the unreachable `or "<unspecified>"` fallback. `wedge_reason()` only returns "" when not wedged, but the call is guarded by `is_wedged()` being True and `mark_wedged` requires a non-None reason. The defensive arm couldn't fire. 3. Promote `reset_runtime_wedge` from a per-file fixture in test_smoke_mode.py to an autouse fixture in workspace/tests/conftest.py. Heartbeat tests or future adapter tests that call `mark_wedged` without cleanup would otherwise leak a sticky wedge into smoke tests later in the same pytest process — smoke tests would fail-via-leak instead of asserting their actual contract. Two-sided reset survives early test failures. Also: `test_check_runtime_wedge_returns_none_when_module_missing` now `monkeypatch.delitem(sys.modules, "runtime_wedge")` before patching `__import__`, so the test re-exercises the import path instead of resolving from the module cache (the test was passing today by luck — it would still pass even if the catch arm were deleted, because the cached module's `is_wedged` returned False). Tests: 28 still pass in test_smoke_mode.py, 57 across smoke + wedge + heartbeat. Regression-injection-checked: catch tightening doesn't regress the existing wedge tests.	2026-05-01 18:01:51 -07:00
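The narrowed catch and the cache-clearing test trick can be sketched together. `check_runtime_wedge` here is a hypothetical re-sketch of `_check_runtime_wedge` with the module name made a parameter for the demo; `is_wedged()` and `wedge_reason()` are the names used in the commit. Note that `ModuleNotFoundError` is a subclass of `ImportError`, so `except ImportError` alone is equivalent to the `(ImportError, ModuleNotFoundError)` tuple.

```python
import importlib
import sys
import types
from typing import Optional


def check_runtime_wedge(module_name: str = "runtime_wedge") -> Optional[str]:
    """Return the wedge reason, or None when not wedged or the module is absent."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError, a subclass
        return None
    # No catch here: an AttributeError from an API rename should surface.
    if mod.is_wedged():
        return mod.wedge_reason()
    return None


# Simulate a wedged runtime by planting a fake module in the import cache.
fake = types.ModuleType("fake_wedge")
fake.is_wedged = lambda: True
fake.wedge_reason = lambda: "sdk init hung"
sys.modules["fake_wedge"] = fake
assert check_runtime_wedge("fake_wedge") == "sdk init hung"

# Remove it from the cache so the import path really runs and fails.
del sys.modules["fake_wedge"]
assert check_runtime_wedge("fake_wedge") is None
```

Without the `del`, the cached module would satisfy the import, and a test meant to exercise the missing-module branch would pass without ever reaching it.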
Hongming Wang	a57382e918	feat(runtime): add new_response_message helper for adapter A2A responses Surfaced via cross-template review of the a2a-sdk v0→v1 migration: every adapter executor (claude-code, gemini-cli, crewai, openclaw, autogen) builds A2A response Messages independently using `new_text_message(text)` from the SDK, which omits `task_id` and `context_id`. The runtime's own canonical pattern in `workspace/a2a_executor.py:466-475` correctly threads both:
    Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=_parts,
        task_id=task_id,        # ← canonical
        context_id=context_id,  # ← canonical
    )
When adapters skip these correlation fields, the platform's a2a proxy can't reliably tie the response back to the originating task. This is a divergence from canonical, not necessarily a strict bug (task_id may be optional with a default) — but it's enough of a correlation/observability gap that the canonical pattern bothers to thread it. Add `new_response_message(context, text, files=None)` to executor_helpers.py — single home for response Message construction. Templates can migrate from `new_text_message(text)` to this helper in stacked PRs once the runtime publishes to PyPI. The helper: - Reads `context.task_id`/`context.context_id` from the inbound RequestContext, falling back to fresh UUIDs (RequestContextBuilder always sets them in production; fallback is for unit tests). - Sets `role=Role.ROLE_AGENT` (the v1 enum value). - Builds text Parts via `Part(text=...)` and file Parts via `Part(url="workspace:<path>", filename=..., media_type=...)`. - Returns a v1 protobuf Message ready for `event_queue.enqueue_event(...)`. Why file Parts use the workspace: URI scheme: it matches the canonical pattern in a2a_executor.py exactly, so the platform's chat-attachment download path (executor_helpers.py `resolve_attachment_uri`) interprets responses uniformly across all adapters. 
Tests (5, all pass with --no-cov against the live runtime image): - test_new_response_message_text_only - test_new_response_message_with_files - test_new_response_message_files_only_no_text - test_new_response_message_falls_back_when_context_ids_unset - test_new_response_message_handles_missing_attrs The conftest's a2a stubs needed an extension for Message + Role + Part with kwargs preservation. Strictly additive — no existing tests affected. (The 19 pre-existing failures in test_executor_helpers.py are unrelated debt from the commit_memory/recall_memory rewrite, visible on staging baseline before this change.) Per-template migration is the follow-up: claude-code, gemini-cli, crewai, openclaw, autogen all call `new_text_message(text)` today; each gets a per-repo PR replacing it with `new_response_message(context, text)`. This PR ships the helper first so the templates have something to import. Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix), gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:13:34 -07:00
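The helper's behaviour (thread ids from the inbound context, fall back to fresh UUIDs, emit text and `workspace:` file parts) can be sketched with plain dicts. This is not the real `executor_helpers.new_response_message`, which returns a v1 protobuf `Message`. Several details are assumptions: a dict stands in for the Message, `files` is taken to be a list of path strings, and `media_type` is guessed with `mimetypes`.

```python
import mimetypes
import os
import uuid
from types import SimpleNamespace
from typing import Optional


def new_response_message(context, text: str, files: Optional[list] = None) -> dict:
    """Build an agent response that carries the inbound task/context ids."""
    # Production contexts always carry ids; the UUID fallback is for tests.
    task_id = getattr(context, "task_id", None) or uuid.uuid4().hex
    context_id = getattr(context, "context_id", None) or uuid.uuid4().hex

    parts = []
    if text:
        parts.append({"text": text})
    for path in files or []:
        media_type, _ = mimetypes.guess_type(path)  # assumption: type from extension
        parts.append({
            "url": f"workspace:{path}",
            "filename": os.path.basename(path),
            "media_type": media_type or "application/octet-stream",
        })

    return {
        "message_id": uuid.uuid4().hex,
        "role": "ROLE_AGENT",
        "parts": parts,
        "task_id": task_id,
        "context_id": context_id,
    }


ctx = SimpleNamespace(task_id="t-1", context_id="c-1")
msg = new_response_message(ctx, "Report attached.", files=["/workspace/out/report.csv"])
```

An adapter would then swap `new_text_message(text)` for a call that passes the request context, so the proxy can correlate the reply with its task.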
Hongming Wang	e9a59cda3b	feat(platform): single-source-of-truth tool registry — adapters consume, no drift Establishes workspace/platform_tools/registry.py as THE place where tool names and docs live. Every consumer reads from it; nothing duplicates the source. Closes the architectural gap behind the doc/tool drift discussion of 2026-04-28 — adding hundreds of future runtime SDK adapters should not require touching tool names anywhere except the registry. What the registry owns: a ToolSpec dataclass with name, short (one-line description), when_to_use (multi-paragraph agent-facing usage guidance), input_schema (JSON Schema), impl (the actual coroutine in a2a_tools.py), and section ('a2a' | 'memory'); and a TOOLS list with 8 entries — delegate_task, delegate_task_async, check_task_status, list_peers, get_workspace_info, send_message_to_user, commit_memory, recall_memory. What now reads from the registry: - workspace/a2a_mcp_server.py: the hardcoded TOOLS list (167 lines of hand-maintained dicts) is gone, replaced with a 6-line list comprehension over the registry. MCP description = spec.short; inputSchema = spec.input_schema. - workspace/executor_helpers.py: get_a2a_instructions(mcp=True) and get_hma_instructions() now GENERATE the agent-facing system-prompt text from the registry: heading + per-tool bullet (spec.short) + per-tool when_to_use + a section-specific footer. No more hand-maintained instruction blocks that drift from reality. - workspace/builtin_tools/delegation.py: renamed delegate_to_workspace -> delegate_task_async and check_delegation_status -> check_task_status to match the registry. Added a sync delegate_task @tool wrapping a2a_tools.tool_delegate_task (it was missing for LangChain runtimes — CP review Issue 3). - workspace/builtin_tools/memory.py: renamed search_memory -> recall_memory to match the registry. - workspace/adapter_base.py, workspace/main.py: bundle all 7 core tools (was 6) into all_tools / base_tools. 
- workspace/coordinator.py, shared_runtime.py, policies/routing.py: updated system-prompt text references to use the registry names. Structural alignment tests: workspace/tests/test_platform_tools.py — 9 tests pin every registry-to-adapter mapping: - registry names are unique - a2a + memory partition is complete (no orphans) - by_name lookup works - MCP server registers exactly the registry's tool set - MCP description equals registry.short for every tool - MCP inputSchema equals registry.input_schema for every tool - get_a2a_instructions text contains every a2a tool name - get_hma_instructions text contains every memory tool name - pre-rename names (delegate_to_workspace, search_memory, check_delegation_status) cannot leak back. Adding a future tool means adding one ToolSpec; the test failure list tells the author exactly which adapter to update. Adapter pattern for future SDK support: when (e.g.) AutoGen or Pydantic AI gets adapters, the only work needed for tool surfacing is "wrap registry.TOOLS in your SDK's tool format." Names, descriptions, schemas, and impl all come from the registry — the adapter author writes zero strings. Why this needed to ship now: PR #2237 (already in staging) injected MCP-world docs as the default system-prompt content. Without the registry, those docs said "delegate_task" while LangChain runtimes only had "delegate_to_workspace", so workers saw docs for tools that didn't exist (CP review Issues 1 and 3). PR #2239 was a tactical rename; this PR is the structural fix that prevents the same class of drift from recurring as new adapters ship. PR #2239 was closed in favor of this one: same renames, plus the registry, plus structural tests, in a single coherent change. Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in test_platform_tools.py; 4 alignment tests in test_prompt.py from #2237 still pass; original test_executor_helpers tests adapted to the registry-driven world. 
Refs: CP review Issues 1, 2, 3, 5; project memory project_runtime_native_pluggable.md (platform owns A2A); project memory feedback_doc_tool_alignment.md (this is the structural fix for the tactical lesson).	2026-04-28 17:11:36 -07:00
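The single-source pattern can be sketched as one `ToolSpec` list from which both the MCP tool list and the prompt text are derived, plus a few alignment checks in the spirit of `test_platform_tools.py`. This is a hypothetical miniature, not `workspace/platform_tools/registry.py`. The two entries, their description strings, the `_stub` coroutine and the `instructions()` formatting are placeholders. The MCP dict keys (`name`, `description`, `inputSchema`) follow the MCP tool definition shape.

```python
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    short: str
    when_to_use: str
    input_schema: dict
    impl: Callable[..., Awaitable[Any]]
    section: str  # "a2a" or "memory"


async def _stub(**kwargs: Any) -> str:  # placeholder for a real tool coroutine
    return "ok"


TOOLS = [
    ToolSpec("list_peers", "List reachable peer workspaces.", "Before delegating.",
             {"type": "object", "properties": {}}, _stub, "a2a"),
    ToolSpec("recall_memory", "Search stored memories.", "Before re-deriving facts.",
             {"type": "object", "properties": {"query": {"type": "string"}}}, _stub, "memory"),
]

# MCP view: every field comes from the registry, none is retyped here.
MCP_TOOLS = [{"name": t.name, "description": t.short, "inputSchema": t.input_schema} for t in TOOLS]


def instructions(section: str) -> str:
    """Generate agent-facing prompt text for one section from the registry."""
    lines = [f"## {section} tools"]
    for t in TOOLS:
        if t.section == section:
            lines.append(f"- {t.name}: {t.short}\n  {t.when_to_use}")
    return "\n".join(lines)


# Alignment checks in the spirit of test_platform_tools.py.
assert len({t.name for t in TOOLS}) == len(TOOLS)  # unique names
assert {t.section for t in TOOLS} <= {"a2a", "memory"}  # no orphan sections
assert all(t.name in instructions(t.section) for t in TOOLS)
assert "search_memory" not in {t.name for t in TOOLS}  # pre-rename name stays out
```

Adding a tool touches only `TOOLS`; every derived view and check picks it up automatically.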
Hongming Wang	5b05d663ee	test: update a2a.helpers mock to export new_text_message The conftest mock only exposed `new_agent_text_message`, the pre-v1 name. After fixing a2a_executor.py to use the v1 name `new_text_message`, the mock didn't satisfy the import → CI red. Mock both names (aliased to the same lambda) so any in-flight test that still references the old name keeps working until the next sweep removes those references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:34:28 -07:00
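The aliasing trick can be sketched as a conftest-style stub registered in `sys.modules`, so `from a2a.helpers import ...` resolves to the fake under both names. The fake's return value is a hypothetical placeholder, and a named function stands in for the commit's lambda. This is not the repo's actual conftest.

```python
import sys
import types


def _fake_text_message(text, *args, **kwargs):
    # Minimal placeholder for the SDK helper's return value.
    return {"role": "agent", "parts": [{"text": text}]}


a2a_pkg = types.ModuleType("a2a")
helpers = types.ModuleType("a2a.helpers")
helpers.new_text_message = _fake_text_message        # v1 name
helpers.new_agent_text_message = _fake_text_message  # pre-v1 alias for older tests
a2a_pkg.helpers = helpers
sys.modules["a2a"] = a2a_pkg
sys.modules["a2a.helpers"] = helpers

from a2a.helpers import new_agent_text_message, new_text_message  # noqa: E402

assert new_text_message is new_agent_text_message
```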
Hongming Wang	4b5ac2ebc2	chore(workspace): drop claude_sdk_executor — Phase 2 of #87 Phase 2 of the universal-runtime refactor (task #87). Now that the claude-code template repo ships its own claude_sdk_executor.py (template PR #13 merged + image rebuilt at 07:36 UTC) the molecule-runtime no longer needs to ship the file. Deletes: - workspace/claude_sdk_executor.py (704 LOC) - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC) Updates: - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring section. The shim was time-bounded ("removed once #87 Phase 2 lands"); this is that PR. - workspace/tests/test_runtime_wedge.py — drops the TestClaudeSdkExecutorReExportShim test class (the shim doesn't exist anymore so the identity assertions would fail at import). - workspace/tests/conftest.py — drops the claude_agent_sdk stub. Its only consumer was test_claude_sdk_executor.py which is gone; no other test imports the SDK. - workspace/cli_executor.py — comment refresh: claude-code template repo (not workspace/) is now the home for ClaudeSDKExecutor. Verified-safe-to-delete: - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer imports from claude_sdk_executor) - cli_executor.py: only comments referenced claude_sdk_executor; its line-117 ValueError defends against accidental routing - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's shim class consumed the deleted module; both removed in this PR Verification: - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the deleted test cases — zero unexpected regressions) - No live import of claude_sdk_executor anywhere in molecule-core after deletion (grep verified) Closes #87 for the claude-code adapter. Hermes is already template-only. The remaining adapter-specific code in workspace/ is cli_executor.py (codex/ollama/gemini-cli) tracked by task #122. preflight.py's SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in flight). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:52:55 -07:00
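The "no live import" check was done with grep. Grep also matches comments, and cli_executor.py mentions the module only in comments. An AST-based scan, a different technique from the one the commit used, counts only real import statements. The sample source below is hypothetical, and relative imports (`level > 0`) are deliberately skipped.

```python
import ast


def imports_module(source: str, module: str) -> bool:
    """True if the source imports `module` (or a submodule of it)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        if any(n == module or n.startswith(module + ".") for n in names):
            return True
    return False


# Comments and strings that merely mention the name do not count.
cli_executor_src = '''
# ClaudeSDKExecutor now lives in the claude-code template repo, not claude_sdk_executor here.
raise_msg = "claude_sdk_executor is not routed through cli_executor"
'''
assert not imports_module(cli_executor_src, "claude_sdk_executor")
assert imports_module("from claude_sdk_executor import ClaudeSDKExecutor", "claude_sdk_executor")
```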
molecule-ai[bot]	35bcad9204	feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974) * feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the official migration guide from a2aproject/a2a-python. Breaking changes applied: - A2AStarletteApplication → Starlette route factory (create_agent_card_routes + create_jsonrpc_routes) - AgentCard.url removed; url+protocol now in supported_protocols[].url - AgentCapabilities fields renamed to snake_case (pushNotifications→push_notifications, stateTransitionHistory→state_transition_history) - AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes - TaskState.canceled → TaskState.TASK_STATE_CANCELED - a2a.utils → a2a.helpers - Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed) Files changed: - requirements.txt: pinned >=1.0.0,<2.0 - main.py: Starlette route factory + AgentCard restructure - a2a_executor.py: Part() + TaskState + helpers import - hermes_executor.py: TaskState + helpers import - google-adk/adapter.py: TaskState + helpers import - cli_executor.py: helpers import - claude_sdk_executor.py: helpers import - tests/conftest.py: a2a.helpers mock stub - tests/test_a2a_executor.py: TaskState enum key - adapters/google-adk/test_adapter.py: Part + helpers stub Refs: KI-009 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED) --------- Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-24 04:43:17 +00:00
molecule-ai[bot]	859d676f70	fix(CI): correct BASE in detect-changes (PR/push race); catch RuntimeError in conftest (#1473) - ci.yml: replace if/else BASE assignment with GITHUB_BASE_REF default + pull_request base.sha override pattern. Prevents push events from overwriting the correct PR base SHA when both events fire together. - conftest.py: catch RuntimeError in addition to ImportError when importing coordinator.py, which raises RuntimeError at import time when WORKSPACE_ID is not set (before the ImportError guard). Co-authored-by: Molecule AI Release Manager <release-manager@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:15:45 +00:00
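The conftest fix can be sketched as an import helper that tolerates both failure modes. A throwaway module plays the role of coordinator.py and refuses to load without `WORKSPACE_ID`. `import_optional` and `coordinator_demo` are hypothetical names for illustration, and the CI `BASE` change is not modeled here.

```python
import importlib
import os
import sys
import tempfile
from typing import Optional


def import_optional(name: str) -> Optional[object]:
    """Import a module for tests, tolerating both import-time failure modes."""
    try:
        return importlib.import_module(name)
    except (ImportError, RuntimeError):
        # ImportError: a dependency is missing.
        # RuntimeError: the module refuses to load without its environment.
        return None


# Demo: a module that, like coordinator.py, raises at import time
# when WORKSPACE_ID is unset.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "coordinator_demo.py"), "w") as fh:
    fh.write(
        "import os\n"
        "if not os.environ.get('WORKSPACE_ID'):\n"
        "    raise RuntimeError('WORKSPACE_ID is not set')\n"
    )
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()
os.environ.pop("WORKSPACE_ID", None)

assert import_optional("coordinator_demo") is None
```

A failed import is removed from `sys.modules`, so setting `WORKSPACE_ID` and importing again loads the module normally.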
Hongming Wang	d8026347e5	chore: open-source restructure — rename dirs, remove internal files, scrub secrets Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:24:44 -07:00

13 Commits