molecule-core

Author	SHA1	Message	Date
Hongming Wang	dd57a840b6	fix: comprehensive a2a-sdk 1.x migration sweep across workspace/ Audited every a2a-sdk surface in workspace/ against the installed 1.0.2 wheel. Found and fixed: main.py (the live workspace startup path): • create_jsonrpc_routes(rpc_url='/', enable_v0_3_compat=True) — rpc_url required in 1.x; v0.3 compat enables inbound legacy clients (`"role": "user"` lowercase) without forcing them to upgrade. Pairs with the outbound rename below. a2a_executor.py: • TextPart/FilePart/FileWithUri removed in 1.x. Part is now a flat proto message: Part(text=…) / Part(url=…, filename=…, media_type=…). Updated the file-attachment branch (only reachable when an agent emits files; the harness's PONG path didn't exercise this, but it's a latent crash). • Message field names: messageId/taskId/contextId → message_id/task_id/context_id (proto3 snake_case). • Role enum: Role.agent → Role.ROLE_AGENT (proto enum). Outbound JSON-RPC payloads (8 files): • "role": "user" → "role": "ROLE_USER" — proto3 JSON serialization is strict about enum values. Sites: a2a_client, a2a_cli, main (initial+idle prompts), heartbeat, builtin_tools/a2a_tools, builtin_tools/delegation. Wire JSON keys stay camelCase (proto3 default), only the role enum value changed. google-adk/adapter.py: • new_agent_text_message → new_text_message (4 sites). This adapter's directory has a hyphen, so it can't be imported as a Python module — effectively dead code, but the wheel ships the file and a future fix should keep it correct against 1.x. Why one PR instead of seven: every previous a2a-sdk migration find landed as its own publish → cascade → harness → next-bug cycle. Today's audit ran every a2a-sdk symbol/type/method in workspace/ against the installed 1.0.2 wheel in a single sweep + tested the critical paths (Message construction, Part construction, Role enum parsing) against the actual SDK. Should be the last migration PR. Verified locally: python3 scripts/build_runtime_package.py --version 0.1.99 \ --out /tmp/build-final pip install /tmp/build-final python -c "import molecule_runtime.main; \ from molecule_runtime.a2a_executor import LangGraphA2AExecutor" → ✓ all imports clean against a2a-sdk 1.0.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 09:42:57 -07:00
Hongming Wang	c80b3ff0eb	fix: pass rpc_url='/' to create_jsonrpc_routes (a2a-sdk 1.x requirement) 7th a2a-sdk migration find from the v0 → v1 transition. create_jsonrpc_routes() now requires rpc_url as a positional arg (was implicit at root in 0.x). Pass '/' to match a2a.utils.constants.DEFAULT_RPC_URL — that's also what workspace-server's a2a_proxy.go forwards to (POSTs to workspace URL without appending a path). Symptom before fix: every workspace startup crashed with TypeError: create_jsonrpc_routes() missing 1 required positional argument: 'rpc_url' Caught by harness 9 phase 4 (claude-code + langgraph both on 0.1.24). The user's "use langgraph for fast iteration" call cut the diagnose cycle from 15min to ~30s — without that, this would have taken another hermes round-trip to surface. Updated reference_a2a_sdk_v0_to_v1_migration.md memory with this entry alongside the previous 6 finds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 09:33:23 -07:00
Hongming Wang	6859099a08	fix: pass agent_card to DefaultRequestHandler (a2a-sdk 1.x requirement) a2a-sdk 1.x added agent_card as a required argument to DefaultRequestHandler.__init__. main.py constructed it with only agent_executor + task_store, so every workspace startup that reached the handler init step crashed with: TypeError: DefaultRequestHandlerV2.__init__() missing 1 required positional argument: 'agent_card' This is the 6th a2a-sdk migration find from the v0 → v1 transition (see reference_a2a_sdk_v0_to_v1_migration memory). Pattern is the same: SDK exposes a new required arg, our call site needs to pass the existing object we already construct upstream. Why the import-only smoke gates didn't catch this: it's a call-time constructor error inside `async def main()`, not a module load error. The runtime-pin-compat smoke imports main_sync but doesn't invoke main() against a real config. Worth filing a follow-up to extend the smoke to a "construct + dispose" cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 08:53:47 -07:00
Hongming Wang	851fd21fb1	fix(workspace): rename supported_protocols → supported_interfaces (a2a-sdk 1.0) CRITICAL: every workspace boot since the a2a-sdk 1.0 migration (#1974) has been crashing at AgentCard construction with: ValueError: Protocol message AgentCard has no "supported_protocols" field The protobuf field is `supported_interfaces` (plural, interfaces — see a2a-sdk types/a2a_pb2.pyi:189). The 0.3→1.0 migration left the kwarg as `supported_protocols`, which doesn't exist in the 1.0 schema, so the constructor raises before any subsequent line of main runs. Why this hid for so long: - publish-runtime.yml's smoke step only IMPORTED molecule_runtime.main; importing the module is fine, only CONSTRUCTING the AgentCard fails - The user-visible symptom is "Workspace failed: " with empty last_sample_error, indistinguishable from generic boot timeouts - The state_transition_history=True bug (fixed in #2179) was a sibling of this — same migration, same class, just caught first Fix is symmetric with #2179: 1. workspace/main.py: rename the kwarg + comment explaining why 2. .github/workflows/publish-runtime.yml: extend the smoke block to instantiate AgentCard with the exact production call shape, so the next field-rename of this class fails at publish time instead of breaking every workspace startup Verification: - Constructed AgentCard against fresh a2a-sdk 1.0.2 in a clean venv with the corrected kwarg → succeeds - Constructed it with the original `supported_protocols` kwarg → fails immediately with the exact error production sees - Smoke test pinned to mirror main.py's exact call shape; main.py + smoke must stay in lockstep going forward Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:54:23 -07:00
Hongming Wang	12d446bc8e	docs: explain why state_transition_history is gone (research-backed) Adds a comment block citing a2a-sdk's own a2a/compat/v0_3/conversions.py, which says verbatim: state_transition_history=None, # No longer supported in v1.0 So a future reader who notices the missing kwarg won't try to add it back. The capability is now universal: every v1.x Task carries a history list and tasks/get supports historyLength via the apply_history_length helper. No flag because nothing's optional. Confirmed by reading the SDK source directly: - a2a/types.py AgentCapabilities exposes only: streaming, push_notifications, extensions, extended_agent_card. - a2a/compat/v0_3/conversions.py explicitly maps None when down-converting v1 → v0.3 (deliberate removal, not rename). - a2a/server/request_handlers/default_request_handler_v2.py uses apply_history_length(task, params) — agent doesn't opt in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:20:05 -07:00
Hongming Wang	f531fe1367	fix: drop state_transition_history field — removed in a2a-sdk 1.x a2a-sdk 1.x's AgentCapabilities only exposes 4 fields: streaming, push_notifications, extensions, extended_agent_card. The state_transition_history field was removed in the v1 protobuf schema. main.py still passed it as a kwarg, so every workspace that reached the AgentCard construction step (line 188) crashed: ValueError: Protocol message AgentCapabilities has no "state_transition_history" field Symptom: every claude-code + hermes workspace stuck in `provisioning` forever — caught when the user provisioned a Design Director crew manually via the canvas while harness 5 was running. Why every prior smoke gate missed it: - runtime-pin-compat.yml smokes `from molecule_runtime.main import main_sync` — only imports the module. AgentCapabilities() runs inside `async def main()`, not at module load. - Template image boot smoke does `import every /app/*.py` — same story. main.py imports fine; the field error only fires at call. The fix is one line — drop the kwarg. Fields we actually need (streaming + push_notifications) are still passed. Follow-up worth filing: smoke step that instantiates Adapter() + calls a no-op setup() against a stub config. That would have caught this before publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:16:16 -07:00
Hongming Wang	3df5867b56	fix: restore main_sync entry point in workspace/main.py The wheel's pyproject.toml has declared `molecule-runtime = "molecule_runtime.main:main_sync"` since the publish pipeline was created on 2026-04-26, but the function itself was never present in workspace/main.py — it lived in the pre-monorepo molecule-ai-workspace-runtime repo and was lost during the consolidation that made workspace/ the source of truth. The 0.1.15 wheel still had main_sync from a leftover snapshot, so the regression went unnoticed until 0.1.16 (the first wheel built from the new source-of-truth) shipped. Symptom: every workspace container restart loops with ImportError: cannot import name 'main_sync' from 'molecule_runtime.main' — the molecule-runtime CLI script's first line tries to import the missing symbol. Workspaces stay in `provisioning` until the 10-min sweep marks them failed. Caught by .github/workflows/runtime-pin-compat.yml, which already imports the symbol by name as its smoke test. (That check kept failing red on every recent merge_group run; this PR fixes the underlying symbol-not-found instead of the smoke step.) Also strengthens publish-runtime.yml's wheel smoke from `import molecule_runtime.main` (loads the module — passes even when entry-point target is missing) to `from molecule_runtime.main import main_sync` (the actual contract the CLI script needs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:35:49 -07:00
Hongming Wang	d0057912d2	feat(skills): per-skill runtime compatibility (#119 , hermes pattern) SKILL.md frontmatter can now declare `runtime: [claude-code]` or `runtime: [hermes, claude-code]` to opt out of incompatible adapters instead of failing at first invocation. Default `[""]` means universal — existing skill libraries need zero migration. Borrowed from hermes' declarative skill-compat pattern surfaced in the hermes architecture survey. The remaining two patterns (event-log layer, observability config block) stay open under #119. Wiring: - SkillMetadata.runtime: list[str] = [""] - _normalize_runtime_field accepts list, string-sugar, missing -> [""]; malformed warns and falls back to universal so a typo never silently drops a skill. - load_skills(..., current_runtime=...) filters out skills whose runtime list lacks "" or current_runtime, with an INFO log line. - BaseAdapter.start passes type(self).name() so the live adapter drives the filter; SkillsWatcher takes the same kwarg so hot-reload honors it. 8 new tests cover default universal, no-field universal, explicit match/mismatch, string sugar, wildcard short-circuit, current_runtime=None (preserves old behavior), and malformed-warns-not-drops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:57:43 -07:00
Hongming Wang	65b531acf6	fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID Workspace runtime fired four classes of A2A request to the platform without the X-Workspace-ID header that identifies the source workspace: heartbeat self-messages, initial_prompt, idle-loop fires, and peer-to-peer A2A from runtime tools. The platform's a2a_receive logger keys source_id off that header — without it, every such row was written with source_id=NULL, which the canvas's My Chat tab filters as ?source=canvas (i.e. "user typed this") and rendered the internal triggers as if the human user had sent them. The "Delegation results are ready..." heartbeat trigger was visible to end users in the chat history; delegate_task A2A calls between agents were misclassified the same way. Centralise the header construction in a new platform_auth helper self_source_headers(workspace_id) that returns auth_headers() PLUS {X-Workspace-ID: <id>}. Apply it to: - heartbeat.py self-message (refactored from inline header dict) - main.py initial_prompt POST - main.py idle_prompt POST - a2a_client.py send_a2a_message (peer A2A from runtime) - builtin_tools/a2a_tools.py delegate_task (was missing ALL headers) Tests: - test_heartbeat.py asserts the X-Workspace-ID header is set on the self-message POST. - test_a2a_tools_module.py asserts the same on delegate_task POSTs; FakeClient.post mocks updated to accept the headers kwarg. Production effect lands the moment workspace containers are rebuilt with this code; existing rows in activity_logs keep their NULL source_id (legacy data). The canvas-side filter (#follow-up) covers the historical-rows case until backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:54:43 -07:00
Hongming Wang	94d9331c76	feat(canvas+platform): chat attachments, model selection, deploy/delete UX Session's accumulated UX work across frontend and platform. Reviewable in four logical sections — diff is large but internally cohesive (each section fixes a gap the next one depends on). ## Chat attachments — user ↔ agent file round trip - New POST /workspaces/:id/chat/uploads (multipart, 50 MB total / 25 MB per file, UUID-prefixed storage under /workspace/.molecule/chat-uploads/). - New GET /workspaces/:id/chat/download with RFC 6266 filename escaping and binary-safe io.CopyN streaming. - Canvas: drag-and-drop onto chat pane, pending-file pills, per-message attachment chips with fetch+blob download (anchor navigation can't carry auth headers). - A2A flow carries FileParts end-to-end; hermes template executor now consumes attachments via platform helpers. ## Platform attachment helpers (workspace/executor_helpers.py) Every runtime's executor routes through the same helpers so future runtimes inherit attachment awareness for free: - extract_attached_files — resolve workspace:/file:///bare URIs, reject traversal, skip non-existent. - build_user_content_with_files — manifest for non-image files, multi-modal list (text + image_url) for images. Respects MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision adapter hangs on base64 payloads (MiniMax M2.7). - collect_outbound_files — scans agent reply for /workspace/... paths, stages each into chat-uploads/ (download endpoint whitelist), emits as FileParts in the A2A response. - ensure_workspace_writable — called at molecule-runtime startup so non-root agents can write /workspace without each template having to chmod in its Dockerfile. Hermes template executor + langgraph (a2a_executor.py) + claude-code (claude_sdk_executor.py) all adopt the helpers. ## Model selection & related platform fixes - PUT /workspaces/:id/model — was 404'ing, so canvas "Save" silently lost the model choice. Stores into workspace_secrets (MODEL_PROVIDER), auto-restarts via RestartByID. - applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"] so Restart propagates the stored model to HERMES_DEFAULT_MODEL without needing the caller to rehydrate payload.Model. - ConfigTab Tier dropdown now reads from workspaces row, not the (stale) config.yaml — fixes "badge shows T3, form shows T2". ## ChatTab & WebSocket UX fixes - Send button no longer locks after a dropped TASK_COMPLETE — `sending` no longer initializes from data.currentTask. - A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s; the previous default aborted fetches while the server was still replying, producing "agent may be unreachable" on success. - socket.ts: disposed flag + reconnectTimer cancellation + handler detachment fix zombie-WebSocket in React StrictMode. - Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' — the adaptor's purpose IS the form, banner was contradictory. - workspace_provision.go auto-recovery: try <runtime>-default AND bare <runtime> for template path (hermes lives at the bare name). ## Org deploy/delete animation (theme-ready CSS) - styles/theme-tokens.css — design tokens (durations, easings, colors). Light theme overrides by setting only the deltas. - styles/org-deploy.css — animation classes + keyframes, every value references a token. prefers-reduced-motion respected. - Canvas projects node.draggable=false onto locked workspaces (deploying children AND actively-deleting ids) — RF's authoritative drag lock; useDragHandlers retains a belt-and- braces check. - Organ cancel button (red pulse pill on root during deploy) cascades via existing DELETE /workspaces/:id?confirm=true. - Auto fit-view after each arrival, debounced 500 ms so rapid sibling arrivals coalesce into one fit (previous per-event fit made the viewport lurch continuously). - Auto-fit respects user-pan — onMoveEnd stamps a user-pan timestamp only when event !== null (ignores programmatic fitView) so auto-fits don't self-cancel. - deletingIds store slice + useOrgDeployState merge gives the delete flow the same dim + non-draggable treatment as deploy. - Platform-level classNames.ts shared by canvas-events + useCanvasViewport (DRY'd 3 copies of split/filter/join). ## Server payload change - org_import.go WORKSPACE_PROVISIONING broadcast now includes parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas renders the child at the right parent-nested slot without doing any absolute-position walk. createWorkspaceTree signature gains relX, relY alongside absX, absY; both call sites updated. ## Tests - workspace/tests/test_executor_helpers.py — 11 new cases covering URI resolution (including traversal rejection), attached-file extraction (both Part shapes), manifest-only vs multi-modal content, large-image skip, outbound staging, dedup, and ensure_workspace_writable (chmod 777 + non-root tolerance). - workspace-server chat_files_test.go — upload validation, Content-Disposition escaping, filename sanitisation. - workspace-server secrets_test.go — SetModel upsert, empty clears, invalid UUID rejection. - tests/e2e/test_chat_attachments_e2e.sh — round-trip against a live hermes workspace. - tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static plumbing check + round-trip across hermes/langgraph/claude-code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:27:51 -07:00
molecule-ai[bot]	35bcad9204	feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974 ) * feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the official migration guide from a2aproject/a2a-python. Breaking changes applied: - A2AStarletteApplication → Starlette route factory (create_agent_card_routes + create_jsonrpc_routes) - AgentCard.url removed; url+protocol now in supported_protocols[].url - AgentCapabilities fields renamed to snake_case (pushNotifications→push_notifications, stateTransitionHistory→state_transition_history) - AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes - TaskState.canceled → TaskState.TASK_STATE_CANCELED - a2a.utils → a2a.helpers - Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed) Files changed: - requirements.txt: pinned >=1.0.0,<2.0 - main.py: Starlette route factory + AgentCard restructure - a2a_executor.py: Part() + TaskState + helpers import - hermes_executor.py: TaskState + helpers import - google-adk/adapter.py: TaskState + helpers import - cli_executor.py: helpers import - claude_sdk_executor.py: helpers import - tests/conftest.py: a2a.helpers mock stub - tests/test_a2a_executor.py: TaskState enum key - adapters/google-adk/test_adapter.py: Part + helpers stub Refs: KI-009 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED) --------- Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-24 04:43:17 +00:00
molecule-ai[bot]	4675402e58	feat(workspace): pre-stop serialization for pause/resume (closes #1386 ) Add a pre-stop hook that captures agent state before container exit and writes a scrubbed snapshot to /configs/.agent_snapshot.json. On restart, the snapshot is loaded and the adapter's restore_state() is called before the A2A server starts. - New lib/pre_stop.py: build_snapshot / write_snapshot / read_snapshot / delete_snapshot + _scrub_value deep-scrubber (uses lib.snapshot_scrub to redact API keys, tokens, and sandbox output before persisting) - BaseAdapter.pre_stop_state(): captures _executor._session_id and recent transcript_lines; overridden by adapters with richer in-memory state - BaseAdapter.restore_state(): stores snapshot fields as adapter attrs for create_executor() to pick up - main.py: calls pre_stop serialization in finally block (after server serves) and restore_state() after adapter setup, before server starts - Added 12 unit tests covering scrub, read/write, adapter integration Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 12:40:44 +00:00
molecule-ai[bot]	3bef6af241	fix: apply #1124 env-var defaults + scrub F1088 credentials from INCIDENT_LOG.md (#1347 ) - PLATFORM_URL: replace unreachable http://platform:8080 mesh-only default with Docker-aware detection (host.docker.internal in containers, localhost for local dev) across all workspace Python modules and the git-token-helper shell script. - WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if empty) consistent with coordinator.py / a2a_cli.py patterns already in place. - INCIDENT_LOG.md: replace all 3 F1088 credential types with *REDACTED* (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x). Fixes #1124, #1333. Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>	2026-04-21 08:11:44 +00:00
Hongming Wang	d8026347e5	chore: open-source restructure — rename dirs, remove internal files, scrub secrets Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:24:44 -07:00

14 Commits