molecule-core

Author	SHA1	Message	Date
Hongming Wang	205a454c09	feat(runtime): RuntimeCapabilities dataclass + BaseAdapter.capabilities() Foundation primitive for the native+pluggable runtime principle (task #117, blocks #87). Lets each adapter declare which cross-cutting capabilities it owns natively (heartbeat, scheduler, durable session, status mgmt, retry, activity decoration, channel dispatch) versus delegates to the platform's fallback implementation. Pure additive: every existing adapter inherits BaseAdapter.capabilities() which returns RuntimeCapabilities() — every flag False — so today's "platform owns everything" behavior is preserved exactly. Subsequent PRs land platform-side consumers (idle-timeout override, scheduler skip, status-transition hook, etc.) one capability at a time. Why a frozen dataclass instead of class attributes: capabilities are declared at class-load time and read by the platform on every heartbeat. A mutable value would let a runtime change capabilities mid-flight, creating impossible-to-debug state where the platform's idea of who- owns-heartbeat drifts from the adapter's actual code. Why a `to_dict()` with explicit short keys: the Go side will read these from the heartbeat payload by string key. The dict's wire names are pinned independently of Python field names so a Python-side rename doesn't silently break the Go consumer (test pins this). Tests (9 new): - is a frozen dataclass (mutation rejected) - all 7 default flags are False (load-bearing — flipping any default silently moves ownership for langgraph/crewai/deepagents) - to_dict() keys are stable wire names (Go contract) - BaseAdapter.capabilities() default returns all-False - subclass override mechanism works - sibling adapters' defaults aren't affected by an override Verification: - 1300/1300 workspace pytest pass (was 1291, +9) - Zero behavior change for any existing code path See project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:17:49 -07:00
Hongming Wang	6eaacf175b	fix(notify): review-flagged Critical + Required findings on PR #2130 Two Critical bugs caught in code review of the agent→user attachments PR: 1. Empty-URI attachments slipped past validation. Gin's go-playground/validator does NOT iterate slice elements without `dive` — verified zero `dive` usage anywhere in workspace-server — so the inner `binding:"required"` tags on NotifyAttachment.URI/Name were never enforced. `attachments: [{"uri":"","name":""}]` would pass validation, broadcast empty-URI chips that render blank in canvas, AND persist them in activity_logs for every page reload to re-render. Added explicit per-element validation in Notify (returns 400 with `attachment[i]: uri and name are required`) plus defence-in-depth in the canvas filter (rejects empty strings, not just non-strings). 3-case regression test pins the rejection. 2. Hardcoded application/octet-stream stripped real mime types. `_upload_chat_files` always passed octet-stream as the multipart Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")` FIRST and only falls back to extension-sniffing when the header is empty, so every agent-attached file lost its real type forever — broke the canvas's MIME-based icon/preview logic. Now sniff via `mimetypes.guess_type(path)` and only fall back to octet-stream when sniffing returns None. Plus three Required nits: - `sqlmockArgMatcher` was misleading — the closure always returned true after capture, identical to `sqlmock.AnyArg()` semantics, but named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)` so the intent (capture for post-call inspection, not validate via driver-callback) is unambiguous. - Test asserted notify call by `await_args_list[1]` index — fragile to any future _upload_chat_files refactor that adds a pre-flight POST. Now filter call list by URL suffix `/notify` and assert exactly one match. - Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases) covering empty-uri, empty-name, both-empty so the Critical fix stays defended. Deferred to follow-up: - ORDER BY tiebreaker for same-millisecond notifies — pre-existing risk, not regression. - Streaming multipart upload — bounded by the platform's 50MB total cap so RAM ceiling is fixed; switch to streaming if cap rises. - Symlink rejection — agent UID can already read whatever its filesystem perms allow via the shell tool; rejecting symlinks doesn't materially shrink the attack surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:47:31 -07:00
Hongming Wang	d028fe19ff	feat(notify): agent → user file attachments via send_message_to_user Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip" in plain text instead of attaching a download chip — the runtime literally had no API for outbound file attachments. The canvas + platform's chat-uploads infrastructure already supported the inbound (user → agent) direction (commit `94d9331c`); this PR wires the outbound side. End-to-end shape: agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"]) ↓ runtime POST /workspaces/<self>/chat/uploads (multipart) ↓ platform /workspace/.molecule/chat-uploads/<uuid>-build.zip → returns {uri: workspace:/...build.zip, name, mimeType, size} ↓ runtime POST /workspaces/<self>/notify {message: "Done!", attachments: [{uri, name, mimeType, size}]} ↓ platform Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]} ↓ canvas WS push: canvas-events.ts adds attachments to agentMessages queue Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[] Either path → ChatTab renders download chip via existing path Files changed: workspace-server/internal/handlers/activity.go - NotifyAttachment struct {URI, Name, MimeType, Size} - Notify body accepts attachments[], broadcasts in payload, persists as response_body.parts[].kind="file" canvas/src/store/canvas-events.ts - AGENT_MESSAGE handler reads payload.attachments, type-validates each entry, attaches to agentMessages queue - Skips empty events (was: skipped only when content empty) workspace/a2a_tools.py - tool_send_message_to_user(message, attachments=[paths]) - New _upload_chat_files helper: opens each path, multipart POSTs to /chat/uploads, returns the platform's metadata - Fail-fast on missing file / upload error — never sends a notify with a half-rendered attachment chip workspace/a2a_mcp_server.py - inputSchema declares attachments param so claude-code SDK surfaces it to the model - Defensive filter on the dispatch path (drops non-string entries if the model sends a malformed payload) Tests: - 4 new Python: success path, missing file, upload 5xx, no-attach backwards compat - 1 new Go: Notify-with-attachments persists parts[] in response_body so chat reload reconstructs the chip Why /tmp paths work even though they're outside the canvas's allowed roots: the runtime tool reads the bytes locally and re-uploads through /chat/uploads, which lands the file under /workspace (an allowed root). The agent can specify any readable path. Does NOT include: agent → agent file transfer. Different design problem (cross-workspace download auth: peer would need a credential to call sender's /chat/download). Tracked as a follow-up under task #114. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:35:58 -07:00
Hongming Wang	5071454074	fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events Critical follow-up to PR #2126's review. Two real bugs: 1. Runtime QUEUED never resolved. Platform's drain stitch updates the platform's delegate_result row when a queued delegation finally completes, but never pushes back to the runtime. The LLM polling check_delegation_status saw status="queued" forever — combined with the new docstring guidance ("queued → wait, peer will reply"), the model would wait indefinitely on a state that never resolves. Strictly worse than pre-PR behavior where it would have at least bypassed. 2. Live updates dead code. delegation.go writes activity rows by direct INSERT INTO activity_logs, bypassing the LogActivity helper that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial GET worked, live updates did not. Fix: (1) Runtime side, workspace/builtin_tools/delegation.py: - New `_refresh_queued_from_platform(task_id)` async helper that pulls /workspaces/<self>/delegations and finds the platform-side delegate_result row for our task_id. - check_delegation_status calls _refresh when local status is QUEUED, so the LLM's poll itself drives state convergence. - Best-effort: GET failure leaves local state untouched, next poll retries. - Docstring updated to reflect the actual behavior ("polls transparently — keep polling and you'll see the flip"). - 4 new tests cover: QUEUED → completed via refresh; QUEUED → failed via refresh; refresh keeps QUEUED when platform hasn't resolved; refresh swallows network errors safely. (2) Canvas side, AgentCommsPanel.tsx WS push handler: - Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE / DELEGATION_FAILED in addition to ACTIVITY_LOGGED. - Each event's payload synthesized into an ActivityEntry shape so toCommMessage's existing delegation branch maps it. Status derived: STATUS uses payload.status, COMPLETE → "completed", FAILED → "failed", SENT → "pending". - The ACTIVITY_LOGGED branch keeps the "delegation" type accepted as a no-op-today / future-proof path: if delegation handlers are ever refactored to call LogActivity, this lights up automatically without another canvas change. Doesn't change: the docstring guidance ("queued → wait, don't bypass") is now actually load-bearing because the refresh path will deliver the eventual outcome. Without the refresh, the guidance was a trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:05:04 -07:00
Hongming Wang	057876cb0c	fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows Two bugs that compounded into the "Director does the work itself" UX: 1. workspace/builtin_tools/delegation.py: _execute_delegation only handled HTTP 200 in the response branch. When the peer's a2a-proxy returned HTTP 202 + {queued: true} (single-SDK-session bottleneck on the peer), the loop fell through. Two iterations later the `if "error" in result` check tried to access an unbound `result`, the goroutine ended quietly, and the delegation stayed at FAILED with error="None". The LLM checking status saw "failed" + the platform's "Delegation queued — target at capacity" log line in chat context, concluded the peer was permanently unavailable, and bypassed delegation to do the work itself. Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED, marks the local delegation as QUEUED, mirrors to the platform, and returns cleanly without retrying. The retry loop is for transient transport errors — queueing is a real ack, not a failure to retry against (retrying would just re-queue the same task). check_delegation_status docstring extended with explicit per-status guidance: pending/in_progress → wait, queued → wait (peer busy on prior task, reply WILL arrive), completed → use result, failed → real error in error field; only fall back on failed, never queued. 2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped every delegation row because it whitelisted only a2a_send / a2a_receive. activity_type='delegation' rows (written by the platform's /delegate handler with method='delegate' or 'delegate_result') never reached toCommMessage. User saw "No agent-to-agent communications yet" while 6+ delegations existed in the DB. Fix: include "delegation" in the both the initial filter and the WS push filter, plus a delegation branch in toCommMessage that maps the row as outbound (always — platform proxies on our behalf) and uses summary as the primary text source. Tests: - 3 new Python tests cover the 202+queued path: status becomes QUEUED not FAILED; no retry on queued (counted by URL match against the A2A target since the mock is shared across all AsyncClient calls); bare 202 without {queued:true} still falls through to the existing retry-then-FAILED path. - 3 new TS tests cover the delegation mapper: 'delegate' row maps as outbound to target with summary text; queued 'delegate_result' preserves status='queued' (load-bearing for the LLM's wait-vs-bypass decision); missing target_id returns null instead of rendering a ghost. Does NOT solve: the underlying single-SDK-session bottleneck that causes peers to queue in the first place. Tracked as task #102 (parallel SDK sessions per workspace) — real architectural work. This PR makes the runtime handle the queueing correctly so the LLM doesn't bail out, and makes the delegations visible in Agent Comms so operators can see what's happening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:01:50 -07:00
Hongming Wang	09bfd9bdce	fix(tests): hoist _executor_mod alias so async wedge tests pass under --cov The Copilot Auto-fix in `5a8f42b4` addressed the duplicate-import lint by removing 'import claude_sdk_executor as _executor_mod' entirely, but the async wedge tests (test_execute_marks_wedge_, test_execute_clears_wedge_) still call _executor_mod._reset_sdk_wedge_for_test() etc. — so they failed with NameError once that line was removed. Restore the alias, but at the top of the file (alongside the other module- level imports) rather than at line 1248. The late-file binding was the proximate cause of the original CI failure: with --cov enabled (#1817), sys.settrace + the @pytest.mark.asyncio wrapper combination caused the late module-level binding to not be visible from inside the async test bodies, even though the binding existed at module-load time. Hoisting fixes that scope-resolution issue. Verified locally with the exact CI config (--cov-fail-under=86): 1280 passed, 2 xfailed — total coverage 90.25% 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:57:21 -07:00
Hongming Wang	5a8f42b405	Potential fix for pull request finding 'Module is imported with 'import' and 'import from'' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-26 10:45:37 -07:00
Hongming Wang	d0f198b24f	merge: resolve staging conflicts (a2a_proxy + workspace_crud) Three files conflicted with staging changes that landed while this PR sat open. Resolved each by combining both intents (not picking one side): - a2a_proxy.go: keep the branch's idle-timeout signature (workspaceID parameter + comment) AND apply staging's #1483 SSRF defense-in-depth check at the top of dispatchA2A. Type-assert h.broadcaster (now an EventEmitter interface per staging) back to Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through to no-op when the assertion fails (test-mock case). - a2a_proxy_test.go: keep both new test suites — branch's TestApplyIdleTimeout_ (3 cases for the idle-timeout helper) AND staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated the staging test's dispatchA2A call to pass the workspaceID arg introduced by the branch's signature change. - workspace_crud.go: combine both Delete-cleanup intents: * Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas hang-up doesn't cancel mid-Docker-call (the container-leak fix) * Branch's stopAndRemove helper that skips RemoveVolume when Stop fails (orphan sweeper handles) * Staging's #1843 stopErrs aggregation so Stop failures bubble up as 500 to the client (the EC2 orphan-instance prevention) Both concerns satisfied: cleanup runs to completion past canvas hangup AND failed Stop calls surface to caller. Build clean, all platform tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:43:22 -07:00
rabbitblood	4a4a740804	refactor(test_config): parametrize the 3 yaml-default cases (simplify on #2085 ) Collapses test_compliance_default_when_yaml_omits_block, _when_yaml_block_is_empty, _explicit_optout_still_works into one parametrized test_compliance_default_via_load_config with three ids (yaml_omits_block, yaml_block_empty, yaml_explicit_optout). The dataclass-default test stays separate (no tmp_path needed). Coverage and assertions identical; net -19 lines, same 4 logical cases. prompt_injection check moves out of per-case to a single tail-assert since no payload overrode it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 02:03:59 -07:00
rabbitblood	577294b8f4	test(config): lock ComplianceConfig default to owasp_agentic (#2059 ) PR #2056 flipped ComplianceConfig.mode default from "" to "owasp_agentic" so every shipped template gets prompt-injection detection + PII redaction by default. The flip is correct + already shipping, but no test asserts the new default — a silent revert (or a refactor that reintroduces the old "" default) would pass workspace/tests/ and ship a workspace with compliance silently off. Add 4 regression tests: - test_compliance_dataclass_default — ComplianceConfig() with no args returns mode='owasp_agentic' + prompt_injection='detect' - test_compliance_default_when_yaml_omits_block — load_config on a yaml without `compliance:` key still produces owasp_agentic - test_compliance_default_when_yaml_block_is_empty — load_config on `compliance: {}` (a common shape during template editing) still produces owasp_agentic; covers the load_config() `.get("mode", "owasp_agentic")` default-fill path - test_compliance_explicit_optout_still_works — `mode: ""` in yaml must disable compliance (the documented opt-out path) 23/23 tests pass locally (4 new + 19 existing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 02:01:57 -07:00
Hongming Wang	2ee4b67cab	chore: third-pass review polish — empty-stream gate test + Callable type Pass 3 review came back Approve with two optional polish items. Both taken to fully converge the loop: 1. Regression test for the empty-stream wedge-clear gate (added in `3c4eef49`). A degenerate stream that iterates without raising but emits NEITHER an AssistantMessage NOR a ResultMessage must NOT clear the wedge flag — pre-set wedge persists, the next heartbeat still reports runtime_state="wedged". Pins the gate against future regression. 2. Replaced the type annotation `"dict[str, callable[[dict], str]]"` (lowercase `callable`, string-quoted) with the proper `dict[str, Callable[[dict], str]]` using `Callable` from `collections.abc`. Benign before (`from __future__ import annotations` makes the annotation a string Python never evaluates), but pyright/mypy may flag the lowercase form. 65 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 08:52:32 -07:00
Hongming Wang	892de784b3	fix: review-driven hardening of wedge detector + idle timeout + progress feed Bundle review of pieces 1/2/3 surfaced two critical issues plus a handful of required + optional fixes. All addressed. Critical: 1. Migration 043 was missing 'paused' and 'hibernated' from the workspace_status enum. Both are real production statuses written by workspace_restart.go (lines 283 and 406), introduced by migration 029_workspace_hibernation. The original `USING status::workspace_status` cast would have errored mid-transaction on any production DB containing those values. Added both. Also added `SET LOCAL lock_timeout = '5s'` so the migration aborts instead of stalling the workspace fleet behind a slow SELECT. 2. The chat activity-feed window kept only 8 lines, and a single multi-tool turn (Read 5 files + Grep + Bash + Edit + delegate) easily flushed older context before the user could read it. Extracted appendActivityLine to chat/activityLog.ts with a 20-line window AND consecutive-duplicate collapse (same tool on the same target twice in a row is noise, not new progress). 5 unit tests pin the behavior. Required: 3. The SDK wedge flag was sticky-only — a single transient Control-request-timeout from a flaky network blip locked the workspace into degraded for the whole process lifetime, even when the next query() would have succeeded. Added _clear_sdk_wedge_on_success(), called from _run_query's success path. The next heartbeat after a working query reports runtime_state empty and the platform recovers the workspace to online without a manual restart. New regression test. 4. _report_tool_use now sets target_id = WORKSPACE_ID for self- actions, matching the convention other self-logged activity rows use. DB consumers joining on target_id see a well-defined value instead of NULL. Optional taken: 5. Tightened _WEDGE_ERROR_PATTERNS from "control request timeout" to "control request timeout: initialize" — suffix-anchored so a future SDK error on an in-flight tool-call control message doesn't get misclassified as the unrecoverable post-init wedge. 6. Dropped the redundant "context canceled" substring fallback in isUpstreamBusyError. errors.Is(err, context.Canceled) is the typed check; the substring would also match healthy client-side aborts, which we don't want classified as upstream-busy. Verified: 1010 canvas tests + 64 Python tests + full Go suite pass; migration applies cleanly on dev DB with all 8 enum values; reverse migration restores TEXT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 08:43:10 -07:00
Hongming Wang	4eb09e2146	feat(platform,workspace): SDK-wedge detection + workspace_status ENUM Heartbeat lies. The asyncio task that POSTs /registry/heartbeat lives in its own process slot, so a workspace whose claude_agent_sdk has wedged on `Control request timeout: initialize` keeps reporting "online" — every chat send hangs the full 5-min platform deadline even though the runtime is dead in the water. This commit teaches the workspace to admit it's wedged and the platform to honor that admission by flipping status → degraded. Five layers, all in one commit because they share a contract: 1. Migration 043 — convert workspaces.status from free-form TEXT to a real `workspace_status` Postgres ENUM with the 6 values production code actually writes (provisioning, online, offline, degraded, failed, removed). Locks the value set; future typo writes error at the DB instead of silently storing rogue strings. Down migration reverts to TEXT and drops the type. 2. workspace-server/internal/models — `HeartbeatPayload` gains a `runtime_state string` field. Empty = healthy. Currently the only non-empty value the handler honors is "wedged"; future symptoms can extend without another migration. 3. workspace-server/internal/handlers/registry.go — `evaluateStatus` gains a wedge branch BEFORE the existing error_rate >= 0.5 path: if `RuntimeState=="wedged"` and currently online, flip to degraded and broadcast WORKSPACE_DEGRADED with the wedge sample error. Recovery (`degraded → online`) now requires BOTH error_rate < 0.1 AND runtime_state cleared, so a workspace still reporting wedged stays degraded even when its error count happens to be 0 (the wedge captures a runtime state, not an error count). 4. workspace/claude_sdk_executor.py — module-level `_sdk_wedged_reason` flag set when execute()'s catch block sees an error matching `_WEDGE_ERROR_PATTERNS` (currently just "control request timeout"). Sticky for the process lifetime; the SDK's internal client-process state is corrupted on this error and only a workspace restart (= new Python process = fresh module state) clears it. Helpers `is_wedged()` / `wedge_reason()` / `_reset_sdk_wedge_for_test()` exposed. 5. workspace/heartbeat.py — heartbeat body now layers on `_runtime_state_payload()` for both the happy path and the 401-retry path. Lazy-imports claude_sdk_executor so non-Claude runtimes (where the module may not even be importable) keep working unchanged. Canvas required no changes — `STATUS_CONFIG.degraded` was already defined in design-tokens.ts (amber dot, "Degraded" label) and WorkspaceNode.tsx already renders `lastSampleError` underneath the status pill when status === "degraded". The existing wiring just never fired because nothing was writing degraded in this code path. Tests: - 3 Go handler tests for the new transitions (online → degraded on wedged, degraded stays put while still wedged, degraded → online after wedge clears) - 5 Python wedge-detector tests (default clean, mark sets flag, sticky-first-wins, execute() flips on Control request timeout, execute() does NOT flip on unrelated errors) - Migration smoke-tested against the local dev DB (3 existing rows, all enum-compatible; migration applied cleanly, post-state has the column as workspace_status type and the index preserved) Verified: 79 Python tests pass; full Go test suite passes; migration applies clean on a real DB; reverse migration restores the column to TEXT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 00:59:15 -07:00
Hongming Wang	c159d85eb5	fix(a2a): review-driven hardening — prefix-anchored type check, error_detail cap, shared hint module Three required fixes from the bundle review of `391e1872`: 1. workspace/a2a_client.py: substring `type_name in msg` could miss the diagnostic prefix when an exception's message embedded a different class name mid-string (e.g. `OSError("see ConnectionError below")` → printed as plain msg, type lost). Switched to a prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so the type label is always added when not already at the start of the message. 2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded TEXT on the platform (handlers/activity.go does not validate length). A buggy or hostile peer could stream arbitrarily large error messages into the caller's activity log. Cap at 4096 chars at the producer — comfortably above any real exception traceback, well below an obvious-DoS threshold. 3. New regression test for JSON-RPC `code=0` — pins the `code is not None` semantics so the code is preserved in the detail rather than collapsing into the no-code path. Code=0 is not valid per the spec, but a malformed peer can still emit it and we want it visible for diagnosis. Plus one optional taken: extracted the A2A-error → hint mapping into canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies (AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had already drifted — Activity tab gained `not found`/`offline` cases the chat panel never picked up, AgentCommsPanel handled empty-input explicitly while Activity didn't. The shared module is the merged superset, with 10 unit tests pinning each named pattern + the "most specific first" ordering (Claude SDK wedge wins over generic timeout). Skipped (per analysis): - Unicode-naive 120-char slice — Python str[:N] slices on code points, not bytes. Safe. - Nested [A2A_ERROR] confusion — non-issue per reviewer; outer prefix winning still produces a structured render. - MessagePreview + JsonBlock dual render on errors — intentional drilldown; raw JSON is below the fold for operators who need it. - console.warn dedup — refetches don't happen per-event so spam risk is low. - str(data)[:200] materialization — A2A response bodies aren't typically MB-sized. Verified: 1005 canvas tests pass (10 new hint tests); 10 Python send_a2a_message tests pass (1 new for code=0); tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 23:47:44 -07:00
Hongming Wang	391e187281	fix(a2a,canvas): make delivery failures comprehensive instead of "[A2A_ERROR] " Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] " (prefix + nothing) for failed delegations. Operator had no signal to act on — no exception type, no target, no hint about what went wrong, no next step. Fix is in three layers. 1. workspace/a2a_client.py — every error path now produces an actionable detail string: - except branch: some httpx exceptions (RemoteProtocolError, ConnectionReset variants) stringify to "". Pre-fix the catch was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back to `<TypeName> (no message — likely connection reset or silent timeout)` and always appends `[target=<url>]` for traceability in chained delegations. - JSON-RPC error branch: previously dropped error.code on the floor and printed "unknown" when message was missing. Now surfaces both, including the well-defined "JSON-RPC error with no message (code=N)" path. - "neither result nor error" branch: pre-fix returned str(payload) which the canvas rendered as a successful response block. Now tagged as A2A_ERROR with a payload snippet so downstream UI routes through the error path. 2. workspace/a2a_tools.py — tool_delegate_task now passes error_detail (the stripped error message) through to the activity-log POST. The platform's activity_logs.error_detail column is the canvas's red error chip source; populating it makes the failure visible in the row header without the user having to expand into raw response_body JSON. The summary line also gets a 120-char prefix of the cause so the collapsed row reads "React Engineer failed: ConnectionResetError: ... [target=...]" instead of "React Engineer failed". 3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview now detects [A2A_ERROR]-prefixed bodies and renders a structured error block (red chip, stripped detail, cause hint) instead of the previous gray text-block that showed the literal "[A2A_ERROR]" string. inferA2AErrorHint mirrors the patterns from AgentCommsPanel.inferCauseHint so the same symptom reads the same way in both surfaces (Claude SDK init wedge → restart workspace; timeout → busy/stuck; connection-reset → transient blip then check logs). Tests: 9 send_a2a_message tests pass (including a new regression test for the empty-stringifying-exception case that the user reported); 995 canvas tests pass; tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 23:40:05 -07:00
Hongming Wang	65b531acf6	fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID Workspace runtime fired four classes of A2A request to the platform without the X-Workspace-ID header that identifies the source workspace: heartbeat self-messages, initial_prompt, idle-loop fires, and peer-to-peer A2A from runtime tools. The platform's a2a_receive logger keys source_id off that header — without it, every such row was written with source_id=NULL, which the canvas's My Chat tab filters as ?source=canvas (i.e. "user typed this") and rendered the internal triggers as if the human user had sent them. The "Delegation results are ready..." heartbeat trigger was visible to end users in the chat history; delegate_task A2A calls between agents were misclassified the same way. Centralise the header construction in a new platform_auth helper self_source_headers(workspace_id) that returns auth_headers() PLUS {X-Workspace-ID: <id>}. Apply it to: - heartbeat.py self-message (refactored from inline header dict) - main.py initial_prompt POST - main.py idle_prompt POST - a2a_client.py send_a2a_message (peer A2A from runtime) - builtin_tools/a2a_tools.py delegate_task (was missing ALL headers) Tests: - test_heartbeat.py asserts the X-Workspace-ID header is set on the self-message POST. - test_a2a_tools_module.py asserts the same on delegate_task POSTs; FakeClient.post mocks updated to accept the headers kwarg. Production effect lands the moment workspace containers are rebuilt with this code; existing rows in activity_logs keep their NULL source_id (legacy data). The canvas-side filter (#follow-up) covers the historical-rows case until backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:54:43 -07:00
Hongming Wang	94d9331c76	feat(canvas+platform): chat attachments, model selection, deploy/delete UX Session's accumulated UX work across frontend and platform. Reviewable in four logical sections — diff is large but internally cohesive (each section fixes a gap the next one depends on). ## Chat attachments — user ↔ agent file round trip - New POST /workspaces/:id/chat/uploads (multipart, 50 MB total / 25 MB per file, UUID-prefixed storage under /workspace/.molecule/chat-uploads/). - New GET /workspaces/:id/chat/download with RFC 6266 filename escaping and binary-safe io.CopyN streaming. - Canvas: drag-and-drop onto chat pane, pending-file pills, per-message attachment chips with fetch+blob download (anchor navigation can't carry auth headers). - A2A flow carries FileParts end-to-end; hermes template executor now consumes attachments via platform helpers. ## Platform attachment helpers (workspace/executor_helpers.py) Every runtime's executor routes through the same helpers so future runtimes inherit attachment awareness for free: - extract_attached_files — resolve workspace:/file:///bare URIs, reject traversal, skip non-existent. - build_user_content_with_files — manifest for non-image files, multi-modal list (text + image_url) for images. Respects MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision adapter hangs on base64 payloads (MiniMax M2.7). - collect_outbound_files — scans agent reply for /workspace/... paths, stages each into chat-uploads/ (download endpoint whitelist), emits as FileParts in the A2A response. - ensure_workspace_writable — called at molecule-runtime startup so non-root agents can write /workspace without each template having to chmod in its Dockerfile. Hermes template executor + langgraph (a2a_executor.py) + claude-code (claude_sdk_executor.py) all adopt the helpers. ## Model selection & related platform fixes - PUT /workspaces/:id/model — was 404'ing, so canvas "Save" silently lost the model choice. Stores into workspace_secrets (MODEL_PROVIDER), auto-restarts via RestartByID. - applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"] so Restart propagates the stored model to HERMES_DEFAULT_MODEL without needing the caller to rehydrate payload.Model. - ConfigTab Tier dropdown now reads from workspaces row, not the (stale) config.yaml — fixes "badge shows T3, form shows T2". ## ChatTab & WebSocket UX fixes - Send button no longer locks after a dropped TASK_COMPLETE — `sending` no longer initializes from data.currentTask. - A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s; the previous default aborted fetches while the server was still replying, producing "agent may be unreachable" on success. - socket.ts: disposed flag + reconnectTimer cancellation + handler detachment fix zombie-WebSocket in React StrictMode. - Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' — the adaptor's purpose IS the form, banner was contradictory. - workspace_provision.go auto-recovery: try <runtime>-default AND bare <runtime> for template path (hermes lives at the bare name). ## Org deploy/delete animation (theme-ready CSS) - styles/theme-tokens.css — design tokens (durations, easings, colors). Light theme overrides by setting only the deltas. - styles/org-deploy.css — animation classes + keyframes, every value references a token. prefers-reduced-motion respected. - Canvas projects node.draggable=false onto locked workspaces (deploying children AND actively-deleting ids) — RF's authoritative drag lock; useDragHandlers retains a belt-and- braces check. - Organ cancel button (red pulse pill on root during deploy) cascades via existing DELETE /workspaces/:id?confirm=true. - Auto fit-view after each arrival, debounced 500 ms so rapid sibling arrivals coalesce into one fit (previous per-event fit made the viewport lurch continuously). - Auto-fit respects user-pan — onMoveEnd stamps a user-pan timestamp only when event !== null (ignores programmatic fitView) so auto-fits don't self-cancel. - deletingIds store slice + useOrgDeployState merge gives the delete flow the same dim + non-draggable treatment as deploy. - Platform-level classNames.ts shared by canvas-events + useCanvasViewport (DRY'd 3 copies of split/filter/join). ## Server payload change - org_import.go WORKSPACE_PROVISIONING broadcast now includes parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas renders the child at the right parent-nested slot without doing any absolute-position walk. createWorkspaceTree signature gains relX, relY alongside absX, absY; both call sites updated. ## Tests - workspace/tests/test_executor_helpers.py — 11 new cases covering URI resolution (including traversal rejection), attached-file extraction (both Part shapes), manifest-only vs multi-modal content, large-image skip, outbound staging, dedup, and ensure_workspace_writable (chmod 777 + non-root tolerance). - workspace-server chat_files_test.go — upload validation, Content-Disposition escaping, filename sanitisation. - workspace-server secrets_test.go — SetModel upsert, empty clears, invalid UUID rejection. - tests/e2e/test_chat_attachments_e2e.sh — round-trip against a live hermes workspace. - tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static plumbing check + round-trip across hermes/langgraph/claude-code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:27:51 -07:00
molecule-ai[bot]	35bcad9204	feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974 ) * feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the official migration guide from a2aproject/a2a-python. Breaking changes applied: - A2AStarletteApplication → Starlette route factory (create_agent_card_routes + create_jsonrpc_routes) - AgentCard.url removed; url+protocol now in supported_protocols[].url - AgentCapabilities fields renamed to snake_case (pushNotifications→push_notifications, stateTransitionHistory→state_transition_history) - AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes - TaskState.canceled → TaskState.TASK_STATE_CANCELED - a2a.utils → a2a.helpers - Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed) Files changed: - requirements.txt: pinned >=1.0.0,<2.0 - main.py: Starlette route factory + AgentCard restructure - a2a_executor.py: Part() + TaskState + helpers import - hermes_executor.py: TaskState + helpers import - google-adk/adapter.py: TaskState + helpers import - cli_executor.py: helpers import - claude_sdk_executor.py: helpers import - tests/conftest.py: a2a.helpers mock stub - tests/test_a2a_executor.py: TaskState enum key - adapters/google-adk/test_adapter.py: Part + helpers stub Refs: KI-009 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED) --------- Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-24 04:43:17 +00:00
Molecule AI Plugin-Dev	61c5f8ad9a	feat(plugin): implement MCPServerAdaptor (issue #847 ) Rule-of-three threshold met: 4 plugin proposals (molecule-firecrawl #512, molecule-github-mcp #520, molecule-browser-use #553, mcp-connector #573) all independently shipped the same mcpServers-adapter pattern. Adds MCPServerAdaptor to builtins.py — plugins wrapping an MCP server now declare `from plugins_registry.builtins import MCPServerAdaptor as Adaptor` in their per-runtime adapter file. The adaptor: - Merges mcpServers from settings-fragment.json into <configs>/.claude/settings.json (deep-merge so multiple plugins' servers coexist). - Optionally ships skills/rules/setup.sh via AgentskillsAdaptor delegation. - On uninstall: removes skills/rules but intentionally leaves mcpServers entries in settings.json (users may share configs with other tools or have manually curated entries). Also fixes _deep_merge_hooks: non-hook top-level keys that are dicts (e.g. mcpServers) are now deep-merged with existing values instead of being skipped via setdefault. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 01:42:13 +00:00
Molecule AI Marketing Lead	e00797ba35	fix(security): prevent cross-tenant memory contamination in commit_memory/recall_memory (GH#1610) Two critical gaps in a2a_tools.py let any tenant workspace poison org-wide (GLOBAL) memory and bypass all RBAC enforcement: 1. tool_commit_memory had no RBAC check — any agent could write any scope. 2. tool_commit_memory had no root-workspace enforcement for GLOBAL scope — Tenant A could POST scope=GLOBAL and pollute the shared memory store that Tenant B's agent reads as trusted context. Fix adds: - _ROLE_PERMISSIONS table (mirrors builtin_tools/audit.py) so a2a_tools has isolated RBAC logic without depending on memory.py. - _check_memory_write_permission() / _check_memory_read_permission() helpers: evaluate RBAC roles from WorkspaceConfig; fail closed (deny) on errors. - _is_root_workspace() / _get_workspace_tier(): read WorkspaceConfig.tier (0 = root/org, 1+ = tenant) from config.yaml; fall back to WORKSPACE_TIER env var. - tool_commit_memory now (a) checks memory.write RBAC, (b) rejects GLOBAL scope for non-root workspaces, (c) embeds workspace_id in the POST body so the platform can namespace-isolate and audit cross-workspace writes. - tool_recall_memory now checks memory.read RBAC before any HTTP call, and always sends workspace_id as a GET param for platform cross-validation. Security regression tests added: - GLOBAL scope denied for non-root (tier>0) workspaces. - RBAC denial blocks all scope levels (including LOCAL) on write. - RBAC denial blocks recall entirely. - workspace_id present in POST body and GET params. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 10:21:34 -07:00
Hongming Wang	1aea013e20	fix(ci): unblock main CI on ubuntu-latest — IPv6-safe addr + MagicMock seed Two latent bugs the self-hosted Mac mini had been hiding. Both caught by the newer toolchain on ubuntu-latest runners after PR #1626. 1. workspace-server/internal/handlers/terminal.go:442 `fmt.Sprintf("%s:%d", host, port)` flagged by go vet as unsafe for IPv6 (it omits the required [::] brackets). Replaced with `net.JoinHostPort(host, strconv.Itoa(port))` which handles both IPv4 and IPv6 correctly. No runtime behaviour change — the only call site passes "127.0.0.1", so the bug would never trigger in practice, but vet is right to flag it as a latent correctness issue. 2. workspace/tests/test_a2a_executor.py::test_set_current_task_updates_heartbeat `MagicMock()` auto-creates attributes on first access, so `getattr(heartbeat, "active_tasks", 0)` in shared_runtime.py returned a MagicMock rather than the default 0. Adding 1 to a MagicMock returns another MagicMock, so the assertion `heartbeat.active_tasks == 1` never held. Seeding `heartbeat.active_tasks = 0` before the first call makes getattr() return a real int, matching how the real HeartbeatLoop class initialises itself. Both pre-existed on main and were hidden by the older Python / Go toolchains on the Mac mini runner. Verified locally (venv pytest pass, `go vet ./...` + `go build ./...` clean on workspace-server). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:18:46 -07:00
molecule-ai[bot]	859d676f70	fix(CI): correct BASE in detect-changes (PR/push race); catch RuntimeError in conftest (#1473 ) - ci.yml: replace if/else BASE assignment with GITHUB_BASE_REF default + pull_request base.sha override pattern. Prevents push events from overwriting the correct PR base SHA when both events fire together. - conftest.py: catch RuntimeError in addition to ImportError when importing coordinator.py, which raises RuntimeError at import time when WORKSPACE_ID is not set (before the ImportError guard). Co-authored-by: Molecule AI Release Manager <release-manager@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 18:15:45 +00:00
molecule-ai[bot]	4675402e58	feat(workspace): pre-stop serialization for pause/resume (closes #1386 ) Add a pre-stop hook that captures agent state before container exit and writes a scrubbed snapshot to /configs/.agent_snapshot.json. On restart, the snapshot is loaded and the adapter's restore_state() is called before the A2A server starts. - New lib/pre_stop.py: build_snapshot / write_snapshot / read_snapshot / delete_snapshot + _scrub_value deep-scrubber (uses lib.snapshot_scrub to redact API keys, tokens, and sandbox output before persisting) - BaseAdapter.pre_stop_state(): captures _executor._session_id and recent transcript_lines; overridden by adapters with richer in-memory state - BaseAdapter.restore_state(): stores snapshot fields as adapter attrs for create_executor() to pick up - main.py: calls pre_stop serialization in finally block (after server serves) and restore_state() after adapter setup, before server starts - Added 12 unit tests covering scrub, read/write, adapter integration Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 12:40:44 +00:00
rabbitblood	5bc3edfbdd	Fix test assertions to account for HMA instructions in system prompt Mock get_hma_instructions in exact-match tests so they don't break when HMA content is appended. Add a dedicated test for HMA inclusion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 01:05:05 -07:00
Hongming Wang	3976361483	feat(workspace): snapshot secret scrubber (closes #823 ) Sub-issue of #799, security condition C4. Standalone module in workspace/lib/snapshot_scrub.py with three public functions: - scrub_content(str) → str: regex-based redaction of secret patterns - is_sandbox_content(str) → bool: detect run_code tool output markers - scrub_snapshot(dict) → dict: walk memories, scrub each, drop sandbox entries Patterns covered: sk-ant-/sk-proj-, ghp_/ghs_/github_pat_, AKIA, cfut_, mol_pk_, ctx7_, Bearer, env-var assignments, base64 blobs ≥33 chars. 21 unit tests, 100% coverage on new code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-19 00:32:42 -07:00
Hongming Wang	39074cc4ae	chore: final open-source cleanup — binary, stale paths, private refs - Remove compiled workspace-server/server binary from git - Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs - Fix CI workflow path filters (workspace-template → workspace) - Replace real EC2 IP and personal slug in test_saas_tenant.sh - Scrub molecule-controlplane references in docs - Fix stale workspace-template/ paths in provisioner, handlers, tests - Clean tracked Python cache files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:38:55 -07:00
Hongming Wang	d8026347e5	chore: open-source restructure — rename dirs, remove internal files, scrub secrets Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:24:44 -07:00

27 Commits