molecule-core

Author	SHA1	Message	Date
Hongming Wang	dbd086c7ad	test(mcp): comment empty except in bridge test cleanup Address github-code-quality review on PR #2465: explain why the OSError swallow in pipe teardown is intentional (best-effort cleanup of a possibly-already-closed fd). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:07:33 -07:00
Hongming Wang	ea206043d8	feat(mcp): universal inbound delivery — instructions-driven polling + optional push Why this exists --------------- Live evidence on 2026-05-01 caught a regression latent in #46's "push-feel inbound" closure: standard `claude` launches without `--dangerously-load-development-channels` silently drop our `notifications/claude/channel` emissions, so canvas/peer messages sat in the wheel inbox and never reached the agent loop until manual `inbox_peek`. The flag is research-preview-only; non-Claude-Code MCP clients (Cursor, Cline, OpenCode, hermes-agent, codex) never receive the notification at all because the method namespace is Claude- specific. Push-only delivery shipped as the universal contract is not actually universal. What this changes ----------------- Adds a poll path that works on every spec-compliant MCP client. The `initialize` `instructions` field — read by every client and surfaced to the agent's system prompt automatically — now tells the agent to call `wait_for_message(timeout_secs=N)` at the start of every turn. Push remains as the strictly-better delivery for hosts that opt in (Claude Code with the dev flag or a future allowlist entry), but is no longer load-bearing. Both paths converge on the same `inbox_pop` ack so duplicate-delivery on a push+poll race is impossible: whoever surfaces the message to the agent first pops it, the other side returns empty. Operator knob ------------- `MOLECULE_MCP_POLL_TIMEOUT_SECS` controls per-turn poll blocking (default 2s). 0 disables polling for push-only Claude Code with the dev flag. Above 60 clamps to 60 — protects against an accidental five-minute stall per turn. Resolved fresh on every `initialize` so a relaunch with new env is enough; no wheel rebuild required. Tests ----- - structural pins on the new instructions: `wait_for_message` + `timeout_secs` named, both PUSH PATH / POLL PATH labels present - env-resolution: default fallback, garbage fallback, negative fallback, 60s clamp - operator override: `MOLECULE_MCP_POLL_TIMEOUT_SECS=7` reaches the agent's instructions string - timeout=0 toggles to push-only-mode messaging (no wait_for_message call asked of the agent) - existing pins on push path, reply tools, prompt-injection defense, meta attributes — all preserved Successor to #46. Closure milestone for this PR (per feedback_close_on_user_visible_not_merge.md): launched `claude` against the published wheel, sent a canvas message, observed the agent surfaces the message inline at the start of its next turn without me running `inbox_peek` — verified live before declaring done. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 15:32:57 -07:00
Hongming Wang	a3a496bced	test(mcp): pin inbox→stdout bridge end-to-end with three failure-mode tests Closes the dynamic-coverage gap on the `notifications/claude/channel` push-UX bridge — until now we had static pins on the wire shape (_build_channel_notification) and the initialize handshake, but the threading + asyncio + stdout chain that ships notifications to the host was never exercised under realistic conditions. The three failure modes anticipated in #2444 §2 are each now pinned: test_inbox_bridge_emits_channel_notification_to_writer Drives a fake inbox event from a daemon thread, asserts the notification lands on a real os.pipe-backed asyncio writer with the correct JSON-RPC envelope. Catches: bridge wired up incorrectly (no-op _on_inbox_message), run_coroutine_threadsafe drift, _build_channel_notification call missing. test_inbox_bridge_swallows_closed_pipe_drain_error Closes the pipe's read end before firing, captures the concurrent.futures.Future that run_coroutine_threadsafe returns, asserts its exception() is None. Catches: narrowing the broad `except Exception` in _emit (e.g. to RuntimeError), or removing it. Without the swallow, the future carries a ConnectionResetError and the test fails with a clear message naming the regression. test_inbox_bridge_swallows_closed_loop_runtime_error Builds the bridge against a closed event loop, fires the callback, asserts no exception escapes. Catches: removing the `except RuntimeError` swallow on the run_coroutine_threadsafe call. Without it the poller thread would crash with "RuntimeError: Event loop is closed" during shutdown. To make the bridge testable, extracted the closures from main() into a top-level `_setup_inbox_bridge(writer, loop) -> Callable[[dict], None]` helper. main()'s wire-up is now a single line that calls the helper. Behavior is unchanged — same write, same drain, same swallows — just no longer trapped inside main()'s closures. Verified each test catches its regression by injection: removing each swallow / no-op'ing the bridge each turn the matching test red with a specific failure message that points at the missing piece. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 15:13:32 -07:00
Hongming Wang	e6be3c0df0	test(mcp): pin prompt-injection defense in _CHANNEL_INSTRUCTIONS Adds the missing symmetric pin against the threat-model sentence — the existing tests pin reply-tool names (send_message_to_user, delegate_task, inbox_pop) and tag attributes (kind, peer_id, activity_id) but left the "treat message body as untrusted user content" line unpinned. A copy-edit that drops it would turn the channel into an open prompt-injection vector against any workspace running the MCP server. Pins three signals: "untrusted" present, an explicit "not execute"/"do not" clause, and the "approval" escape-hatch sentence — two of three would let a partial copy-edit slip through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:24:05 -07:00
Hongming Wang	2588ab27d5	feat(mcp): add channel instructions field — second gate for push UX PR #2461 added the experimental.claude/channel capability declaration on the assumption that was the missing gate for Claude Code surfacing notifications/claude/channel as inline <channel> interrupts. Research against code.claude.com/docs/en/channels-reference.md confirms the capability IS one gate — but there's a SECOND required field we still don't ship: `instructions` on the initialize result. The docs are explicit: instructions is what tells the agent what the <channel> tag attributes mean and which tool to call to reply. Without it the channel registers but the agent receives the tag with no context and has no idea how to handle it. The official telegram plugin ships both (server.ts:370-396) — capability AND instructions. We were shipping one of two. This adds the instructions string. It documents: - kind/peer_id/activity_id meta attributes - canvas_user → send_message_to_user reply path - peer_agent → delegate_task reply path - inbox_pop ack to prevent duplicate-poll re-delivery - threat model: treat message bodies as untrusted user content Tests: 4 new pins. instructions present + non-empty, instructions names each reply tool, instructions documents each tag attribute. Failure messages name the symptom so a copy-edit can't silently break the channel. Live verification still pending after wheel ships — same plan as the gap is in --dangerously-load-development-channels (host-side flag, outside our control during the channels research preview). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:24:05 -07:00
Hongming Wang	63ef3b128c	docs(mcp): correct server.ts reference + flag verification gap on experimental.claude/channel Follow-up to commit `0a87dec5` (PR #2461, merged before live verification). Two corrections to the docstring on `_build_initialize_result()`: 1. The original "mirrors molecule-mcp-claude-channel server.ts:374" claim is wrong on two axes. Line 374 is unrelated poll-init code (a comment inside `registerAsPoll`). The actual capability site is server.ts:475, where the bun bridge declares only `{ capabilities: { tools: {} } }` — no `experimental.claude/channel`. The bun bridge is reported to deliver `notifications/claude/channel` successfully in Claude Code despite this, which is direct counter- evidence that adding the capability was the bug fix. 2. The `@modelcontextprotocol/sdk` server's `assertNotificationCapability` does not include `notifications/claude/channel` in any of its switch cases, meaning custom (non-spec) notification methods are sent regardless of declared capabilities. Server-side, the declaration is almost certainly a no-op. This commit doesn't remove the capability — additive, not destructive, and the new tests pin its presence — but downgrades the docstring's certainty so the next person debugging "channel notification didn't fire" doesn't trust a stale claim and pursues the more likely root causes: - writer.drain() swallowing exceptions on a closed pipe - inbox-thread → asyncio.run_coroutine_threadsafe race during init - MCP transport not yet attached when the first inbox event fires Live verification per #2444 §2 (fresh Claude Code session on this wheel with a peer A2A message, observe whether the interrupt fires) remains the open hard-gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 14:01:57 -07:00
Hongming Wang	0a87dec50e	feat(mcp): declare experimental.claude/channel capability for push UX Without this capability declaration in the initialize handshake, Claude Code's MCP client receives our notifications/claude/channel emissions but silently drops them — they never become inline <channel> tags in the conversation. The push-UX bridge added in PR #2433 ships, fires, and is invisible. This was anticipated as a failure mode in #2444 §2 ("Notification arrives but Claude Code doesn't surface it — host doesn't recognize the method"), and confirmed live in this session: a canvas chat "hi" landed in the inbox queue (inbox_peek returned it) but never woke the agent until inbox_peek was called by hand. The contract matches molecule-mcp-claude-channel/server.ts:374 where the bun bridge declares the same experimental flag. Refactor: extracted _build_initialize_result() so the handshake shape is unit-testable. Pure function, no behavioral change beyond adding the experimental capability to the result. Tests: 3 new pins on the initialize result (capability presence, tools-still-there, protocolVersion stable). Closes the live- verification gap §2 of #2444. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 13:45:06 -07:00
Hongming Wang	b8fdbd9fab	fix(runtime): register configs_dir in TOP_LEVEL_MODULES + drop alias Wheel-build smoke gate detected `configs_dir` missing from scripts/build_runtime_package.py:TOP_LEVEL_MODULES. Without it the build would ship `import configs_dir` un-rewritten and every external-runtime install would die on `ModuleNotFoundError` at first import. Two callers used `import configs_dir as _configs_dir` to belt-and- suspenders against an imagined name collision, but the rewriter rejects `import X as Y` because the rewrite would produce `import molecule_runtime.X as X as Y` (invalid syntax). No actual collision exists (only docstring/comment references). Switched to plain `import configs_dir`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 13:13:57 -07:00
Hongming Wang	c636022d2f	fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458 ) The runtime persists per-workspace state (`.auth_token`, `.platform_inbound_secret`, `.mcp_inbox_cursor`) under `/configs` — the workspace-EC2 mount path. Inside a container that's writable, agent-owned. Outside a container, `/configs` either doesn't exist or isn't writable by an unprivileged user. The default broke the external-runtime path (`pip install molecule-ai-workspace-runtime` + `molecule-mcp` on a Mac/Linux laptop). First heartbeat tries to persist `.platform_inbound_secret` and crashes: [Errno 30] Read-only file system: '/configs' The heartbeat thread logs and dies. Workspace flips offline within a minute. Operator sees no actionable error. Adds workspace/configs_dir.py — single resolution point with a tiered fallback: 1. CONFIGS_DIR env var, if set — explicit operator override (preserves existing tests + custom deployments verbatim). 2. /configs — if it exists AND is writable. In-container default; unchanged behavior for every prod workspace. 3. ~/.molecule-workspace — created with mode 0700 so per-file 0600 perms aren't undermined by a world-readable parent. Migrates the four readers (platform_auth, platform_inbound_auth, mcp_cli, inbox) to call configs_dir.resolve() instead of inlining `Path(os.environ.get("CONFIGS_DIR", "/configs"))`. Existing tests that assert the old `/configs`-as-default contract updated to assert the new contract: when CONFIGS_DIR is unset, path resolves to a writable location — `/configs` if present, fallback otherwise. Tests skip the fallback branch on hosts that DO have a writable `/configs` (CI containers). Verified the original repro is fixed: with no CONFIGS_DIR set on macOS, configs_dir.resolve() returns ~/.molecule-workspace, the dir exists, and writes succeed. Test suite: 1454 passed, 3 skipped, 2 xfailed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 13:07:55 -07:00
Hongming Wang	2e8892ebc4	fix(workspace): surface errno + path on chat-upload mkdir failure Production incident on hongming.moleculesai.app 2026-05-01T18:30Z — fresh-tenant signup chat upload returned 500 with the body {"error":"failed to prepare uploads dir"}. Diagnosis required SSM access to the workspace stderr to recover errno + actual path. The root-cause fix lives in claude-code template entrypoint (molecule-ai-workspace-template-claude-code#23 — pre-create the .molecule subtree as root before gosu drops to agent). This change is the diagnostic improvement: when mkdir fails for any reason in the future (EACCES, ENOSPC, EROFS, etc.), the response carries the errno + offending path so the operator inspecting browser devtools sees the real cause without needing SSM. Backwards compatible — top-level "error" key is unchanged so existing canvas / external alert rules continue to match. New fields are additive: path, errno, detail. Test pins the diagnostic shape so a future struct refactor can't silently drop these fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:47:53 -07:00
Hongming Wang	c06c4c0f56	Merge pull request #2450 from Molecule-AI/feat/observability-config-schema feat(config): observability block schema (#119 PR-1 of 4)	2026-05-01 05:20:11 +00:00
Hongming Wang	645c1862c4	feat(a2a-client): surface 410 Gone as 'removed' error so callers can re-onboard (#2429 ) Follow-up A to PR #2449 — that PR taught the platform to return 410 Gone for status='removed' workspaces; this PR teaches get_workspace_info to consume that signal. Before: every non-200 collapsed into {"error": "not found"}, which made the 2026-04-30 incident impossible to diagnose — the operator KNEW the workspace_id existed (they'd just registered it), but the runtime kept reporting "not found" for a deleted-but-not-purged row. After: 410 produces a distinct {"error": "removed", "id", "removed_at", "hint"} dict so callers (heartbeat-loop, channel bridge, dashboard tools) can surface "your workspace was deleted, re-onboard" instead of "not found". Falls back to a default hint if the platform body isn't parseable so the actionable signal doesn't depend on body shape parity. Two new tests: - TestGetWorkspaceInfo.test_410_returns_removed_with_hint - TestGetWorkspaceInfo.test_410_with_unparseable_body_falls_back_to_default_hint Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 22:08:08 -07:00
Hongming Wang	59902bce83	feat(config): add observability block schema (#119 PR-1 of 4) Hermes-style declarative block grouping cadence + verbosity knobs into one place. Schema-only in this PR — wiring into heartbeat.py and main.py lands in PR-3 of the #119 stack. Two fields with live consumers waiting: - heartbeat_interval_seconds (default 30, clamped to [5, 300]) → heartbeat.py:134 currently has hard-coded HEARTBEAT_INTERVAL = 30 - log_level (default "INFO", uppercased at parse) → main.py:465 currently has hard-coded log_level="info" Clamp band [5, 300] is intentional: sub-5s flooded the platform during IR-2026-03-11; >5min lets crashed workspaces look healthy long enough to mask failure. Coerce at parse so adapters and heartbeat.py can read the value without re-validating. Tests pin defaults, explicit YAML override, partial override, and parametrized clamp behavior (10 cases including garbage strings + None). Part of: task #119 (adopt hermes-style architecture) Stack: PR-1 schema → PR-2 event_log → PR-3 wire consumers → PR-4 skill compat	2026-04-30 21:58:45 -07:00
Hongming Wang	661eec2659	chore(smoke-mode): harden module-load + drop dead except clause Two follow-ups from the #2275 Phase 1 self-review: 1. `_SMOKE_TIMEOUT_SECS = float(os.environ.get(...))` was evaluated at module load. main.py imports smoke_mode unconditionally — before the is_smoke_mode() check — so a malformed MOLECULE_SMOKE_TIMEOUT_SECS env value would SystemExit every workspace boot, not just smoke runs. Wrapped in try/except with a 5.0 fallback. Probability of a typo'd env var hitting production is low (it's a CI-only knob), but the footgun is removed entirely. Regression test reloads the module under a malformed env value. 2. `_real_a2a_sdk_available()` caught (ImportError, AttributeError). `from X import Y` raises ImportError when Y is missing on X — never AttributeError. Dropped the unreachable branch. No behavior change for the happy path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 21:31:08 -07:00
Hongming Wang	aacaba024c	feat(wheel-smoke): exercise executor.execute() to catch lazy imports (#2275 ) The existing wheel-publish smoke (`wheel_smoke.py`) only IMPORTS `molecule_runtime.main` at module scope. Lazy imports buried inside `async def execute(...)` bodies (e.g. `from a2a.types import FilePart`) NEVER evaluate at static-import time — they crash at first message delivery in production. The 2026-04-2x v0→v1 a2a-sdk migration shipped 5 such regressions in templates that all looked fine at module-load smoke. This change adds `smoke_mode.py` plus a `MOLECULE_SMOKE_MODE=1` short-circuit in `main.py`: after `adapter.create_executor(...)`, the boot path invokes `executor.execute(stub_ctx, stub_queue)` once with a 5s timeout (`MOLECULE_SMOKE_TIMEOUT_SECS`). Healthy import tree → execution proceeds far enough to hit a network boundary and times out (exit 0). Broken lazy import → `ImportError` / `ModuleNotFoundError` from inside the executor body (exit 1). Other downstream errors (auth, validation) pass — those are caught by adapter-level tests, not this gate. Stub `(RequestContext, EventQueue)` is built from the real a2a-sdk so SendMessageRequest/RequestContext constructor changes also surface as import-tree failures (the regression class also includes "SDK refactored mid-publish"). The stub-build itself is wrapped — if it raises, that's a smoke fail too. Phase 2 (separate PR, molecule-ci) wires this into publish-template-image.yml so the publish gate runs the boot smoke against every template image before pushing the tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 21:21:18 -07:00
Hongming Wang	0b2ea0a50f	Merge pull request #2441 from Molecule-AI/feat/explicit-provider-field feat(config): add explicit `provider:` field alongside `model:` (PR-1 of stack)	2026-05-01 03:54:27 +00:00
Hongming Wang	067ad83ce5	feat(config): add explicit `provider:` field alongside `model:` Adds a top-level `provider` slug to WorkspaceConfig and RuntimeConfig so adapters can route to a specific gateway without re-implementing slug-prefix parsing across hermes / claude-code / codex. Resolution chain in load_config (mirrors how `model` resolves): 1. ``LLM_PROVIDER`` env var — what canvas Save+Restart sets so the operator's Provider dropdown choice survives a CP-driven restart (the regenerated /configs/config.yaml drops most user fields). 2. Explicit YAML ``provider:`` — operator pinned it in the file. 3. Derive from the model slug prefix for backward compat: ``anthropic:claude-opus-4-7`` → ``anthropic`` ``minimax/abab7-chat-preview`` → ``minimax`` bare model names → ``""`` (let the adapter decide). `runtime_config.provider` falls back to the top-level resolved provider, the same shape PR #2438 added for `runtime_config.model`. Why a separate field at all (we already parse the slug): - Custom model aliases without a recognizable prefix need an explicit signal — the canvas Provider dropdown writes it. - Adapters were each rolling their own slug-parse (hermes's derive-provider.sh, claude-code's adapter-default branch, etc.); one resolution point in load_config kills that drift class. - Canvas needs a stable storage field that doesn't get clobbered every time the user picks a new model. Backward-compatible: when `provider:` is absent, slug derivation keeps every existing config.yaml working without a migration. PR-1 of a multi-PR stack (Option B from RFC discussion). Subsequent PRs plumb the field through workspace-server env, CP user-data, adapters (hermes prefers explicit over derive-provider.sh), and canvas Provider dropdown UI. Tests cover all four resolution paths + runtime_config inheritance: - test_provider_default_empty_when_bare_model - test_provider_derived_from_colon_slug - test_provider_derived_from_slash_slug - test_provider_yaml_explicit_wins_over_derived - test_provider_env_override_beats_yaml_and_derived - test_runtime_config_provider_yaml_wins_over_top_level - test_provider_default_from_default_model Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:47:09 -07:00
Hongming Wang	6e92fe0a08	chore: rewriter unit tests + drop misleading noqa on `import inbox` Two small follow-ups to the PR #2433 → #2436 → #2439 incident chain. 1) `import inbox # noqa: F401` in workspace/a2a_mcp_server.py was misleading — `inbox` IS used (at the bridge wiring inside main()). F401 means "imported but unused", which would mask a real future F401 if the usage is removed. Drop the noqa, keep the explanatory block comment about the rewriter's `import X` → `import mr.X as X` expansion (and the `import X as Y` → `import mr.X as X as Y` trap the comment exists to prevent re-introducing). 2) scripts/test_build_runtime_package.py — 17 unit tests covering `rewrite_imports()` and `build_import_rewriter()` in scripts/build_runtime_package.py. Until now the function had zero coverage despite the entire wheel build depending on it. Tests pin: bare-import aliasing, dotted-import preservation, indented imports, from-imports (simple + dotted + multi-symbol + block), the `import X as Y` rejection added in PR #2436 (with comment- stripping + indented + comma-not-alias edge cases), allowlist anchoring (`a2a` ≠ `a2a_tools`), and end-to-end reproduction of the PR #2433 failing pattern + the #2436 fix pattern. 3) Wire scripts/test_.py into CI by adding a second discover pass to test-ops-scripts.yml. Top-level scripts/ tests live alongside their target file (parallels the scripts/ops/ test layout); the existing scripts/ops/ pass keeps running because scripts/ops/ has no __init__.py so a single discover from scripts/ root doesn't recurse. Two passes is simpler than retrofitting namespace packages. Path filter widened from `scripts/ops/` to `scripts/*` so PRs touching the build script trigger the new tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:45:32 -07:00
Hongming Wang	5ad41f63ce	Merge pull request #2438 from Molecule-AI/fix/runtime-config-model-fallback-clean fix(config): runtime_config.model falls back to top-level model	2026-05-01 03:31:20 +00:00
Hongming Wang	0070d0bd59	fix(config): runtime_config.model falls back to top-level model External feedback (2026-04-30): "Provisioner doesn't read model from config.yaml and doesn't set MODEL env var. Without MODEL, the adapter defaults to sonnet and bypasses the mimo routing." Confirmed accurate for SaaS workspaces. Trace: claude-code-default/adapter.py reads `runtime_config.model or "sonnet"` (and hermes reads HERMES_DEFAULT_MODEL via install.sh, which IS plumbed). For claude-code there's nothing — workspace/config.py loaded `runtime_config.model` only from YAML, ignoring MODEL_PROVIDER env. The CP user-data script regenerates /configs/config.yaml at every boot with only `name`, `runtime`, `a2a` keys (intentionally minimal so it doesn't carry stale state) — so any user-set runtime_config.model is wiped on every restart, and the adapter falls back to "sonnet" even when the user picked Opus in the canvas Config tab. Fix: when YAML omits runtime_config.model, fall back to the top-level resolved `model`, which already honors MODEL_PROVIDER env override. One-line in workspace/config.py. Now MODEL_PROVIDER → top-level model → runtime_config.model → adapter sees the user's selection. Sticky across CP-driven restarts; the canvas Save+Restart loop works as intended for every runtime, not just hermes. Tests: test_runtime_config_model_falls_back_to_top_level — top-level set, runtime_config empty → fallback wins test_runtime_config_model_yaml_wins_over_top_level — YAML explicit → fallback skipped (precedence) test_runtime_config_model_picks_up_env_via_top_level — full canvas Save+Restart simulation: env → top-level → runtime_config.model Negative-control verified: removing the `or model` flips both fallback tests red with the expected "" vs expected-model mismatch; restoring flips them green. The yaml-wins test passes either way (correctly, because precedence is preserved). Replaces closed PR #2435 — that PR's commit was on a contaminated branch and accidentally captured unrelated WIP changes (build script + a2a_mcp_server refactor) instead of this fix. Self-review caught it and closed the PR. This branch is clean off main + diff verified before push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:28:50 -07:00
Hongming Wang	0acdf3bb56	fix(wheel): import inbox without alias to dodge rewriter collision PR #2433 (notifications/claude/channel) shipped 'import inbox as _inbox_module' inside a2a_mcp_server.py:main(). The build script's import rewriter expands plain 'import inbox' to 'import molecule_runtime.inbox as inbox', so the original source became 'import molecule_runtime.inbox as inbox as _inbox_module', which is invalid Python. Caught at the publish-runtime + PR-built-wheel-smoke gate (the SyntaxError trace is in run 25200422679). The wheel didn't ship to PyPI because publish-runtime's smoke-import step refused to install it, but staging is currently sitting on a broken-build commit until this fix-forward lands. Changes: - a2a_mcp_server.py: lift `import inbox` to top of file (rewriter produces clean `import molecule_runtime.inbox as inbox`), call inbox.set_notification_callback directly in main() - build_runtime_package.py: rewrite_imports() now raises ValueError when it sees 'import X as Y' for any X in the workspace allowlist, instead of silently producing a syntax-error wheel. Operator gets a clear actionable error at build time pointing at the offending line + suggested rewrites ('from X import …' or plain 'import X'). The build-time gate (this PR's rewriter check) catches the regression class earlier than the smoke-time gate (PR #2433's failure). Adding 'PR-built wheel + import smoke' to staging branch protection's required checks is filed separately so this class doesn't merge again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:21:54 -07:00
Hongming Wang	0a3ec53f34	feat(mcp): notifications/claude/channel for push-feel inbox UX Adds a notification seam to the universal molecule-mcp wheel so push- notification-capable MCP hosts (Claude Code today; any compliant client tomorrow) get inbound A2A messages as conversation interrupts instead of having to poll wait_for_message / inbox_peek. Wire-up: - inbox.py: module-level _NOTIFICATION_CALLBACK + set_notification_callback() Fires from InboxState.record() AFTER lock release, with same dict shape inbox_peek returns. Best-effort — a raising callback never prevents the message from landing in the queue. - a2a_mcp_server.py: _build_channel_notification() pure helper + bridge wiring in main() that schedules notifications via asyncio.run_coroutine_threadsafe (poller is a daemon thread, MCP loop is asyncio). - Method name 'notifications/claude/channel' matches the contract documented in molecule-mcp-claude-channel/server.ts:509. - wheel_smoke.py: pin set_notification_callback as a published name, same regression class as the 0.1.16 main_sync incident. Pollers (wait_for_message / inbox_peek) keep working unchanged for runtimes without notification support. Tests: 6 new in test_inbox.py (callback fires once on record, dedupe short-circuits before fire, raising cb doesn't break inbox, set/clear semantics), 5 new in test_a2a_mcp_server.py (method name pin, content mapping, meta routing, no-id JSON-RPC notification spec, missing- field tolerance). All 59 combined tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 20:10:01 -07:00
Hongming Wang	c4bb803329	feat(mcp_cli): agent_card from env vars (capability discovery) External molecule-mcp runtimes register with hardcoded agent_card.name = molecule-mcp-{id[:8]} and skills=[]. That made every external workspace look identical on the canvas and gave peer agents calling list_peers no signal beyond name — they had to guess capabilities. Three new env vars let the operator declare identity + capabilities without code changes: * MOLECULE_AGENT_NAME — display name on canvas (default unchanged) * MOLECULE_AGENT_DESCRIPTION — one-line description (default empty) * MOLECULE_AGENT_SKILLS — comma-separated skill names Comma-separated skills get expanded to {"name": "..."} objects — the minimum shape that satisfies both shared_runtime.summarize_peers (reads s["name"]) AND canvas SkillsTab.tsx (id falls back to name). Strict-superset behaviour: when no env vars are set, agent_card matches the previous hardcoded value exactly. No regression for operators who haven't migrated. Why this matters end-to-end: * Canvas Skills tab now shows each declared skill as a chip * Peer agents calling list_peers see {name, skills} per peer and can route delegations to the right specialist * Same applies to the canvas Details tab + workspace card hover Tests cover: defaults match prior behaviour; name override; CSV → skill objects; whitespace stripping + empty entries dropped; description omitted when unset (keeps wire payload minimal); whitespace-only name falls back to default; end-to-end through _platform_register's payload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:57:39 -07:00
Hongming Wang	210f6e066a	Merge pull request #2424 from Molecule-AI/fix/in-container-heartbeat-persists-inbound-secret fix(workspace): in-container heartbeat persists platform_inbound_secret	2026-05-01 01:36:52 +00:00
Hongming Wang	d887ce8e96	fix(mcp_cli): escalate consecutive heartbeat 401s with re-onboard guidance The universal molecule-mcp wheel runs in a daemon thread, posting /registry/heartbeat every 20s. When the workspace gets deleted server-side (DELETE /workspaces/:id), the platform revokes all tokens for that workspace. Previous behaviour: heartbeat would 401 forever, log at WARNING per tick, no actionable signal anywhere. Failure mode hit on hongmingwang tenant 2026-04-30: workspace a1771dba was deleted at some prior time, the channel-bridge .env still pointed at it, MCP tools 401-ed silently with the operator having no idea why. The register-time path at mcp_cli.py:104-111 already does loud + actionable for 401 (sys.exit(3) with regenerate- from-canvas-Tokens text) — extend the same pattern to the heartbeat. Behaviour: * count < 3: WARNING per tick (could be transient blip) * count == 3: ERROR with re-onboard instructions, names the dead workspace_id, points at the canvas Tokens tab * count > 3 and every 20 ticks (~7 min): re-log ERROR so a session that started after the first ERROR still catches it 5xx and other non-auth HTTP errors do NOT increment the auth-failure counter — that would mislead the operator (e.g. a server blip would trigger "token revoked" when the token is fine). Tests cover: single 401 stays at WARNING; 3 consecutive 401s escalate to ERROR with the right keywords; 403 treated identically; recovery via 200 resets the counter; 5xx never triggers the auth path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:26:35 -07:00
Hongming Wang	98845c8f42	fix(workspace): in-container heartbeat persists platform_inbound_secret Follow-up to PR #2421. The standalone wrapper (mcp_cli.py) got heartbeat-time secret persistence in #2421, but the in-container heartbeat (workspace/heartbeat.py) was missed — and that's the path every workspace EC2 actually runs. Result: hongmingwang Claude Code agent stayed 401-forever on chat upload after this morning's deploy because the workspace's runtime never picked up the lazy-healed secret. The in-container _loop now captures the heartbeat response and calls the same _persist_inbound_secret_from_heartbeat helper used by the standalone path, on both the first POST and the 401-retry POST. Defensive on every error (non-JSON, non-dict, empty, save failure) — liveness contract trumps secret persistence. Tests pin: happy path, absent secret, empty string, non-JSON body, non-dict body, save_inbound_secret OSError, end-to-end loop.	2026-04-30 18:18:10 -07:00
Hongming Wang	c733454a56	Merge pull request #2421 from Molecule-AI/fix/heartbeat-delivers-inbound-secret fix(workspace): deliver platform_inbound_secret on every heartbeat	2026-05-01 00:54:00 +00:00
Hongming Wang	993f8c494e	refactor(workspace-runtime): send_a2a_message takes peer_id, validates UUID Two cleanups stacked on PR #2418: 1. Refactor `send_a2a_message(target_url, msg)` → `send_a2a_message(peer_id, msg)`. After #2418 every caller passes `${PLATFORM_URL}/workspaces/{peer_id}/a2a` — the function's parameter pretended to accept arbitrary URLs but in practice only one shape is meaningful. Owning URL construction inside the function makes the contract honest and centralises the peer-id validation introduced below. 2. Add `_validate_peer_id` UUID-shape check at the trust boundary. `discover_peer` and `send_a2a_message` are the entry points where agent-controlled strings flow into URL paths; rejecting non-UUID input at this layer eliminates the URL-interpolation class of bug (`workspace_id="../admin"` etc.) regardless of how the rest of the codebase interpolates ids elsewhere. Auth was already gating malicious access — this is consistency + clear failure over silent platform 4xx. In-container tests cover positive UUIDs, malformed input (``"ws-abc"``, ``"../admin"``, empty), and the contract that ``tool_delegate_task`` hands the peer_id to ``send_a2a_message`` without building URLs itself. Live-verified: external delegation 8dad3e29 → 97ac32e9 returned "refactor verified" from Claude Code Agent through the refactored code; ``_validate_peer_id`` rejects ``"ws-abc"`` and ``"../admin"`` and accepts canonical UUIDs. Stacked on PR #2418 (proxy-routing fix). Will rebase onto staging once #2418 merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 17:43:01 -07:00
Hongming Wang	a5c5139e3a	fix(workspace): deliver platform_inbound_secret on every heartbeat Heartbeat now echoes the workspace's platform_inbound_secret on every beat (mirroring /registry/register), and the molecule-mcp client persists it to /configs/.platform_inbound_secret on receipt. Symptom (2026-04-30, hongmingwang tenant): chat upload returned 503 "workspace will pick it up on its next heartbeat" and then 401 on retry — permanent until workspace restart. The 503 message was a lie: heartbeat used to discard the platform_inbound_secret entirely; only register delivered it, and register fires once at startup. Server (Go): - Heartbeat handler reuses readOrLazyHealInboundSecret (the same helper chat_files + register use), so heartbeat-time recovery covers the rotate / mid-life NULL-column case the existing register-time heal can't reach. - Failure is non-fatal: liveness contract trumps secret delivery, chat_files retries lazy-heal on its own next request. Client (Python): - _persist_inbound_secret_from_heartbeat parses the heartbeat 200 response and persists via platform_inbound_auth.save_inbound_secret. - All exceptions swallowed — heartbeat liveness > secret persistence; next tick (≤20s) retries. Tests: - Server: pin secret-present, lazy-heal-mint-on-NULL, and heal- failure-omits-field branches. - Client: pin persist-on-200, skip-on-empty, skip-on-non-dict-body, skip-on-401, swallow-save-OSError.	2026-04-30 17:36:33 -07:00
Hongming Wang	aefb44aff2	fix(workspace-runtime): route delegate_task through platform A2A proxy tool_delegate_task was POSTing directly to peer["url"], which is the Docker-internal hostname (e.g. http://ws-X-Y:8000) for in- container peers. External callers — the standalone molecule-mcp wrapper running on an operator's laptop — get [Errno 8] nodename nor servname every single delegation, breaking the universal-MCP path's last "ride the same code as in-container" claim. The platform's /workspaces/:peer-id/a2a proxy endpoint already handles internal forwarding for in-container peers AND is the only path external runtimes can use. Unify on it: in-container callers pay one extra HTTP hop on the same Docker bridge (microseconds); external callers get a working delegation path for the first time. discover_peer is still called for access-control + online-status detection — only the routing target changes. Verified live on 2026-04-30 against workspace 8dad3e29 (external mac runtime) → 97ac32e9 (Claude Code Agent in-container): direct POST returned ConnectError, proxy POST returned "acknowledged from claude code agent" as requested. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 17:13:50 -07:00
Hongming Wang	d061642cfc	test(inbox): bind side-effecting pop() before assert CodeQL flagged the bare `assert state.pop(...) is None` — under `python -O` asserts are stripped, which would skip the call entirely and the test would silently pass without exercising the code. Bind the result first so the call always runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:39:45 -07:00
Hongming Wang	b47d4ceb00	feat(workspace-runtime): add inbox polling for standalone molecule-mcp path The universal MCP server (a2a_mcp_server.py) was outbound-only — agents in standalone runtimes (Claude Code, hermes, codex, etc.) could delegate, list peers, and write memories, but never observed the canvas-user or peer-agent messages addressed to them. This blocked "constantly responding" loops without forcing operators back onto a runtime-specific channel plugin. This PR closes the inbound gap with a poller-fed in-memory queue and three new MCP tools: - wait_for_message(timeout_secs?) — block until next message arrives - inbox_peek(limit?) — list pending messages (non-destructive) - inbox_pop(activity_id) — drop a handled message A daemon thread polls /workspaces/:id/activity?type=a2a_receive every 5s, fills the queue from the cursor (since_id), and persists the cursor to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog. On 410 (cursor pruned) we fall back to since_secs=600 for a bounded recovery window. Activity-row → InboxMessage extraction mirrors the molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3 + summary fallback). mcp_cli.main starts the poller alongside the existing register + heartbeat threads. In-container runtimes (which have push delivery via canvas WebSocket) skip activation, so inbox tools return an informational "(inbox not enabled)" message instead of double-delivery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:32:48 -07:00
Hongming Wang	b54ceb799f	fix: address 5-axis review findings on PR #2413 Critical: - ExternalConnectModal.tsx: filledUniversalMcp substitution searched for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit `876c0bfc`). Operators copy-pasting the MCP tab would have gotten a literal "<paste from create response>" instead of the token. Fix the substitution to match the new placeholder name. Important: - mcp_cli._platform_register: 401/403 from initial register now hard- exits with code 3 + an actionable stderr message pointing the operator at the canvas Tokens tab. Pre-fix: warning log + continue, which made a bad-token startup silently fail (heartbeat 401's forever, every tool call also 401's, no clear surfacing in the operator's MCP client). 500/503 still log + continue (transient platform blips shouldn't abort the MCP loop). - a2a_mcp_server.cli_main docstring: removed stale claim that this is the wheel's console-script entry-point target. The actual target is mcp_cli.main since 2026-04-30. Wheel-smoke pins both names so the functionality was correct, but the doc was lying. Test coverage: 3 new mcp_cli tests: - register 401 exits code=3 + stderr mentions canvas Tokens tab - register 403 (C18 hijack rejection) takes same path - register 500/503 does NOT exit — only auth errors hard-fail Findings deferred to follow-up (acceptable per review rubric): - Code dedup across mcp_cli / heartbeat.py / molecule_agent SDK - Pooled httpx.Client for connection reuse - Heartbeat exponential backoff - Token-resolution ordering parity (env-first vs file-first) between mcp_cli.main and platform_auth.get_token Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:06:59 -07:00
Hongming Wang	427300f3a4	feat: make molecule-mcp standalone (built-in register + heartbeat) + recover awaiting_agent on heartbeat Two paired fixes that together let an external operator run a single process (molecule-mcp) and see their workspace come up online in the canvas — the bug surfaced live when status stuck at "awaiting_agent / OFFLINE" despite an active MCP server. Platform side (workspace-server/internal/handlers/registry.go): Heartbeat handler already auto-recovers offline → online and provisioning → online, but NOT awaiting_agent → online. Healthsweep flips stale-heartbeat external workspaces TO awaiting_agent, and with no recovery path the workspace stays "OFFLINE — Restart" in the canvas forever. Add the symmetric branch: if currentStatus == "awaiting_agent" and a heartbeat arrives, flip to online + broadcast WORKSPACE_ONLINE. Mirrors the existing offline/provisioning patterns exactly. Test: TestHeartbeatHandler_AwaitingAgentToOnline asserts the SQL UPDATE fires with the awaiting_agent guard clause. Wheel side (workspace/mcp_cli.py): molecule-mcp was outbound-only — operators had to run a separate SDK process to register + heartbeat. Now mcp_cli.main(): 1. Calls /registry/register at startup (idempotent upsert flips status awaiting_agent → online via the existing register path). 2. Spawns a daemon thread that POSTs /registry/heartbeat every 20s. 20s is comfortably under the healthsweep stale window so a single missed beat doesn't cause status churn. 3. Runs the MCP stdio loop in the foreground. Both calls set Origin: ${PLATFORM_URL} so the SaaS edge WAF accepts them. Threaded heartbeat (not asyncio) chosen because it doesn't need to share an event loop with the MCP stdio server — daemon=True cleanly dies when the operator's runtime exits. MOLECULE_MCP_DISABLE_HEARTBEAT=1 escape hatch lets in-container callers (which have heartbeat.py running already) reuse the entry point without double-heartbeating. Default is enabled. End-to-end verification (live, against hongmingwang.moleculesai.app, workspace 8dad3e29-...): pre-fix: status=awaiting_agent → canvas shows OFFLINE forever post-fix: ran `molecule-mcp` for 5s standalone → canvas state: status=online runtime=external agent=molecule-mcp-8dad3e29 Test coverage: 7 new mcp_cli tests (register-at-startup, heartbeat- thread-spawned, disable-env-skips-both, env-and-file token resolution, register payload shape, heartbeat endpoint + headers); 1 new platform test (awaiting_agent → online recovery). Full workspace + handlers suites green: 1355 Python, full Go handlers passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 15:42:44 -07:00
Hongming Wang	74c5e0d7a8	fix(workspace-runtime): add Origin header so SaaS edge WAF accepts MCP tool calls Discovered while smoke-testing the molecule-mcp external-runtime path against a live tenant (hongmingwang.moleculesai.app). Every tool call that hit /workspaces/* or /registry/*/peers returned 404 — but /registry/register and /registry/heartbeat returned 200. Diagnosis: the tenant's edge WAF requires a same-origin header. Without it, unhandled paths get silently rewritten to the canvas Next.js app, which has no /workspaces or /registry/:id/peers route and returns an empty 404. The molecule-mcp-claude-channel plugin already sets this header (server.ts:271-276); the workspace runtime never did because in-container PLATFORM_URLs (Docker network) aren't behind the WAF. Fix: extend platform_auth.auth_headers() to include Origin: ${PLATFORM_URL} whenever PLATFORM_URL is set. Inside-container behavior is unchanged (the WAF is path-irrelevant for the internal hostnames). External-runtime calls now thread the WAF correctly. Verification (live, against a freshly-registered external workspace): pre-fix: get_workspace_info → "not found", list_peers → 404 post-fix: get_workspace_info → full workspace JSON, list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)" This is the kind of bug unit tests can never catch — caught only by running the wheel against the real tenant. Memory: feedback_always_run_e2e.md. Test coverage: 4 new tests in test_platform_auth.py — Origin alone when no token + Origin + Authorization both, no-PLATFORM_URL falls through to original empty-dict behavior, env-token path with Origin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 15:30:15 -07:00
Hongming Wang	169e284d57	feat(workspace-runtime): expose universal MCP server to runtime=external operators Ship the baseline universal MCP path that any external runtime (Claude Code, hermes, codex, anything that speaks MCP stdio) can use, before optimizing per-runtime channels. Today the workspace MCP server only spins up inside the container; external operators have no way to call the 8 platform tools (delegate_task, list_peers, send_message_to_user, commit_memory, etc.) from outside. Three additive changes: 1. `platform_auth.get_token()` env-var fallback — adds `MOLECULE_WORKSPACE_TOKEN` as a fallback when no `${CONFIGS_DIR}/.auth_token` file exists. File-first preserves in-container behavior unchanged. External operators (no /configs volume) now have a way to supply the token without faking the filesystem layout. 2. `molecule-mcp` console script — adds a new entry point in the published `molecule-ai-workspace-runtime` PyPI wheel. Operators run `pip install molecule-ai-workspace-runtime`, set 3 env vars (WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register the binary in their agent's MCP config. `mcp_cli.main` is a thin validator wrapper — it checks env BEFORE importing the heavy `a2a_mcp_server` module so a misconfigured first-run gets a friendly 3-line error instead of a 20-line module-level RuntimeError traceback. 3. Wheel smoke gate — extends `scripts/wheel_smoke.py` to assert `cli_main` and `mcp_cli.main` are importable. Same regression class as the 0.1.16 main_sync incident: a silent rename or unrewritten import here would break every external operator on the next wheel publish (memory: feedback_runtime_publish_pipeline_gates.md). Test coverage: - `tests/test_platform_auth.py` — 8 new tests for the env-var fallback: file-priority, env-fallback, whitespace handling, cache, header construction, empty-env-as-unset. - `tests/test_mcp_cli.py` — 8 new tests for the validator: each required var separately, file-or-env satisfies token requirement, whitespace-only env treated as missing, help mentions canvas Tokens tab. - Full `workspace/tests/` suite green: 1346 passed, 1 skipped. - Local end-to-end: built wheel, installed in venv, ran `molecule-mcp` with no env → friendly error; with env → MCP server starts. Why now / why this shape: user redirect was "support the baseline first so all runtimes can use, then optimize". A claude-only MCP channel leaves hermes/codex/third-party operators broken on runtime=external. This PR ships the runtime-agnostic baseline; per- runtime polish (claude-channel push delivery, hermes-native bindings) is a follow-up PR. PR #2412 fixed the partner bug where canvas Restart silently revoked the operator's token — the two together unblock the external-runtime story end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 15:20:19 -07:00
Hongming Wang	3b34dfefbc	feat(workspace): surface peer-discovery failure reason instead of "may be isolated" Closes #2397. Today, every empty-peer condition (true empty, 401/403, 404, 5xx, network) collapses to a single message: "No peers available (this workspace may be isolated)". The user has no way to tell whether they need to provision more workspaces (true isolation), restart the workspace (auth), re-register (404), page on-call (5xx), or check network (timeout) — five different operator actions, one ambiguous string. Wire: - new helper get_peers_with_diagnostic() in a2a_client.py returns (peers, error_summary). error_summary is None on 200; a short actionable string on every other branch. - get_peers() now shims through it so non-tool callers (system-prompt formatters) keep the bare-list contract. - tool_list_peers() switches to the diagnostic helper and surfaces the actual reason. The "may be isolated" string is removed; true empty now reads "no peers in the platform registry." Tests: - TestGetPeersWithDiagnostic: 200, 200-empty, 401, 403, 404, 5xx, network exception, 200-but-non-list-body, and the bare-list-shim regression guard. - TestToolListPeers: each diagnostic branch surfaces its reason + explicit assertion that "may be isolated" is gone. Coverage 91.53% (floor 86%). 122 a2a tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 11:09:26 -07:00
Hongming Wang	899a2231d6	test(platform_auth): module-functions signature snapshot drift gate Pin the 5 public functions adapters and the runtime hot-path import through ``from platform_auth import``: - ``auth_headers`` — every outbound httpx call merges this in - ``self_source_headers`` — A2A peer + self-message header builder - ``get_token`` — main.py reads on boot to decide register-vs-resume - ``save_token`` — main.py persists the platform-issued token - ``refresh_cache`` — 401-retry path drops in-process cache (#1877) A grep across workspace/ shows 14+ runtime modules import these: main.py, heartbeat.py, a2a_client.py, a2a_tools.py, consolidation.py, events.py, executor_helpers.py (3 sites), molecule_ai_status.py, builtin_tools/memory.py (3 sites), builtin_tools/temporal_workflow.py (2 sites). Renaming any of the five (e.g. ``auth_headers`` → ``bearer_headers``) makes every one of those imports raise ImportError at workspace boot — the failure surface is deep in heartbeat init, nowhere near the rename site. Same drift class as the BaseAdapter signature snapshot (#2378, #2380), skill_loader gate (#2381), runtime_wedge gate (#2383). Reuses the ``_signature_snapshot.py`` helpers shipped in #2381. Defense-in-depth: ``test_snapshot_has_required_functions`` asserts the five names are still present, so removing one even with a synchronized snapshot edit forces an explicit edit here with a justification. ``clear_cache`` is intentionally NOT in the snapshot — it's a test-only helper. Production code MUST NOT depend on it. Verified red on deliberate rename: ``auth_headers`` → ``bearer_headers`` produces a clean diff of the missing function in the failure message. Restored before commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 08:41:42 -07:00
Hongming Wang	70176e6c8f	test(runtime_wedge): module-functions signature snapshot drift gate BaseAdapter docstring tells adapter authors: > ``runtime_wedge.mark_wedged()`` / ``clear_wedge()`` — flip the > workspace to ``degraded`` + auto-recover when your SDK hits a > non-recoverable error class. Import directly from ``runtime_wedge``; > the heartbeat forwards the state to the platform automatically. That's a contract — adapter templates depend on the four module-level functions (``is_wedged``, ``wedge_reason``, ``mark_wedged``, ``clear_wedge``) being importable by those exact names with those exact signatures. Renaming any silently breaks every adapter that calls them: the import resolves the module fine, the ``AttributeError`` only surfaces when the adapter actually hits its first SDK error — long after the rename merges. Same drift class as #2378 / #2380 / #2381 (BaseAdapter, skill_loader) applied to the module-level function surface. Changes: - tests/_signature_snapshot.py gains build_module_functions_record. Walks a module's public top-level functions, optionally filtered to a specific name list (used here — runtime_wedge has internal helpers like reset_for_test that intentionally aren't part of the contract). Skips re-exports via __module__ check so a `from foo import bar` doesn't pollute the snapshot. - tests/test_runtime_wedge_signature.py snapshots the four contract functions. Plus a defense-in-depth required-functions test that catches removal even when source + snapshot are updated together. Verified: deliberately renaming `mark_wedged` → `mark_wedged_RENAMED` trips the gate with full snapshot diff in the failure message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 07:01:10 -07:00
Hongming Wang	e336688278	test: extract shared signature-snapshot helpers + skill_loader gate Two changes in one PR (tightly coupled — the second wouldn't make sense without the first): 1. Hoist the inspect-based snapshot helpers out of test_adapter_base_signature.py into tests/_signature_snapshot.py so future surfaces don't copy-paste introspection logic. - build_class_signature_record(cls): walks public methods, unwraps static/class/abstract methods, returns a stable {class, methods: [...]} dict. - build_dataclass_record(cls): walks dataclass fields via dataclasses.fields(), returns {name, frozen, fields: [...]}. - compare_against_snapshot(actual, path): writes-on-first-run + diff-on-drift, with both expected and actual JSON in failure message. test_adapter_base_signature.py is rewritten to use the helpers; the existing snapshot file is byte-identical (no behavior change). 2. New gate: tests/test_skill_loader_signature.py covers the public dataclasses exported from skill_loader/loader.py: - SkillMetadata: every adapter pattern-matches on .runtime for skill-compat filtering. Renaming this field would silently break per-adapter skill loading — the loader still returns objects, but adapters' `if "*" in skill.metadata.runtime` raises AttributeError at workspace boot. - LoadedSkill: returned in SetupResult.loaded_skills. Includes test_snapshot_has_required_skill_metadata_fields defense-in-depth: ensures the runtime / id / name / description fields stay even if both source and snapshot are updated together. Verified: deliberately renaming SkillMetadata.runtime trips the gate with full snapshot diff in the failure message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 06:27:20 -07:00
Hongming Wang	12e39c7311	test(adapter_base): extend signature snapshot to public dataclasses (#2364 item 2 followup) Follows up #2378. The BaseAdapter snapshot covers method signatures but `adapter_base.py` also exports three public dataclasses that form the call/return contract between the platform and every adapter: - SetupResult — returned by adapter._common_setup() - AdapterConfig — passed into adapter setup hooks - RuntimeCapabilities — returned by adapter.capabilities(); drives platform-side dispatch routing (#117) Renaming a RuntimeCapabilities flag silently disables every adapter's capability declaration (the platform fallback runs) without an AttributeError to surface the breakage. That's exactly the drift class the snapshot pattern is meant to catch. Changes: - _build_dataclass_snapshot walks SetupResult, AdapterConfig, RuntimeCapabilities via dataclasses.fields(), capturing field name + type annotation + has_default per field, plus the @dataclass(frozen=...) flag. - _build_full_snapshot composes method + dataclass records into one stable JSON snapshot. - test_snapshot_has_required_dataclass_fields — defense-in-depth test parallel to test_snapshot_has_required_methods. Catches field removal even when both source AND snapshot are updated together. Required field set is intentionally short (the flags that drive platform dispatch + the adapter-level config knobs). Verified: deliberately renaming `provides_native_heartbeat` → `provides_native_heartbeat_RENAMED` trips test_base_adapter_signature_matches_snapshot with a full diff in the failure message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 05:53:10 -07:00
Hongming Wang	8488a188c2	test(adapter_base): signature snapshot — drift gate for adapter public surface (#2364 item 2) Every workspace template (langgraph, claude-code, hermes, etc.) subclasses BaseAdapter. Renaming, removing, or re-typing a method on the base class silently breaks templates: the override stops being recognized as an override; the old method-name's caller silently invokes the default no-op; the new method-name is unimplemented in templates that haven't migrated. Recent #87 universal-runtime + #1957 recordResource refactor both renamed/added methods. Without a frozen snapshot, the next rename ships quietly and surfaces only when a template's CI catches the AttributeError days later — long after the merge window for an easy revert. This snapshot pins BaseAdapter's public method surface against a checked-in JSON file. Same-shape pattern as PR #2363's A2A protocol-compat replay gate, applied to a Python public-API surface instead of JSON message shapes. Both close drift classes by snapshotting the structural surface that consumers depend on. Two tests: 1. test_base_adapter_signature_matches_snapshot — full introspection diff against tests/snapshots/adapter_base_signature.json. Drift = test failure with both expected + actual JSON in the message so the reviewer sees what changed. 2. test_snapshot_has_required_methods — defense-in-depth: even if both the source AND snapshot are updated together (intentional API removal), this catches removal of the short list of methods that EVERY template depends on (name, display_name, description, capabilities, memory_filename). Removing one of these requires explicit edit to the `required` set with a justification. Verified the gate fires red on a deliberate rename (memory_filename → memory_filename_RENAMED) — failure message shows the full snapshot diff including parameter shapes and return annotations. Updating the snapshot is the explicit acknowledgment that a template-affecting API change is intentional. Reviewer of the introducing PR sees the snapshot diff and decides whether template repos need coordinated updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 05:18:39 -07:00
Hongming Wang	4299475746	feat(prompt): Platform Capabilities preamble at top of system prompt Closes #2332 item 1 (workspace awareness — agents don't surface platform-native tools up front). The dogfooding session surfaced that agents weren't using A2A delegation, persistent memory, or send_message_to_user. The tools were registered AND documented in the system prompt — but only in sections #8 (Inter-Agent Communication) and #9 (Hierarchical Memory), which agents read AFTER they've already started reasoning about a plan from earlier sections. This adds a tight inventory at section #1.5 (immediately after Platform Instructions, before role-specific prompt files) — every tool name + its short description in a bulleted block. Detailed when_to_use docs in sections #8/#9 stay; this preamble is the elevator pitch ("you have these"), the later sections are the manual ("here's when and how"). Generated from `platform_tools.registry` ToolSpecs — every tool's `name` + `short` flow through automatically, no manual sync. A new `get_capabilities_preamble(mcp: bool)` helper in executor_helpers mirrors the existing get_a2a_instructions / get_hma_instructions pattern. CLI-runtime agents (mcp=False) get an empty preamble — they see _A2A_INSTRUCTIONS_CLI's hand-written subcommand vocabulary further down, and the registry's MCP tool names would conflict. Tests: - test_capabilities_preamble_appears_in_mcp_prompt: header present - test_capabilities_preamble_lists_every_registry_tool: every a2a + memory tool from registry shows up (drift catches at test time — adding a new tool to registry surfaces here automatically) - test_capabilities_preamble_precedes_prompt_files: ordering invariant (toolkit before role docs) - test_capabilities_preamble_skipped_for_cli_runtime: empty when mcp=False All 40 prompt + platform_tools tests pass.	2026-04-29 21:31:13 -07:00
Hongming Wang	e955597a98	feat(chat_files): rewrite Download as HTTP-forward (RFC #2312 , PR-D) Mirrors PR-C's Upload migration: replaces the docker-cp tar-stream extraction with a streaming HTTP GET to the workspace's own /internal/file/read endpoint. Closes the SaaS gap for downloads — without this PR, GET /workspaces/:id/chat/download still returns 503 on Railway-hosted SaaS even after A+B+C+F land. Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → PR-F #2319 → this PR. Why a single broad /internal/file/read instead of /internal/chat/download: Today's chat_files.go::Download already accepts paths under any of the four allowed roots {/configs, /workspace, /home, /plugins} — it's not strictly chat. Future PRs (template export, etc.) will reuse this endpoint via the same forward pattern; reusing avoids three near- identical handlers (one per domain) with duplicated path-safety logic. Path safety is duplicated on platform + workspace sides — defence in depth via two parallel checks, not "trust the workspace." Changes: * workspace/internal_file_read.py — Starlette handler. Validates path (must be absolute, under allowed roots, no traversal, canonicalises cleanly). lstat (not stat) so a symlink at the path doesn't redirect the read. Streams via FileResponse (no buffering). Mirrors Go's contentDispositionAttachment for Content-Disposition header. * workspace/main.py — registers GET /internal/file/read alongside the POST /internal/chat/uploads/ingest from PR-B. * scripts/build_runtime_package.py — adds internal_file_read to TOP_LEVEL_MODULES so the publish-runtime cascade rewrites its imports correctly. Also includes the PR-B additions (internal_chat_uploads, platform_inbound_auth) since this branch was rooted before PR-B's drift-gate fix; merge-clean alphabetic additions. * workspace-server/internal/handlers/chat_files.go — Download rewritten as streaming HTTP GET forward. Resolves workspace URL + platform_inbound_secret (same shape as Upload), builds GET request with path query param, propagates response headers (Content-Type / Content-Length / Content-Disposition) + body. Drops archive/tar + mime imports (no longer needed). Drops Docker-exec branch entirely — Download is now uniform across self-hosted Docker and SaaS EC2. * workspace-server/internal/handlers/chat_files_test.go — replaces TestChatDownload_DockerUnavailable (stale post-rewrite) with 4 new tests: - TestChatDownload_WorkspaceNotInDB → 404 on missing row - TestChatDownload_NoInboundSecret → 503 on NULL column (with RFC #2312 detail in body) - TestChatDownload_ForwardsToWorkspace_HappyPath → forward shape (auth header, GET method, /internal/file/read path) + headers propagated + body byte-for-byte - TestChatDownload_404FromWorkspacePropagated → 404 from workspace propagates (NOT remapped to 500) Existing TestChatDownload_InvalidPath path-safety tests preserved. * workspace/tests/test_internal_file_read.py — 21 tests covering _validate_path matrix (absolute, allowed roots, traversal, double- slash, exact-match-on-root), 401 on missing/wrong/no-secret-file bearer, 400 on missing path/outside-root/traversal, 404 on missing file, happy-path streaming with correct Content-Type + Content-Disposition, special-char escaping in Content-Disposition, symlink-redirect-rejection (lstat-not-stat protection). Test results: * go test ./internal/handlers/ ./internal/wsauth/ — green * pytest workspace/tests/ — 1292 passed (was 1272 before PR-D) Refs #2312 (parent RFC), #2308 (chat upload+download 503 incident). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:19:02 -07:00
Hongming Wang	055e447355	feat(saas): deliver platform_inbound_secret via /registry/register (RFC #2312 , PR-F) Closes the SaaS-side gap that PR-A acknowledged but didn't fix: SaaS workspaces have no persistent /configs volume, so the platform_inbound_secret that PR-A's provisioner wrote at workspace creation never reaches the runtime. Without this, even after the entire RFC #2312 stack lands, SaaS chat upload would 401 (workspace fails-closed when /configs/.platform_inbound_secret is missing). Solution: return the secret in the /registry/register response body on every register call. The runtime extracts it and persists to /configs/.platform_inbound_secret at mode 0600. Idempotent — Docker- mode workspaces also receive it and overwrite the value the provisioner already wrote (same value until rotation). Why on every register, not just first-register: * SaaS containers can be restarted (deploys, drains, EBS detach/ re-attach) — /configs is rebuilt empty on each fresh start. * The auth_token is "issue once" because re-issuing rotates and invalidates the previous one. The inbound secret has no rotation flow yet (#2318) so re-sending the same value is harmless. * Eliminates the bootstrap window where a restarted SaaS workspace has no inbound secret on disk and would 401 every platform call. Changes: * workspace-server/internal/handlers/registry.go — Register handler reads workspaces.platform_inbound_secret via wsauth.ReadPlatformInboundSecret and includes it in the response body. Legacy workspaces (NULL column) get a successful registration with the field omitted. * workspace-server/internal/handlers/registry_test.go — two new tests: - TestRegister_ReturnsPlatformInboundSecret_RFC2312_PRF: secret present in DB → secret in response, alongside auth_token. - TestRegister_NoInboundSecret_OmitsField: NULL column → field omitted, registration still 200. * workspace/platform_inbound_auth.py — adds save_inbound_secret(secret). Atomic write via tmp + os.replace, mode 0600 from os.open(O_CREAT, 0o600) so a concurrent reader never sees 0644-default. Resets the in-process cache after write so the next get_inbound_secret() returns the freshly-written value (rotation-safe when it lands). * workspace/main.py — register-response handler extracts platform_inbound_secret alongside auth_token and persists via save_inbound_secret. Mirrors the existing save_token pattern. * workspace/tests/test_platform_inbound_auth.py — 6 new tests for save_inbound_secret: writes file, mode 0600, overwrite-existing, cache invalidation after save, empty-input no-op, parent-dir creation for fresh installs. Test results: * go test ./internal/handlers/ ./internal/wsauth/ — all green * pytest workspace/tests/ — 1272 passed (was 1266 before this PR) Refs #2312 (parent RFC), #2308 (chat upload 503 incident). Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:12:34 -07:00
Hongming Wang	d1de330152	feat(workspace): /internal/chat/uploads/ingest endpoint (RFC #2312 , PR-B) Stacked on PR-A (#2313). The platform-side rewrite that actually calls this endpoint lands in PR-C; this PR adds the workspace-side consumer + hardening so PR-C is a small Go-only diff. What this adds: * platform_inbound_auth.py — auth gate mirroring transcript_auth.py. Reads /configs/.platform_inbound_secret (delivered by the PR-A provisioner). Fail-closed when the file is missing/empty/unreadable. Constant-time compare via hmac.compare_digest. * internal_chat_uploads.py — POST /internal/chat/uploads/ingest. Multipart parse → sanitize each filename → write to /workspace/.molecule/chat-uploads/<random>-<name> with O_CREAT\|O_EXCL\|O_NOFOLLOW. Same response shape (uri/name/mimeType/ size + workspace: URI scheme) as the legacy Go handler — canvas / agent code that resolves "workspace:..." paths keeps working. * Wired into workspace/main.py via starlette_app.add_route alongside the existing /transcript route. * python-multipart>=0.0.18 added to requirements.txt (Starlette's Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981). Test coverage (36 tests, all green; full workspace suite 1266 passed): * test_platform_inbound_auth.py — 14 tests: happy path, fail-closed on missing file, empty file, whitespace- only file, missing/case-wrong/empty Bearer prefix, in-process cache, default CONFIGS_DIR fallback, end-to-end file → authorized. * test_internal_chat_uploads.py — 22 tests: sanitize_filename matrix (incl. ../traversal, CJK chars, length truncation), 401 on missing/wrong/no-secret-file bearer, single + batch upload happy paths, unique random prefix on duplicate names, mimetype guess fallback, 400 on missing files field, 413 on per- file + total-body oversize, symlink-at-target refusal (with sentinel-content unchanged assertion). Why this is safe to ship before PR-C: * No platform-side caller yet → no behavior change visible to users. * Auth fails closed; nothing on the network can hit a write path until the platform forwards with the matching bearer. * Workspace's existing routes (/health, /transcript, /handle/*) are unchanged. Refs #2312 (parent RFC), #2308 (chat upload 503 incident). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:16:32 -07:00
Hongming Wang	59d65ba557	fix: resolve git/gh from PATH instead of hardcoded /usr/local/bin Closes #2289. Some workspace template images ship `/usr/local/bin/{git,gh}` wrappers that bake `GH_TOKEN` into argv handling (preferred — auto-PR creation authenticates without explicit token plumbing); other templates have plain `/usr/bin/git` installed via apt with no wrapper. The hardcoded `_GIT = "/usr/local/bin/git"` crashed every auto-push attempt on the latter image class: FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/git' File "/app/molecule_runtime/executor_helpers.py", line 524, in _auto_push_and_pr_sync subprocess.run(['/usr/local/bin/git', 'rev-parse', '--is-inside-work-tree'], ...) `shutil.which("git")` walks PATH in order — finds the `/usr/local/bin/` wrapper first when it exists, falls back to `/usr/bin/git` otherwise. GH_TOKEN injection still wins on wrapper-equipped images; auto-push no longer crashes on bare-apt images. Verified locally: `shutil.which("git")` resolves to `/usr/bin/git` on the bug-reporter's image; `shutil.which("gh")` resolves to the homebrew path on dev. Both paths exist + are executable on respective hosts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 02:07:19 -07:00
Hongming Wang	a57382e918	feat(runtime): add new_response_message helper for adapter A2A responses Surfaced via cross-template review of the a2a-sdk v0→v1 migration: every adapter executor (claude-code, gemini-cli, crewai, openclaw, autogen) builds A2A response Messages independently using `new_text_message(text)` from the SDK, which omits `task_id` and `context_id`. The runtime's own canonical pattern in `workspace/a2a_executor.py:466-475` correctly threads both: Message( message_id=uuid.uuid4().hex, role=Role.ROLE_AGENT, parts=_parts, task_id=task_id, # ← canonical context_id=context_id, # ← canonical ) Adapters skipping these correlation fields means the platform's a2a proxy can't reliably tie the response back to the originating task. This is a divergence from canonical, not necessarily a strict bug (task_id may be optional with a default) — but it's enough of a correlation/observability gap that the canonical pattern bothers to thread it. Add `new_response_message(context, text, files=None)` to executor_helpers.py — single home for response Message construction. Templates can migrate from `new_text_message(text)` to this helper in stacked PRs once the runtime publishes to PyPI. The helper: - Reads `context.task_id`/`context.context_id` from the inbound RequestContext, falling back to fresh UUIDs (RequestContextBuilder always sets them in production; fallback is for unit tests). - Sets `role=Role.ROLE_AGENT` (the v1 enum value). - Builds text Parts via `Part(text=...)` and file Parts via `Part(url="workspace:<path>", filename=..., media_type=...)`. - Returns a v1 protobuf Message ready for `event_queue.enqueue_event(...)`. Why "files=None" with the workspace: URI scheme as the file Part shape: matches the canonical pattern in a2a_executor.py exactly so the platform's chat-attachment download path (executor_helpers.py `resolve_attachment_uri`) interprets responses uniformly across all adapters. Tests (5, all pass with --no-cov against the live runtime image): - test_new_response_message_text_only - test_new_response_message_with_files - test_new_response_message_files_only_no_text - test_new_response_message_falls_back_when_context_ids_unset - test_new_response_message_handles_missing_attrs The conftest's a2a stubs needed an extension for Message + Role + Part with kwargs preservation. Strictly additive — no existing tests affected. (The 19 pre-existing failures in test_executor_helpers.py are unrelated debt from the commit_memory/recall_memory rewrite, visible on staging baseline before this change.) Per-template migration is the follow-up: claude-code, gemini-cli, crewai, openclaw, autogen all call `new_text_message(text)` today; each gets a per-repo PR replacing it with `new_response_message(context, text)`. This PR ships the helper first so the templates have something to import. Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix), gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 01:13:34 -07:00
hongmingwang-moleculeai	a18d116606	Merge pull request #2261 from Molecule-AI/fix/harness-cleanup-failed-event harness: SaaS routing + provider-agnostic config for RFC #2251 measurement	2026-04-29 05:35:43 +00:00
Hongming Wang	00e4766046	docs: registry pattern + harness scripts READMEs Two docs covering load-bearing patterns from today's work that weren't previously discoverable: 1. workspace/platform_tools/README.md — explains the ToolSpec single-source-of-truth pattern (#2240), the CLI-block alignment gap that hand-maintained generation can't close (#2258), the snapshot golden files + LF-pinning (#2260), and the add/rename/ remove playbook. The next reader who lands in workspace/platform_tools/ now has the design rationale + the safe-edit procedure colocated with the code. 2. scripts/README.md — disambiguates the three measure-coordinator- task-bounds.sh files that now exist across two repos: - scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo) - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo) - scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane) Cross-references reference_harness_pair_pattern (auto-memory) for the cross-repo design rationale. Documents the common safety pattern (cleanup trap, DRY_RUN, non-target guard, cleanup_*_failed events) and the heartbeat-trace caveat. Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:19:40 -07:00
hongmingwang-moleculeai	9bc3d6e352	Potential fix for pull request finding 'Unused global variable' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-28 20:45:53 -07:00
Hongming Wang	ddf6720498	chore(registry): snapshot tests + CLI-block alignment for #2240 Two follow-ups from the #2240 code review: 1. Snapshot tests for the rendered tool-instruction blocks. The structural tests added in #2240 guarantee tool NAMES are present; these new tests pin the SHAPE — bullet ordering, heading style, footer placement — so a future contributor who reorders fields in `_render_section` or rewrites a `when_to_use` paragraph sees the diff in CI rather than shipping a silently-different system prompt. Golden files live under workspace/tests/snapshots/. 2. CLI-block alignment test + corrected source-of-truth comment. `_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for ollama and other non-MCP runtimes — the registry can't auto-generate it because the CLI subprocess interface uses different command shapes (`peers` vs `list_peers`, etc.). A new `_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool → CLI-keyword correspondence (or explicit `None` for tools not exposed via subprocess). Two tests enforce coverage: - every a2a tool in the registry is keyed in the mapping - every non-None subcommand keyword literally appears in `_A2A_INSTRUCTIONS_CLI` Caught one real gap: `send_message_to_user` is in the registry but has no CLI subcommand. Mapped to `None` with an explanatory comment. The "no other source of truth" claim in registry.py's docstring was wrong post-#2240 (the CLI block survived) — corrected to describe the two surfaces explicitly and point at the alignment tests as the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 20:42:15 -07:00
Hongming Wang	5fe52b08e7	feat(harness): coordinator phase-boundary instrumentation for RFC #2251 Adds structured `rfc2251_phase=...` log lines at the deterministic phase boundaries inside route_task_to_team and check_task_status, so an operator running scripts/measure-coordinator-task-bounds.sh against staging can correlate the harness's external timing trace with what phase the coordinator was in at any given second. The harness already exists in staging and measures end-to-end response time + heartbeat trace. What it CAN'T do without this PR is answer "the coordinator response took 7 minutes — was it stuck delegating, or stuck polling children, or stuck synthesizing after all children returned?" The phase logs answer that question. Phases instrumented (deterministic Python boundaries, no agent prompt involvement): route_start → enter route_task_to_team children_fetched → after get_children() returns routing_decided → after build_team_routing_payload delegate_invoked → just before delegate_task_async.ainvoke delegate_returned → after delegate_task_async returns check_status → every check_task_status poll (per-poll) route_returning_decision_only → fall-through path Each line includes elapsed_ms from route_start so per-phase durations are extractable via: grep rfc2251_phase= <container.log> \ \| awk '{...}' to compute deltas between consecutive phases The synthesis phase (after all children return, before agent emits final A2A response) is NOT instrumented here because it's agent-driven (no deterministic Python boundary). The harness operator infers synthesis_secs = total_response_secs − max(check_status_ts). This is reproduction-harness scaffolding; it adds zero behavior. Strip the rfc2251_phase log lines when V1.0 ships and the phase data lands in the structured heartbeat payload instead. Refs: - RFC: molecule-core#2251 - Harness: scripts/measure-coordinator-task-bounds.sh (shipped earlier) - V1.0 gate: this is deliverable #2 of the four pre-V1.0 gates	2026-04-28 20:11:46 -07:00
Hongming Wang	f323def18f	chore(build): include platform_tools in runtime wheel SUBPACKAGES The PR-built wheel + import smoke gate refused the platform_tools package because it's a new subdirectory under workspace/ that wasn't in scripts/build_runtime_package.py:SUBPACKAGES. The drift gate (which exists for exactly this reason) caught it cleanly: error: SUBPACKAGES drifted from workspace/ subdirectories: in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or be excluded): ['platform_tools'] Adding platform_tools to SUBPACKAGES wires the package into the runtime wheel + applies the canonical from platform_tools.<x> -> from molecule_runtime.platform_tools.<x> import-rewrite step that every other subpackage uses. Verified locally: scripts/build_runtime_package.py succeeds, the rewritten a2a_mcp_server.py reads from molecule_runtime.platform_tools.registry import TOOLS which matches the package layout in the wheel.	2026-04-28 17:19:00 -07:00
Hongming Wang	e9a59cda3b	feat(platform): single-source-of-truth tool registry — adapters consume, no drift Establishes workspace/platform_tools/registry.py as THE place tool naming and docs live. Every consumer reads from it; nothing duplicates the source. Closes the architectural gap behind the doc/tool drift discussion 2026-04-28 — adding hundreds of future runtime SDK adapters should not require touching tool names anywhere except the registry. What the registry owns ToolSpec dataclass with: name, short (one-line description), when_to_use (multi-paragraph agent-facing usage guidance), input_schema (JSON Schema), impl (the actual coroutine in a2a_tools.py), section ('a2a' \| 'memory'). TOOLS list with 8 entries — delegate_task, delegate_task_async, check_task_status, list_peers, get_workspace_info, send_message_to_user, commit_memory, recall_memory. What now reads from the registry - workspace/a2a_mcp_server.py The hardcoded TOOLS list (167 lines of hand-maintained dicts) is gone. Replaced with a 6-line list comprehension over the registry. MCP description = spec.short. inputSchema = spec.input_schema. - workspace/executor_helpers.py get_a2a_instructions(mcp=True) and get_hma_instructions() now GENERATE the agent-facing system-prompt text from the registry. Heading + per-tool bullet (spec.short) + per-tool when_to_use + a section-specific footer. No more hand-maintained instruction blocks that drift from reality. - workspace/builtin_tools/delegation.py Renamed delegate_to_workspace -> delegate_task_async to match registry. check_delegation_status -> check_task_status. Added sync delegate_task @tool wrapping a2a_tools.tool_delegate_task (was missing for LangChain runtimes — CP review Issue 3). - workspace/builtin_tools/memory.py Renamed search_memory -> recall_memory to match registry. - workspace/adapter_base.py, workspace/main.py Bundle all 7 core tools (was 6) into all_tools / base_tools. - workspace/coordinator.py, shared_runtime.py, policies/routing.py Updated system-prompt-text references to use the registry names. Structural alignment tests workspace/tests/test_platform_tools.py — 9 tests pin every registry-to-adapter mapping: - registry names are unique - a2a + memory partition is complete (no orphans) - by_name lookup works - MCP server registers exactly the registry's tool set - MCP description equals registry.short for every tool - MCP inputSchema equals registry.input_schema for every tool - get_a2a_instructions text contains every a2a tool name - get_hma_instructions text contains every memory tool name - pre-rename names (delegate_to_workspace, search_memory, check_delegation_status) cannot leak back Adding a future tool means adding one ToolSpec; the test failure list tells the author exactly which adapter to update. Adapter pattern for future SDK support When (e.g.) AutoGen or Pydantic AI gets adapters, the only work needed for tool surfacing is "wrap registry.TOOLS in your SDK's tool format." Names, descriptions, schemas, impl come from the registry — adapter author writes zero strings. Why this needed to ship now PR #2237 (already in staging) injected MCP-world docs as the default system-prompt content. Without the registry, those docs said "delegate_task" while LangChain runtimes only had "delegate_to_workspace" — workers see docs for tools that don't exist (CP review Issue 1+3). PR #2239 was a tactical rename; this PR is the structural fix that prevents the same class of drift from recurring as new adapters ship. PR #2239 was closed in favor of this — same renames, plus the registry, plus structural tests. Single coherent change. Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in test_platform_tools.py; 4 alignment tests in test_prompt.py from #2237 still pass; original test_executor_helpers tests adapted to the registry-driven world. Refs: CP review Issues 1, 2, 3, 5; project memory project_runtime_native_pluggable.md (platform owns A2A); project memory feedback_doc_tool_alignment.md (this is the structural fix for the tactical lesson).	2026-04-28 17:11:36 -07:00
Hongming Wang	3f99fede5a	Merge pull request #2237 from Molecule-AI/fix/inject-a2a-hma-tool-instructions fix(prompt): inject A2A and HMA tool instructions into system prompt	2026-04-28 23:47:15 +00:00
Hongming Wang	448709f4b4	fix(prompt): inject A2A and HMA tool instructions into system prompt Workers were registering platform tools (delegate_task, delegate_task_async, list_peers, check_task_status, send_message_to_user, commit_memory, recall_memory) but the build_system_prompt assembly never included documentation for any of them. The instruction-text functions get_a2a_instructions() and get_hma_instructions() exist in executor_helpers.py and have unit tests, but were not called from any production code path — workers received system-prompt.md content only and saw the tools as bare names with no usage guidance. Symptom: agents called commit_memory and delegate_task without knowing they were platform tools. They worked when the agent guessed the API correctly and silently failed when the agent didn't. Fix: build_system_prompt() now appends both instruction sets between the Skills section and the Peers section. The placement is intentional — A2A docs explain how to call delegate_task; the peer list is the data that delegate_task operates over, so the docs precede the peer table. New parameter `a2a_mcp: bool = True` lets adapters opt into the CLI subprocess variant of the A2A instructions for runtimes without MCP support (ollama, custom CLI runtimes). Default True covers the MCP-capable majority (claude-code, hermes, langchain, crewai). Adapter callers don't need to change unless they specifically need CLI mode. Tests: 4 new regression tests in test_prompt.py pin - A2A MCP variant injection (default) - A2A CLI variant injection (a2a_mcp=False, with MCP-only fields absent) - HMA instruction injection - A2A docs precede peer list ordering Full suite green: 1223 passed, 2 xfailed.	2026-04-28 16:43:36 -07:00
Hongming Wang	0cdbc2c4f6	chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave Consolidates 11 of the 17 open Dependabot PRs (#2215, #2217, #2219-#2225, #2227, #2229) into one PR. Every entry is a patch / minor / floor bump where the impact surface is small and CI carries the proof. Same pattern as the 2026-04-15 batch. Go (workspace-server/go.mod + go.sum, regenerated via `go mod tidy`): - golang.org/x/crypto 0.49.0 → 0.50.0 (#2225) - github.com/golang-jwt/jwt/v5 5.2.2 → 5.3.1 (#2222) - github.com/gin-contrib/cors 1.7.2 → 1.7.7 (#2220) - github.com/docker/go-connections 0.6.0 → 0.7.0 (#2223) - github.com/redis/go-redis/v9 9.7.3 → 9.19.0 (#2217) Python floor bumps (workspace/requirements.txt; current pip-resolved versions don't change unless they happen to be below the new floor): - httpx >=0.27 → >=0.28.1 (#2221) - uvicorn >=0.30 → >=0.46 (#2229) - temporalio >=1.7 → >=1.26 (#2227) - websockets >=12 → >=16 (#2224) - opentelemetry-sdk >=1.24 → >=1.41.1 (#2219) GitHub Actions (SHA-pinned per existing convention): - dorny/paths-filter@d1c1ffe (v3) → @fbd0ab8 (v4.0.1) (#2215) REMOVED from this batch (lockfile platform mismatch): - #2231 @types/node ^22 → ^25.6 (npm install on macOS strips Linux-only @emnapi/* entries from package-lock.json that CI's `npm ci` then refuses; needs a Linux-side install to land cleanly) - #2230 jsdom ^25 → ^29.1 (same) NOT included in this batch (deferred to per-PR human review): - #2228 github/codeql-action v3 → v4 (CodeQL CLI alignment risk) - #2218 actions/setup-node v4 → v6 (default Node version drift) - #2216 actions/upload-artifact v4 → v7 (3 major versions) - #2214 actions/setup-python v5 → v6 (action major) NOT merged (CI failing on dependabot's own PR): - #2233 next 15 → 16 - #2232 tailwindcss 3 → 4 - #2226 typescript 5 → 6 Verified: - workspace-server: `go mod tidy && go build ./... && go test ./...` — green - workspace requirements.txt: floor bumps only	2026-04-28 16:25:46 -07:00
Hongming Wang	96acbd719b	test: update test_peer_capabilities_format for fallback behavior The previous assertion `'Silent Agent' not in result` was pinning the buggy behavior — peers without an agent_card were silently dropped from the prompt. With the fallback to DB name+role those peers are correctly visible. Flip the assertion so the test pins the new (correct) rendering and would catch a regression to the silent-drop behavior.	2026-04-28 14:15:42 -07:00
Hongming Wang	8ff0748ab9	fix(workspace): keep peers visible in coordinator prompt when agent_card is null Bug: a Design Director coordinator with 6 freshly-created worker peers rendered an empty `## Your Peers` section in its system prompt — the hosting registry endpoint correctly returned all 6 peers, but `summarize_peer_cards()` silently dropped every entry whose `agent_card` column was null (the default until A2A discovery has run end-to-end against the worker). The coordinator then refused to delegate any task because "no peers exist". Fix: fall back to the registry row's `name` and `role` columns when `agent_card` is missing, malformed, or wrong-typed, instead of skipping the peer. The registry endpoint (`workspace-server/internal/handlers/discovery.go:queryPeerMaps`) has always returned both fields — they were just being thrown away on the consumer side. `build_peer_section()` now renders `Role: …` when the agent_card-derived skill list is empty so the coordinator's prompt still has something concrete to delegate against. Also hoists `import json` out of the per-peer loop body to module level (was previously imported once per iteration). Tests: new `test_shared_runtime_peer_summary.py` pins all four fallback cases (null / malformed string / wrong type / null + no DB name) plus the agent-card-present happy path and the mixed-list case the coordinator actually consumes. First peer-summary test coverage `shared_runtime.py` has had — no prior tests existed. Refs: 2026-04-27 Design Director discovery report from infra team.	2026-04-28 14:10:29 -07:00
Hongming Wang	3eb599bbb6	fix(workspace): use SDK constant for agent-card readiness probe The initial-prompt readiness probe in workspace/main.py hardcoded the pre-1.x well-known path. After the a2a-sdk 1.x bump the SDK started mounting the agent card at the new canonical path (the value of `a2a.utils.constants.AGENT_CARD_WELL_KNOWN_PATH`), so the probe returned 404 every attempt and silently fell through to "server not ready after 30s, skipping". Net effect: every workspace silently dropped its `initial_prompt` from config.yaml — the agent never sent the kickoff self-message, and users hit a fresh chat with no context. Reported by an external user as "/.well-known/agent.json 404 — the a2a-sdk agent card route was not being mounted at the expected path". The route IS mounted; the probe was looking at the wrong place. Fix imports `AGENT_CARD_WELL_KNOWN_PATH` from `a2a.utils.constants` and uses it directly in the probe URL — the SDK constant is now the single source of truth, so any future rename travels through automatically. Adds two static regression tests pinning the invariant: 1. No hardcoded `/.well-known/agent.json` literal anywhere in main.py. 2. The probe URL fstring interpolates AGENT_CARD_WELL_KNOWN_PATH (catches a "fix" that imports the constant for show but reverts to a literal in the actual GET). Verified manually inside ghcr.io/molecule-ai/workspace-template-langgraph that AGENT_CARD_WELL_KNOWN_PATH == '/.well-known/agent-card.json' and that `create_agent_card_routes(card)` mounts at exactly that path — constant + mount are aligned in the runtime image, so the probe will now find the server. Full workspace test suite: 1209 passed, 2 xfailed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:43:32 -07:00
Hongming Wang	e87a9c3858	fix(a2a): auto-retry transient transport errors in send_a2a_message Three different intermittent failures observed during a single manual-test session — RemoteProtocolError, ReadTimeout, ConnectError — each surfaced as a "Failed to deliver to <peer>" error chip in the canvas Agent Comms panel even though the next attempt would have succeeded (verified by direct probes from the same source workspace to the same peer). The error message even told the user "Usually a transient network blip — retry once," but it left the retry to a human reading the error message. Auto-retry inside send_a2a_message itself: up to 5 attempts (1 initial + 4 retries) with exponential backoff (1s, 2s, 4s, 8s, 16s-capped), each backoff jittered ±25% to break sync across siblings. Cumulative wall-clock capped at 600s by _DELEGATE_TOTAL_BUDGET_S so a string of 5×300s ReadTimeouts can't make the caller wait 25 minutes — once the deadline elapses, retries stop even if attempts remain. Retry only on transport-layer transients: - ConnectError / ConnectTimeout (peer's listening socket not ready) - RemoteProtocolError (peer closed TCP without writing — observed when a peer's prior in-flight Claude SDK session aborted) - ReadError / WriteError (network blip on Docker bridge) - ReadTimeout (peer wrote no response in 300s) Application-level errors are NOT retried — they're deterministic and retrying just wastes wall-clock: - HTTP 4xx (peer rejected the request format) - JSON parse failures (peer returned garbage) - JSON-RPC error in response body (peer's runtime errored cleanly) - Programmer-bug exceptions (ValueError, etc.) 8 new tests pin the contract: - retry succeeds after 2 RemoteProtocolErrors - retry succeeds after 1 ConnectError - all 5 attempts fail → returns formatted last-error - capped at exactly _DELEGATE_MAX_ATTEMPTS (regression cover for "did someone bump the constant accidentally?") - JSON-RPC error response NOT retried (1 attempt only) - non-httpx exception NOT retried (programmer bugs stay loud) - total budget caps the loop even if attempts remain - backoff schedule grows exponentially with ±25% jitter Refactor: extracted _format_a2a_error() so the success and exhausted paths share one error-formatting routine. _delegate_backoff_seconds() is a pure function so the schedule is unit-testable without monkey- patching asyncio.sleep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:52:01 -07:00
Hongming Wang	81c4c1321c	fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer Manual-test failure surfaced what was hidden behind the MCP-path bug: once delegate_task could actually fire, every cross-workspace call came back as JSON-RPC -32600 "Invalid Request" with the underlying pydantic ValidationError: params.message.role Input should be 'agent' or 'user' [type=enum, input_value='ROLE_USER', input_type=str] PR #2184's a2a-sdk 1.x migration sweep over-corrected: it changed every `"role": "user"` literal in JSON-RPC payload construction to `"role": "ROLE_USER"` to match the protobuf enum names of the 1.x native types (a2a.types.Role.ROLE_USER / ROLE_AGENT). That was correct for in-process Message construction (which the SDK serialises before wire transmission) but WRONG for the 8 sites that hand-build JSON-RPC payloads. The workspace's own a2a-sdk runs inbound requests through the v0.3 compat adapter (/usr/local/lib/python3.11/site-packages/a2a/compat/v0_3/) because main.py sets enable_v0_3_compat=True for backwards compatibility, and that adapter validates against the v0.3 Pydantic Role enum (`agent` \| `user` lowercase). The protobuf-style names blow it up. Reverted the 8 wire-payload sites to lowercase: - workspace/a2a_client.py:74 - workspace/a2a_cli.py:74, 111 - workspace/heartbeat.py:378 - workspace/main.py:464, 563 - workspace/builtin_tools/a2a_tools.py:60 - workspace/builtin_tools/delegation.py:272 Native-type usage at workspace/a2a_executor.py:471 (`Role.ROLE_AGENT`) stays — that's an in-process Message construction; the SDK handles wire serialisation correctly. Updated the misleading comment at main.py:255-257 (which said "outbound payloads are now 1.x-shaped (ROLE_USER)") to spell out the actual rule: outbound JSON-RPC wire payloads MUST use v0.3 shape, native types are only for in-process construction. New regression test test_jsonrpc_wire_role_format.py greps the 6 wire-payload-emitting files for any "ROLE_USER" / "ROLE_AGENT" string literal and fails loud — cheapest possible drift detector. Why E2E missed it: the priority-runtimes harness sends a single message canvas → workspace, but the canvas already used lowercase "user" (it never went through the migration sweep). The bug only surfaces on workspace → workspace delegation, which the harness doesn't exercise. Same gap as #131 (extend smoke to call main() against a stub). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:40:11 -07:00
Hongming Wang	9c3695df6d	test(runtime): update molecule_ai_status test for renamed error prefix Pre-existing test_set_status_exception_prints_to_stderr asserted on the legacy "molecule-monorepo-status: failed to update" prefix string. The prior commit renamed it to "molecule_ai_status: failed to update" so the printed label matches the canonical module-form invocation (`python3 -m molecule_runtime.molecule_ai_status`) instead of a shell alias that only ever existed in the dev-only base image. Updating the expected substring in lockstep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:48:05 -07:00
Hongming Wang	28fc7a8cbd	fix(runtime): replace remaining /app/ legacy paths in agent prompts + docstrings Comprehensive sweep follow-up to the MCP server path fix. Audited every /app/ reference in the runtime source against the live claude-code template image and confirmed the actual /app/ contents post-#87 are ONLY: __init__.py, adapter.py, claude_sdk_executor.py, requirements.txt — every other workspace module ships in the wheel under site-packages/molecule_runtime/. Two more leaks found: 1. executor_helpers.py:_A2A_INSTRUCTIONS_CLI — inter-agent system prompt for non-MCP runtimes (Ollama, custom) had 5 lines telling the model `python3 /app/a2a_cli.py X`. Models copy these examples verbatim, so every CLI-runtime delegation would fail at the shell layer (no such file). Replaced with `python3 -m molecule_runtime.a2a_cli` form, which works regardless of where the wheel is installed. 2. molecule_ai_status.py docstring — usage examples invoked `python3 /app/molecule_ai_status.py` and claimed a `molecule-monorepo-status` shell alias. Both broken in current templates: the file's at site-packages, and `which molecule-monorepo-status` errors (the legacy symlink only existed in the dev-only workspace/Dockerfile base image, not in the standalone template Dockerfiles that ship to production). Updated docstring + the __main__ usage banner + the stderr error prefix to use the same `python3 -m molecule_runtime.X` form. Plugins audited and clean: WORKSPACE_PLUGINS_DIR=/configs/plugins, SHARED_PLUGINS_DIR=$PLUGINS_DIR fallback /plugins. No /app/ assumptions. Regression test: `test_a2a_cli_instructions_use_module_invocation_not_legacy_app_path` asserts the legacy /app/a2a_cli.py path can't drift back into the CLI system prompt and that the canonical module form is present. The legacy workspace/Dockerfile + workspace/entrypoint.sh + workspace/scripts/ still contain /app/-shaped paths but are dev-only base-image scaffolding (per workspace/build-all.sh's own header comment) — not shipped to the standalone template images. Out of scope here; can be cleaned up in a separate dead-code pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:22:00 -07:00
Hongming Wang	203a4f0f91	fix(runtime): resolve a2a_mcp_server.py path from wheel install location DEFAULT_MCP_SERVER_PATH was hardcoded to /app/a2a_mcp_server.py, which was correct under the pre-#87 monolithic-template Docker layout where the workspace/ tree was COPY'd into /app/. After the universal-runtime refactor (#87, #117), workspace modules ship inside the molecule-ai-workspace-runtime wheel under site-packages/molecule_runtime/, while /app/ now holds only template-specific files (adapter.py + the runtime-native executor for that template). Net effect: in every workspace built since the wheel cutover, Claude Code SDK's mcp_servers={"a2a": {"command": python, "args": ["/app/a2a_mcp_server.py"]}} pointed at a missing file. The subprocess launch failed silently, the SDK registered zero MCP tools, and the agent's list_peers / delegate_task / a2a_send_message / a2a_send_signal all disappeared. Symptom observed today: Design Director said "I tried to reach the perf auditor via the inter-agent MCP tools (list_peers, delegate_task) but those tools didn't resolve in this environment" and fell back to running the audit itself with WebFetch. Why this slipped through E2E: the priority-runtimes harness sends a single message and verifies a reply — it does not exercise inter-agent delegation, so the missing MCP tools are invisible at that layer. Fix: resolve the path relative to executor_helpers.py via __file__, which tracks wherever the wheel is installed (site-packages today, anywhere else tomorrow). The A2A_MCP_SERVER_PATH env override is preserved for tests / non-default layouts. Regression test: assert os.path.exists(DEFAULT_MCP_SERVER_PATH) so any future move of a2a_mcp_server.py out of the package directory fails at unit-test time instead of silently disabling delegation in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 11:15:06 -07:00
Hongming Wang	dd57a840b6	fix: comprehensive a2a-sdk 1.x migration sweep across workspace/ Audited every a2a-sdk surface in workspace/ against the installed 1.0.2 wheel. Found and fixed: main.py (the live workspace startup path): • create_jsonrpc_routes(rpc_url='/', enable_v0_3_compat=True) — rpc_url required in 1.x; v0.3 compat enables inbound legacy clients (`"role": "user"` lowercase) without forcing them to upgrade. Pairs with the outbound rename below. a2a_executor.py: • TextPart/FilePart/FileWithUri removed in 1.x. Part is now a flat proto message: Part(text=…) / Part(url=…, filename=…, media_type=…). Updated the file-attachment branch (only reachable when an agent emits files; the harness's PONG path didn't exercise this, but it's a latent crash). • Message field names: messageId/taskId/contextId → message_id/task_id/context_id (proto3 snake_case). • Role enum: Role.agent → Role.ROLE_AGENT (proto enum). Outbound JSON-RPC payloads (8 files): • "role": "user" → "role": "ROLE_USER" — proto3 JSON serialization is strict about enum values. Sites: a2a_client, a2a_cli, main (initial+idle prompts), heartbeat, builtin_tools/a2a_tools, builtin_tools/delegation. Wire JSON keys stay camelCase (proto3 default), only the role enum value changed. google-adk/adapter.py: • new_agent_text_message → new_text_message (4 sites). This adapter's directory has a hyphen, so it can't be imported as a Python module — effectively dead code, but the wheel ships the file and a future fix should keep it correct against 1.x. Why one PR instead of seven: every previous a2a-sdk migration find landed as its own publish → cascade → harness → next-bug cycle. Today's audit ran every a2a-sdk symbol/type/method in workspace/ against the installed 1.0.2 wheel in a single sweep + tested the critical paths (Message construction, Part construction, Role enum parsing) against the actual SDK. Should be the last migration PR. Verified locally: python3 scripts/build_runtime_package.py --version 0.1.99 \ --out /tmp/build-final pip install /tmp/build-final python -c "import molecule_runtime.main; \ from molecule_runtime.a2a_executor import LangGraphA2AExecutor" → ✓ all imports clean against a2a-sdk 1.0.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 09:42:57 -07:00
Hongming Wang	c80b3ff0eb	fix: pass rpc_url='/' to create_jsonrpc_routes (a2a-sdk 1.x requirement) 7th a2a-sdk migration find from the v0 → v1 transition. create_jsonrpc_routes() now requires rpc_url as a positional arg (was implicit at root in 0.x). Pass '/' to match a2a.utils.constants.DEFAULT_RPC_URL — that's also what workspace-server's a2a_proxy.go forwards to (POSTs to workspace URL without appending a path). Symptom before fix: every workspace startup crashed with TypeError: create_jsonrpc_routes() missing 1 required positional argument: 'rpc_url' Caught by harness 9 phase 4 (claude-code + langgraph both on 0.1.24). The user's "use langgraph for fast iteration" call cut the diagnose cycle from 15min to ~30s — without that, this would have taken another hermes round-trip to surface. Updated reference_a2a_sdk_v0_to_v1_migration.md memory with this entry alongside the previous 6 finds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 09:33:23 -07:00
Hongming Wang	6859099a08	fix: pass agent_card to DefaultRequestHandler (a2a-sdk 1.x requirement) a2a-sdk 1.x added agent_card as a required argument to DefaultRequestHandler.__init__. main.py constructed it with only agent_executor + task_store, so every workspace startup that reached the handler init step crashed with: TypeError: DefaultRequestHandlerV2.__init__() missing 1 required positional argument: 'agent_card' This is the 6th a2a-sdk migration find from the v0 → v1 transition (see reference_a2a_sdk_v0_to_v1_migration memory). Pattern is the same: SDK exposes a new required arg, our call site needs to pass the existing object we already construct upstream. Why the import-only smoke gates didn't catch this: it's a call-time constructor error inside `async def main()`, not a module load error. The runtime-pin-compat smoke imports main_sync but doesn't invoke main() against a real config. Worth filing a follow-up to extend the smoke to a "construct + dispose" cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 08:53:47 -07:00
Hongming Wang	851fd21fb1	fix(workspace): rename supported_protocols → supported_interfaces (a2a-sdk 1.0) CRITICAL: every workspace boot since the a2a-sdk 1.0 migration (#1974) has been crashing at AgentCard construction with: ValueError: Protocol message AgentCard has no "supported_protocols" field The protobuf field is `supported_interfaces` (plural, interfaces — see a2a-sdk types/a2a_pb2.pyi:189). The 0.3→1.0 migration left the kwarg as `supported_protocols`, which doesn't exist in the 1.0 schema, so the constructor raises before any subsequent line of main runs. Why this hid for so long: - publish-runtime.yml's smoke step only IMPORTED molecule_runtime.main; importing the module is fine, only CONSTRUCTING the AgentCard fails - The user-visible symptom is "Workspace failed: " with empty last_sample_error, indistinguishable from generic boot timeouts - The state_transition_history=True bug (fixed in #2179) was a sibling of this — same migration, same class, just caught first Fix is symmetric with #2179: 1. workspace/main.py: rename the kwarg + comment explaining why 2. .github/workflows/publish-runtime.yml: extend the smoke block to instantiate AgentCard with the exact production call shape, so the next field-rename of this class fails at publish time instead of breaking every workspace startup Verification: - Constructed AgentCard against fresh a2a-sdk 1.0.2 in a clean venv with the corrected kwarg → succeeds - Constructed it with the original `supported_protocols` kwarg → fails immediately with the exact error production sees - Smoke test pinned to mirror main.py's exact call shape; main.py + smoke must stay in lockstep going forward Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:54:23 -07:00
Hongming Wang	12d446bc8e	docs: explain why state_transition_history is gone (research-backed) Adds a comment block citing a2a-sdk's own a2a/compat/v0_3/conversions.py, which says verbatim: state_transition_history=None, # No longer supported in v1.0 So a future reader who notices the missing kwarg won't try to add it back. The capability is now universal: every v1.x Task carries a history list and tasks/get supports historyLength via the apply_history_length helper. No flag because nothing's optional. Confirmed by reading the SDK source directly: - a2a/types.py AgentCapabilities exposes only: streaming, push_notifications, extensions, extended_agent_card. - a2a/compat/v0_3/conversions.py explicitly maps None when down-converting v1 → v0.3 (deliberate removal, not rename). - a2a/server/request_handlers/default_request_handler_v2.py uses apply_history_length(task, params) — agent doesn't opt in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:20:05 -07:00
Hongming Wang	f531fe1367	fix: drop state_transition_history field — removed in a2a-sdk 1.x a2a-sdk 1.x's AgentCapabilities only exposes 4 fields: streaming, push_notifications, extensions, extended_agent_card. The state_transition_history field was removed in the v1 protobuf schema. main.py still passed it as a kwarg, so every workspace that reached the AgentCard construction step (line 188) crashed: ValueError: Protocol message AgentCapabilities has no "state_transition_history" field Symptom: every claude-code + hermes workspace stuck in `provisioning` forever — caught when the user provisioned a Design Director crew manually via the canvas while harness 5 was running. Why every prior smoke gate missed it: - runtime-pin-compat.yml smokes `from molecule_runtime.main import main_sync` — only imports the module. AgentCapabilities() runs inside `async def main()`, not at module load. - Template image boot smoke does `import every /app/*.py` — same story. main.py imports fine; the field error only fires at call. The fix is one line — drop the kwarg. Fields we actually need (streaming + push_notifications) are still passed. Follow-up worth filing: smoke step that instantiates Adapter() + calls a no-op setup() against a stub config. That would have caught this before publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:16:16 -07:00
Hongming Wang	5b05d663ee	test: update a2a.helpers mock to export new_text_message The conftest mock only exposed `new_agent_text_message`, the pre-v1 name. After fixing a2a_executor.py to use the v1 name `new_text_message`, the mock didn't satisfy the import → CI red. Mock both names (aliased to the same lambda) so any in-flight test that still references the old name keeps working until the next sweep removes those references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:34:28 -07:00
Hongming Wang	722e1fd175	fix(a2a_executor): migrate to a2a-sdk 1.x API — new_agent_text_message → new_text_message a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message` (role=Role.agent is now the default). Same fix landed in the hermes template earlier today; this is the runtime-side equivalent. NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by the langgraph + deepagents templates. Both templates currently import it via bare `from a2a_executor import LangGraphA2AExecutor` — which is a separate bug in those templates, filed/fixed separately. Symptom in a2a_executor.py form: any langgraph or deepagents workspace that calls create_executor crashes with `ImportError: cannot import name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for claude-code or hermes (their templates use their own executors and don't load a2a_executor). Five call sites updated, one import line, one comment. Test suite already passes against the new symbol — `python -c "from molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves cleanly after this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:29:59 -07:00
Hongming Wang	3df5867b56	fix: restore main_sync entry point in workspace/main.py The wheel's pyproject.toml has declared `molecule-runtime = "molecule_runtime.main:main_sync"` since the publish pipeline was created on 2026-04-26, but the function itself was never present in workspace/main.py — it lived in the pre-monorepo molecule-ai-workspace-runtime repo and was lost during the consolidation that made workspace/ the source of truth. The 0.1.15 wheel still had main_sync from a leftover snapshot, so the regression went unnoticed until 0.1.16 (the first wheel built from the new source-of-truth) shipped. Symptom: every workspace container restart loops with ImportError: cannot import name 'main_sync' from 'molecule_runtime.main' — the molecule-runtime CLI script's first line tries to import the missing symbol. Workspaces stay in `provisioning` until the 10-min sweep marks them failed. Caught by .github/workflows/runtime-pin-compat.yml, which already imports the symbol by name as its smoke test. (That check kept failing red on every recent merge_group run; this PR fixes the underlying symbol-not-found instead of the smoke step.) Also strengthens publish-runtime.yml's wheel smoke from `import molecule_runtime.main` (loads the module — passes even when entry-point target is missing) to `from molecule_runtime.main import main_sync` (the actual contract the CLI script needs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:35:49 -07:00
Hongming Wang	d19d35f6b3	test(skills): make watcher test fakes accept current_runtime kwarg The runtime-compat change in this branch added a `current_runtime` kwarg to load_skills(); the watcher passes it through. Test mocks that pre-date the kwarg signature broke with TypeError, which the watcher's reload-error try/except swallowed — the symptom was empty callback lists, not a clear failure. Switching the fakes to accept **kwargs keeps them forward-compat for future load_skills additions without another test churn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:04:26 -07:00
Hongming Wang	d0057912d2	feat(skills): per-skill runtime compatibility (#119 , hermes pattern) SKILL.md frontmatter can now declare `runtime: [claude-code]` or `runtime: [hermes, claude-code]` to opt out of incompatible adapters instead of failing at first invocation. Default `[""]` means universal — existing skill libraries need zero migration. Borrowed from hermes' declarative skill-compat pattern surfaced in the hermes architecture survey. The remaining two patterns (event-log layer, observability config block) stay open under #119. Wiring: - SkillMetadata.runtime: list[str] = [""] - _normalize_runtime_field accepts list, string-sugar, missing -> [""]; malformed warns and falls back to universal so a typo never silently drops a skill. - load_skills(..., current_runtime=...) filters out skills whose runtime list lacks "" or current_runtime, with an INFO log line. - BaseAdapter.start passes type(self).name() so the live adapter drives the filter; SkillsWatcher takes the same kwarg so hot-reload honors it. 8 new tests cover default universal, no-field universal, explicit match/mismatch, string sugar, wildcard short-circuit, current_runtime=None (preserves old behavior), and malformed-warns-not-drops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:57:43 -07:00
Hongming Wang	e99f937630	Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]	2026-04-27 08:24:30 +00:00
Hongming Wang	98ca5c50fa	chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild) DRAFT — do NOT merge until gemini-cli template image rebuilds with its local cli_executor.py copy (template PR #9 just merged at 07:59 UTC; image build kicks off now). Final adapter-specific deletion from molecule-runtime, completing #87 for the priority adapters (claude-code via PR #2156, plus gemini-cli via this PR + template #9). Deletes: - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged). - workspace/tests/test_agent_base_urls.py — only consumer of CLIAgentExecutor in the test suite. Tests for the executor behavior live in the template repo now. Updates: - workspace/tests/test_executor_helpers.py — docstring refresh: executor_helpers.py is the runtime-agnostic shared helpers; the executor classes themselves live in template repos post-#87. Codex / ollama presets disappear naturally with the file. They never had template repos, so no production path could invoke them anyway — this is dead-code removal as a side effect of the move. Verified-safe-to-delete: - heartbeat.py: doesn't import cli_executor - claude_sdk_executor.py: deleted by PR #2156 (in flight) - preflight.py: only references runtime names by string; no import - main.py: doesn't import cli_executor (uses adapter discovery via ADAPTER_MODULE; the template's adapter constructs the executor) - Only test_agent_base_urls.py + test_executor_helpers.py docstring referenced cli_executor Verification: - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py cases — exact match) - No live import of cli_executor anywhere in molecule-core after deletion (grep verified) Sequencing: 1. ✅ Template PR #9 (gemini-cli local copy) — MERGED 2. ⏳ Template image rebuild — running 3. THIS PR — wait until image is published, then mark ready-for-review Closes #87 for the priority adapters: workspace/ is now adapter- agnostic except for adapter discovery (ADAPTER_MODULE) + the runtime_wedge primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:22:39 -07:00
Hongming Wang	7504aba934	feat(tools): tighten send_message_to_user description to forbid pasting URLs in body Root-cause fix for #118 (chat attachments rendering as plain text links instead of download chips). User flagged with screenshot 2026-04-26 showing the Design Director agent pasting https://files.catbox.moe/… in the message body — chat rendered the URL as plain markdown text, unclickable in the canvas's bubble layout, and unreachable in any SaaS deployment where the user's browser can't egress to catbox. The structured `attachments` field already exists, the canvas's AttachmentChip already renders well, the WebSocket broadcast already carries attachments verbatim — the missing piece was the LLM choosing the body over the structured field. Tighten the tool description so it trains the right behavior. Three targeted strengthenings: 1. Top-level tool description: enumerated use case (4) now reads "via the `attachments` field (NEVER paste file URLs in `message`)". The all-caps NEVER + the explicit field name move the LLM toward the structured path on first read. 2. `message` param: adds an explicit DO NOT rule with rationale. Includes the SaaS-reachability reason so operators can grep for "SaaS" and find this design constraint instead of re-discovering it after a tenant complaint. Calls out catbox.moe + file:// by name as concrete examples of forbidden hosts (those are the two we've seen in production). 3. `attachments` param: leads with REQUIRED, lists the bad alternatives explicitly (pasting URLs, base64-encoding, telling user to look at a path). LLMs handle "use X, NOT Y" framings better than "use X" alone — observed during prompt-engineering iteration on hermes' tool descriptions. Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py) so a future doc edit that softens or drops them fails CI. Brittle by design — these are prompt-engineering invariants, not implementation details. This is the root-cause fix. A defensive canvas-side backstop (auto- detect download-shaped URLs in body and convert to chips) is a follow-up that could land separately if the steering proves insufficient in practice. Verification: - 1190/1190 workspace pytest pass - 4 new test_a2a_mcp_server.py cases all green Closes the steering half of #118. The structured-attachments-only contract was already enforced server-side (PR #2130 added per-attachment validation); this PR closes the prompt-side gap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:13:11 -07:00
Hongming Wang	4e6030d783	Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime chore(workspace): drop claude_sdk_executor — Phase 2 of #87	2026-04-27 08:02:51 +00:00
Hongming Wang	4b5ac2ebc2	chore(workspace): drop claude_sdk_executor — Phase 2 of #87 Phase 2 of the universal-runtime refactor (task #87). Now that the claude-code template repo ships its own claude_sdk_executor.py (template PR #13 merged + image rebuilt at 07:36 UTC) the molecule-runtime no longer needs to ship the file. Deletes: - workspace/claude_sdk_executor.py (704 LOC) - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC) Updates: - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring section. The shim was time-bounded ("removed once #87 Phase 2 lands"); this is that PR. - workspace/tests/test_runtime_wedge.py — drops the TestClaudeSdkExecutorReExportShim test class (the shim doesn't exist anymore so the identity assertions would fail at import). - workspace/tests/conftest.py — drops the claude_agent_sdk stub. Its only consumer was test_claude_sdk_executor.py which is gone; no other test imports the SDK. - workspace/cli_executor.py — comment refresh: claude-code template repo (not workspace/) is now the home for ClaudeSDKExecutor. Verified-safe-to-delete: - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer imports from claude_sdk_executor) - cli_executor.py: only comments referenced claude_sdk_executor; its line-117 ValueError defends against accidental routing - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's shim class consumed the deleted module; both removed in this PR Verification: - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the deleted test cases — zero unexpected regressions) - No live import of claude_sdk_executor anywhere in molecule-core after deletion (grep verified) Closes #87 for the claude-code adapter. Hermes is already template-only. The remaining adapter-specific code in workspace/ is cli_executor.py (codex/ollama/gemini-cli) tracked by task #122. preflight.py's SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in flight). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:52:55 -07:00
Hongming Wang	7dba700ac3	feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery Closes task #123 — last piece of #87 cleanup. Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported" runtime names (claude-code, codex, ollama, langgraph, etc.). Every new template repo required a code change in molecule-runtime to be recognized — direct violation of the universal-runtime principle (#87) where adapters declare themselves and the runtime stays generic. Post-fix: discovery-based validation via the same ADAPTER_MODULE env var that production load paths already consult (workspace/adapters/__init__.py:get_adapter). Distinguished failure modes so operator messages are concrete: - ADAPTER_MODULE unset → "no adapter installed; set the env var" - ADAPTER_MODULE set but module won't import → import error type + message - module imports but no Adapter class → "convention violation, add `Adapter = YourClass`" - Adapter.name() raises → caught with operator message - Adapter.name() returns non-string → contract violation message - Adapter.name() doesn't match config.runtime → drift WARNING (not fatal; the adapter wins in production, config.yaml is just documentation) The drift case is the one behavioral change worth calling out: the prior static-list path would have hard-failed config.runtime values not in the allowlist. With discovery, an unknown runtime in config.yaml is just a documentation drift — the adapter that's actually installed runs regardless. Operator gets a warning naming both the configured and installed names so they can fix whichever is stale. Tests: - Replaces the obsolete "static list pass/fail" tests with 6 new cases covering each distinguished failure mode, plus a positive test for the adapter-matches-config happy path - Adds an autouse `_default_langgraph_adapter` fixture that pre-installs a fake adapter via sys.modules monkey-patching, so existing tests building default WorkspaceConfig (runtime="langgraph") inherit a valid adapter without each test setting ADAPTER_MODULE - Failure-mode tests opt out of the default fixture via @pytest.mark.no_default_adapter (registered in pytest.ini) - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None is a passable test value (otherwise `is not None` would skip the None branch — exact bug the sentinel avoids) Verification: - 22/22 preflight tests pass (was 16; +6 new failure-path tests) - 1256/1256 workspace pytest pass (was 1251; +5 net) - No production code path other than preflight changed Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction). This change is independent of the cli_executor.py template moves (task #122) — completes one of the two remaining cleanup items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:44:51 -07:00
Hongming Wang	5e049244d6	refactor(wedge): mark re-exports explicit via __all__ Addresses github-code-quality unused-import flag on the runtime_wedge re-export shim. Adds __all__ listing the names that exist purely for backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test) so static analysis recognizes the imports as deliberate exports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:20:23 -07:00
Hongming Wang	feb544938b	refactor(wedge): address review feedback — class wrap + import-path doc + dedupe shim rationale Three changes from /code-review-and-quality on PR #2154: 1. Optional (architecture): wrap state in a private _WedgeState class instead of bare module-level globals. Public API (mark_wedged / clear_wedge / is_wedged / wedge_reason / reset_for_test) is unchanged — adapters never see the class. The class is forward-cover for any future per-scope variant (multiple executors per process, a keyed registry, etc.) without churning the call sites. Today there's exactly one instance (_DEFAULT) so behavior is identical. 2. Optional (readability): clarify the import path in the integration recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge` (PyPI package); in molecule-core itself it's `from runtime_wedge` (top-level module). Removes the trap where a contributor reading the docstring while editing in-repo copies the template-style import and gets ImportError. 3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's re-export comment now points to runtime_wedge's "Compatibility shim" section as the source of truth instead of restating the same content. Avoids docs-in-two-places drift risk. Verification: - 1251/1251 workspace pytest pass (no behavior change — class wrap is pure plumbing; module-level helpers delegate to the singleton) - All shim re-export identity tests still pass (the shim's `is_wedged is runtime_wedge.is_wedged` assertion holds because we re-export the SAME function object that delegates to _DEFAULT) No new tests needed — the existing test suite covers the public API contract; the class is an implementation detail behind that contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:16:33 -07:00
Hongming Wang	cd899c969f	docs(wedge): integration recipe for adapters that want to flip-to-degraded Doc-only follow-up to the wedge-state extraction. Adds proactive guidance so the next adapter (hermes / codex / langgraph / a future template) discovers the runtime_wedge primitive and integrates the ~6 LOC pattern uniformly instead of inventing its own wedge state. Two additions: - workspace/runtime_wedge.py — new "How to use from a NEW adapter" section in the module docstring with the minimum viable integration recipe, what-you-get-for-free list, and explicit DON'TS (don't store local wedge state, don't mark for transient errors, don't write your own clear logic). Plus a "when wedge is the WRONG primitive" note to keep adopters from over-using it. - workspace/adapter_base.py — adds runtime_wedge to the "Cross-cutting capabilities your adapter can opt into" list in BaseAdapter's docstring (alongside capabilities() and idle_timeout_override()). Discoverability path: adapter author reads BaseAdapter docstring → sees runtime_wedge mention → reads runtime_wedge module docstring → has the recipe. Also tightens the "to add a new agent infra" steps in BaseAdapter to match the actual current model (standalone template repo + ADAPTER_MODULE env var) rather than the obsolete workspace/adapters/<infra>/ layout that hasn't been the path since the universal-runtime extraction started. Zero code change. Tests untouched (1251/1251 still pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:12:14 -07:00
Hongming Wang	1d231ed295	refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module Prerequisite for the universal-runtime refactor (task #87) to move claude_sdk_executor.py out of molecule-runtime into the claude-code template repo. heartbeat.py had a hard import: from claude_sdk_executor import is_wedged, wedge_reason which would break the moment the executor moves out of the runtime package — the heartbeat would lose access to the wedge state used to flip workspace status to degraded. Extract the wedge state to a runtime-side module that the heartbeat can keep importing regardless of which adapter executor is wedged: - workspace/runtime_wedge.py — single-flag state + mark_wedged / clear_wedge / is_wedged / wedge_reason / reset_for_test. Same semantics as the original claude_sdk_executor implementation (sticky first-write-wins, auto-clear on observed success). 100 LOC of pure stateless helpers; lock-free ok because there's one executor per workspace process today. - workspace/claude_sdk_executor.py — drops the in-file definitions; re-exports the same names from runtime_wedge as a backwards-compat shim. Any third-party adapter that imported is_wedged / wedge_reason / _mark_sdk_wedged from claude_sdk_executor keeps working for one release cycle while they migrate to runtime_wedge. - workspace/heartbeat.py — _runtime_state_payload() now imports from runtime_wedge instead of claude_sdk_executor. Lazy-import pattern preserved; the docstring updated to explain the new cross-cutting source-of-truth. Tests (10 new in test_runtime_wedge.py): - Default state (unwedged), mark sets flag, first-write-wins, clear restores healthy, clear-when-not-wedged is no-op, re-marking after clear is allowed - Re-export shim: each old name in claude_sdk_executor IS the runtime_wedge function (identity check), state is shared (marking via the executor shim is observable via runtime_wedge and vice versa) Verification: - 1251/1251 workspace pytest pass (was 1241 after orphan deletion; +10 = exactly the new test_runtime_wedge.py cases) - All existing test_claude_sdk_executor.py cases (which call _mark_sdk_wedged via the shim) still pass After this lands + the claude-code template image rebuilds with the local claude_sdk_executor.py copy (template PR #13), the molecule- core deletion of workspace/claude_sdk_executor.py becomes safe (the shim deletion comes alongside the file deletion, since runtime_wedge is the new public API). See project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:08:53 -07:00
Hongming Wang	fa8deb9d16	chore(workspace): delete orphan HermesA2AExecutor (dead code, 1.8K LOC) Removes: - workspace/hermes_executor.py (545 LOC) — HermesA2AExecutor, an OpenAI-compat direct-call executor that was the original hermes integration before the template was rewritten to bridge to hermes-agent's sidecar API server. - workspace/tests/test_hermes_executor.py (1307 LOC) — its test file. Verified-dead-code analysis: - Zero `from hermes_executor` / `import hermes_executor` imports anywhere in workspace/, workspace-server/, or workspace-configs-templates/ (excluding the file itself + its test). - The hermes template (workspace-configs-templates/hermes/executor.py) uses HermesAgentProxyExecutor, NOT HermesA2AExecutor — they're independent implementations. The executor.py file imports from `executor` (local), not from molecule_runtime. - Last touched in PR #1974 (2026 a2a-sdk migration to 1.0.0) for SDK compatibility — kept compiling but never wired into any code path. - Older than that, only the 2026 open-source restructure rename. Why now: starting task #87 (universal-runtime violation, move adapter- specific code out of workspace/). Dead-code deletion is the safest first step and motivates the broader refactor by clearing the landscape — no risk of someone defending HermesA2AExecutor as "actually used somewhere." Verification: - 1241/1241 workspace pytest pass (was 1312; the 71 dropped tests are exactly test_hermes_executor.py's coverage) - No new failures, no broken imports anywhere The remaining adapter-specific executors in workspace/ that #87 will eventually relocate (per the user's scope: claude-code + hermes priority, others later): - workspace/claude_sdk_executor.py (757 LOC) → claude-code template repo - workspace/cli_executor.py (461 LOC) → defer (codex/ollama/etc still use the runtime presets here; comes back later when those bump versions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:52:10 -07:00
Hongming Wang	af664e3e87	feat(tools): borrow hermes-style discipline — error/summary caps + sharper MCP descriptions Three small wins from the hermes-agent design survey, bundled because each is too small for its own PR but they all improve the priority adapters (claude-code + hermes) immediately. 1. Hermes-style cap on telemetry fields, applied INSIDE report_activity so every caller benefits without remembering. error_detail capped at 4096 (hermes' value); summary capped at 256 (one-liner ceiling). The existing call site in tool_delegate_task already truncated error_detail at 4096, but moving the cap into the helper closes the door on a future caller pasting a giant traceback. response_text is NOT capped (it's the agent's user-visible reply; truncating would silently drop content). Pinned by 4 new tests including a negative-pin that response_text MUST stay untruncated. 2. Sharper MCP tool descriptions for commit_memory + recall_memory — hermes' delegate_task description literally says "WAIT for the response" and delegate_task_async says "Returns immediately." LLMs pick the right tool variant from descriptions; ambiguity costs accuracy. - commit_memory now states it APPENDS (each call creates a row, no overwrite) and that GLOBAL requires tier 0. - recall_memory now states it's case-insensitive substring search with no pagination, returns all matches, and that empty-query is cheap and safer than a narrow keyword. 3. (no code change) Filed task #120 for the bigger user-flow win — a per-workspace tool enable/disable menu in Canvas Config — and task #121 for model-string passthrough (depends on #87 universal-runtime refactor). Verification: - 1312/1312 Python pytest pass (was 1308, +4 new) See task #119 for the architectural follow-ups (event-log layer, declarative skill compat, observability config block) and project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:25:54 -07:00
Hongming Wang	aa70727ab9	fix(test): drop unused MagicMock import in test_heartbeat_runtime_metadata Reviewer bot flagged: import was leftover from earlier scaffolding — all test fixtures use sys.modules monkey-patching with SimpleNamespace instead. Drop to unblock merge. Tests still 5/5 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:58:21 -07:00
Hongming Wang	0d3058585b	feat(runtime): adapter-declared idle_timeout_override end-to-end Capability primitive #2 (task #117). The first cross-cutting capability where the adapter actually displaces platform behavior — claude-code's streaming session can legitimately go silent for 8+ minutes during synthesis + slow tool calls; the platform's hardcoded 5min idle timer in a2a_proxy.go cancels it mid-flight (the bug PR #2128 patched at the env-var layer). This PR fixes it at the right layer: the adapter declares "I need 600s" and the platform's dispatch path honors it. Wire shape (Python → Go): POST /registry/heartbeat { "workspace_id": "...", ... "runtime_metadata": { "capabilities": {"heartbeat": false, "scheduler": false, ...}, "idle_timeout_seconds": 600 // optional, omitted = use default } } Default behavior preserved: any adapter that doesn't override BaseAdapter.idle_timeout_override() (returns None by default) sends no idle_timeout_seconds field; the Go side falls through to idleTimeoutDuration (env A2A_IDLE_TIMEOUT_SECONDS, default 5min). Existing langgraph / crewai / deepagents workspaces are unaffected. Components: Python: - adapter_base.py: idle_timeout_override() method on BaseAdapter returning None (the platform-default sentinel). - heartbeat.py: _runtime_metadata_payload() lazy-imports the active adapter and assembles the capability + override block. Try/except swallows ANY error so heartbeat never breaks because of capability discovery — observability outranks capability accuracy. Go: - models.HeartbeatPayload.RuntimeMetadata (pointer so absent = "old runtime, didn't say"; explicit zero-cap = "new runtime, declared no native ownership"). - handlers.runtimeOverrides: in-memory sync.Map cache keyed by workspaceID. Populated by the heartbeat handler, consulted on every dispatchA2A. Reset on platform restart (worst-case 30s of platform-default behavior — acceptable; nothing about overrides is correctness-critical). - a2a_proxy.dispatchA2A: looks up the override before applyIdle Timeout; falls through to global default when absent. Tests: Python (17, all new): - RuntimeCapabilities dataclass shape (frozen, defaults, wire keys) - BaseAdapter.capabilities() default + override + sibling isolation - idle_timeout_override default, positive override, dropped-override - Heartbeat metadata producer: default adapter emits all-False, native adapter emits flag + override, missing ADAPTER_MODULE returns {} (graceful), zero/negative override is omitted from wire, exception inside adapter swallowed Go (6, all new): - SetIdleTimeout + IdleTimeout round-trip - Zero/negative duration clears the override - Empty workspace_id ignored - Replacement (heartbeat overwrites prior value) - Reset clears entire cache - Concurrent reads + writes (sync.Map invariant) Verification: - 1308 / 1308 workspace pytest pass (was 1300, +8) - All Go handlers tests pass (6 new + existing) - go vet clean See project memory `project_runtime_native_pluggable.md` for the architecture principle this implements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:38:01 -07:00
Hongming Wang	205a454c09	feat(runtime): RuntimeCapabilities dataclass + BaseAdapter.capabilities() Foundation primitive for the native+pluggable runtime principle (task #117, blocks #87). Lets each adapter declare which cross-cutting capabilities it owns natively (heartbeat, scheduler, durable session, status mgmt, retry, activity decoration, channel dispatch) versus delegates to the platform's fallback implementation. Pure additive: every existing adapter inherits BaseAdapter.capabilities() which returns RuntimeCapabilities() — every flag False — so today's "platform owns everything" behavior is preserved exactly. Subsequent PRs land platform-side consumers (idle-timeout override, scheduler skip, status-transition hook, etc.) one capability at a time. Why a frozen dataclass instead of class attributes: capabilities are declared at class-load time and read by the platform on every heartbeat. A mutable value would let a runtime change capabilities mid-flight, creating impossible-to-debug state where the platform's idea of who- owns-heartbeat drifts from the adapter's actual code. Why a `to_dict()` with explicit short keys: the Go side will read these from the heartbeat payload by string key. The dict's wire names are pinned independently of Python field names so a Python-side rename doesn't silently break the Go consumer (test pins this). Tests (9 new): - is a frozen dataclass (mutation rejected) - all 7 default flags are False (load-bearing — flipping any default silently moves ownership for langgraph/crewai/deepagents) - to_dict() keys are stable wire names (Go contract) - BaseAdapter.capabilities() default returns all-False - subclass override mechanism works - sibling adapters' defaults aren't affected by an override Verification: - 1300/1300 workspace pytest pass (was 1291, +9) - Zero behavior change for any existing code path See project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:17:49 -07:00
Hongming Wang	6eaacf175b	fix(notify): review-flagged Critical + Required findings on PR #2130 Two Critical bugs caught in code review of the agent→user attachments PR: 1. Empty-URI attachments slipped past validation. Gin's go-playground/validator does NOT iterate slice elements without `dive` — verified zero `dive` usage anywhere in workspace-server — so the inner `binding:"required"` tags on NotifyAttachment.URI/Name were never enforced. `attachments: [{"uri":"","name":""}]` would pass validation, broadcast empty-URI chips that render blank in canvas, AND persist them in activity_logs for every page reload to re-render. Added explicit per-element validation in Notify (returns 400 with `attachment[i]: uri and name are required`) plus defence-in-depth in the canvas filter (rejects empty strings, not just non-strings). 3-case regression test pins the rejection. 2. Hardcoded application/octet-stream stripped real mime types. `_upload_chat_files` always passed octet-stream as the multipart Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")` FIRST and only falls back to extension-sniffing when the header is empty, so every agent-attached file lost its real type forever — broke the canvas's MIME-based icon/preview logic. Now sniff via `mimetypes.guess_type(path)` and only fall back to octet-stream when sniffing returns None. Plus three Required nits: - `sqlmockArgMatcher` was misleading — the closure always returned true after capture, identical to `sqlmock.AnyArg()` semantics, but named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)` so the intent (capture for post-call inspection, not validate via driver-callback) is unambiguous. - Test asserted notify call by `await_args_list[1]` index — fragile to any future _upload_chat_files refactor that adds a pre-flight POST. Now filter call list by URL suffix `/notify` and assert exactly one match. - Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases) covering empty-uri, empty-name, both-empty so the Critical fix stays defended. Deferred to follow-up: - ORDER BY tiebreaker for same-millisecond notifies — pre-existing risk, not regression. - Streaming multipart upload — bounded by the platform's 50MB total cap so RAM ceiling is fixed; switch to streaming if cap rises. - Symlink rejection — agent UID can already read whatever its filesystem perms allow via the shell tool; rejecting symlinks doesn't materially shrink the attack surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:47:31 -07:00
Hongming Wang	d028fe19ff	feat(notify): agent → user file attachments via send_message_to_user Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip" in plain text instead of attaching a download chip — the runtime literally had no API for outbound file attachments. The canvas + platform's chat-uploads infrastructure already supported the inbound (user → agent) direction (commit `94d9331c`); this PR wires the outbound side. End-to-end shape: agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"]) ↓ runtime POST /workspaces/<self>/chat/uploads (multipart) ↓ platform /workspace/.molecule/chat-uploads/<uuid>-build.zip → returns {uri: workspace:/...build.zip, name, mimeType, size} ↓ runtime POST /workspaces/<self>/notify {message: "Done!", attachments: [{uri, name, mimeType, size}]} ↓ platform Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]} ↓ canvas WS push: canvas-events.ts adds attachments to agentMessages queue Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[] Either path → ChatTab renders download chip via existing path Files changed: workspace-server/internal/handlers/activity.go - NotifyAttachment struct {URI, Name, MimeType, Size} - Notify body accepts attachments[], broadcasts in payload, persists as response_body.parts[].kind="file" canvas/src/store/canvas-events.ts - AGENT_MESSAGE handler reads payload.attachments, type-validates each entry, attaches to agentMessages queue - Skips empty events (was: skipped only when content empty) workspace/a2a_tools.py - tool_send_message_to_user(message, attachments=[paths]) - New _upload_chat_files helper: opens each path, multipart POSTs to /chat/uploads, returns the platform's metadata - Fail-fast on missing file / upload error — never sends a notify with a half-rendered attachment chip workspace/a2a_mcp_server.py - inputSchema declares attachments param so claude-code SDK surfaces it to the model - Defensive filter on the dispatch path (drops non-string entries if the model sends a malformed payload) Tests: - 4 new Python: success path, missing file, upload 5xx, no-attach backwards compat - 1 new Go: Notify-with-attachments persists parts[] in response_body so chat reload reconstructs the chip Why /tmp paths work even though they're outside the canvas's allowed roots: the runtime tool reads the bytes locally and re-uploads through /chat/uploads, which lands the file under /workspace (an allowed root). The agent can specify any readable path. Does NOT include: agent → agent file transfer. Different design problem (cross-workspace download auth: peer would need a credential to call sender's /chat/download). Tracked as a follow-up under task #114. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:35:58 -07:00
Hongming Wang	5071454074	fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events Critical follow-up to PR #2126's review. Two real bugs: 1. Runtime QUEUED never resolved. Platform's drain stitch updates the platform's delegate_result row when a queued delegation finally completes, but never pushes back to the runtime. The LLM polling check_delegation_status saw status="queued" forever — combined with the new docstring guidance ("queued → wait, peer will reply"), the model would wait indefinitely on a state that never resolves. Strictly worse than pre-PR behavior where it would have at least bypassed. 2. Live updates dead code. delegation.go writes activity rows by direct INSERT INTO activity_logs, bypassing the LogActivity helper that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial GET worked, live updates did not. Fix: (1) Runtime side, workspace/builtin_tools/delegation.py: - New `_refresh_queued_from_platform(task_id)` async helper that pulls /workspaces/<self>/delegations and finds the platform-side delegate_result row for our task_id. - check_delegation_status calls _refresh when local status is QUEUED, so the LLM's poll itself drives state convergence. - Best-effort: GET failure leaves local state untouched, next poll retries. - Docstring updated to reflect the actual behavior ("polls transparently — keep polling and you'll see the flip"). - 4 new tests cover: QUEUED → completed via refresh; QUEUED → failed via refresh; refresh keeps QUEUED when platform hasn't resolved; refresh swallows network errors safely. (2) Canvas side, AgentCommsPanel.tsx WS push handler: - Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE / DELEGATION_FAILED in addition to ACTIVITY_LOGGED. - Each event's payload synthesized into an ActivityEntry shape so toCommMessage's existing delegation branch maps it. Status derived: STATUS uses payload.status, COMPLETE → "completed", FAILED → "failed", SENT → "pending". - The ACTIVITY_LOGGED branch keeps the "delegation" type accepted as a no-op-today / future-proof path: if delegation handlers are ever refactored to call LogActivity, this lights up automatically without another canvas change. Doesn't change: the docstring guidance ("queued → wait, don't bypass") is now actually load-bearing because the refresh path will deliver the eventual outcome. Without the refresh, the guidance was a trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 16:05:04 -07:00
Hongming Wang	057876cb0c	fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows Two bugs that compounded into the "Director does the work itself" UX: 1. workspace/builtin_tools/delegation.py: _execute_delegation only handled HTTP 200 in the response branch. When the peer's a2a-proxy returned HTTP 202 + {queued: true} (single-SDK-session bottleneck on the peer), the loop fell through. Two iterations later the `if "error" in result` check tried to access an unbound `result`, the goroutine ended quietly, and the delegation stayed at FAILED with error="None". The LLM checking status saw "failed" + the platform's "Delegation queued — target at capacity" log line in chat context, concluded the peer was permanently unavailable, and bypassed delegation to do the work itself. Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED, marks the local delegation as QUEUED, mirrors to the platform, and returns cleanly without retrying. The retry loop is for transient transport errors — queueing is a real ack, not a failure to retry against (retrying would just re-queue the same task). check_delegation_status docstring extended with explicit per-status guidance: pending/in_progress → wait, queued → wait (peer busy on prior task, reply WILL arrive), completed → use result, failed → real error in error field; only fall back on failed, never queued. 2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped every delegation row because it whitelisted only a2a_send / a2a_receive. activity_type='delegation' rows (written by the platform's /delegate handler with method='delegate' or 'delegate_result') never reached toCommMessage. User saw "No agent-to-agent communications yet" while 6+ delegations existed in the DB. Fix: include "delegation" in the both the initial filter and the WS push filter, plus a delegation branch in toCommMessage that maps the row as outbound (always — platform proxies on our behalf) and uses summary as the primary text source. Tests: - 3 new Python tests cover the 202+queued path: status becomes QUEUED not FAILED; no retry on queued (counted by URL match against the A2A target since the mock is shared across all AsyncClient calls); bare 202 without {queued:true} still falls through to the existing retry-then-FAILED path. - 3 new TS tests cover the delegation mapper: 'delegate' row maps as outbound to target with summary text; queued 'delegate_result' preserves status='queued' (load-bearing for the LLM's wait-vs-bypass decision); missing target_id returns null instead of rendering a ghost. Does NOT solve: the underlying single-SDK-session bottleneck that causes peers to queue in the first place. Tracked as task #102 (parallel SDK sessions per workspace) — real architectural work. This PR makes the runtime handle the queueing correctly so the LLM doesn't bail out, and makes the delegations visible in Agent Comms so operators can see what's happening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:01:50 -07:00
Hongming Wang	09bfd9bdce	fix(tests): hoist _executor_mod alias so async wedge tests pass under --cov The Copilot Auto-fix in `5a8f42b4` addressed the duplicate-import lint by removing 'import claude_sdk_executor as _executor_mod' entirely, but the async wedge tests (test_execute_marks_wedge_, test_execute_clears_wedge_) still call _executor_mod._reset_sdk_wedge_for_test() etc. — so they failed with NameError once that line was removed. Restore the alias, but at the top of the file (alongside the other module- level imports) rather than at line 1248. The late-file binding was the proximate cause of the original CI failure: with --cov enabled (#1817), sys.settrace + the @pytest.mark.asyncio wrapper combination caused the late module-level binding to not be visible from inside the async test bodies, even though the binding existed at module-load time. Hoisting fixes that scope-resolution issue. Verified locally with the exact CI config (--cov-fail-under=86): 1280 passed, 2 xfailed — total coverage 90.25% 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:57:21 -07:00
Hongming Wang	5a8f42b405	Potential fix for pull request finding 'Module is imported with 'import' and 'import from'' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-04-26 10:45:37 -07:00
Hongming Wang	d0f198b24f	merge: resolve staging conflicts (a2a_proxy + workspace_crud) Three files conflicted with staging changes that landed while this PR sat open. Resolved each by combining both intents (not picking one side): - a2a_proxy.go: keep the branch's idle-timeout signature (workspaceID parameter + comment) AND apply staging's #1483 SSRF defense-in-depth check at the top of dispatchA2A. Type-assert h.broadcaster (now an EventEmitter interface per staging) back to Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through to no-op when the assertion fails (test-mock case). - a2a_proxy_test.go: keep both new test suites — branch's TestApplyIdleTimeout_ (3 cases for the idle-timeout helper) AND staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated the staging test's dispatchA2A call to pass the workspaceID arg introduced by the branch's signature change. - workspace_crud.go: combine both Delete-cleanup intents: * Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas hang-up doesn't cancel mid-Docker-call (the container-leak fix) * Branch's stopAndRemove helper that skips RemoveVolume when Stop fails (orphan sweeper handles) * Staging's #1843 stopErrs aggregation so Stop failures bubble up as 500 to the client (the EC2 orphan-instance prevention) Both concerns satisfied: cleanup runs to completion past canvas hangup AND failed Stop calls surface to caller. Build clean, all platform tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 10:43:22 -07:00
Hongming Wang	fc2720c1fe	fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552 ) The token-cache helper had three #1552 findings, all in the mode-600-after-the-fact pattern: 1. _write_cache writes .tmp with default umask (typically 022 → 644 on disk) and then chmod 600's after the mv. A concurrent reader in that microsecond-wide window sees the token at mode 644. 2. Each chmod was swallowed via `\|\| true` — if it ever fails, the tokens stay world-readable with no operator signal. 3. _refresh_gh's gh_token_file write has the same shape and same two issues. Hardening: - Wrap the .tmp creates in a `umask 077` block so the files are 600 from creation. Restore the previous umask before return so callers aren't perturbed. - Replace `chmod ... 2>/dev/null \|\| true` with `if ! chmod ...; then echo WARN ...; fi`. A chmod failure is a real signal worth grep'ing. - Apply the same pattern to the _refresh_gh gh_token_file path. `local` is illegal in a top-level case branch, so use a uniquely- named global (_gh_prev_umask) and unset it after. Verified `bash -n` clean and `shellcheck` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 08:22:29 -07:00

1 2 3 4

194 Commits