Commit Graph

252 Commits

Author SHA1 Message Date
Hongming Wang
103ac09aeb docs(a2a-mcp): list new envelope attrs in initialize instructions
The agent learns about <channel> tag attributes ONLY from the
instructions string returned by initialize. Without this update the
wheel ships peer_name / peer_role / agent_card_url on the wire but
no agent ever uses them — they get printed inline in the push tag,
the agent doesn't know they're there, and the UX gain from the
enrichment is lost.

Update _build_channel_instructions to:
- List the new attrs in the <channel> tag template under PUSH PATH
- Add per-attribute semantics (when present, what to do with them,
  what \"absent\" means — graceful-degrade vs bug)
- Point at the discover endpoint for agent_card_url so the agent
  treats it as a follow-on URL not the body of the message

Tests: structural pin asserting all three attr names appear in the
instructions AND the per-field semantics phrases (\"registry
resolved\", \"discover endpoint\") so a future copy-edit that
shortens the prose can't silently drop the agent guidance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:49:40 -07:00
Hongming Wang
59f0a449bd feat(smoke): consult runtime_wedge after execute() to catch SDK init wedges
Timeout-as-PASS in run_executor_smoke missed the PR-25-class
regression: claude-agent-sdk takes 60s to time out on a malformed
argv, our outer wait_for fires at 5s default and reports "imports
healthy, hit a network boundary." A broken image then ships to GHCR.

Universal fix uses the existing runtime_wedge module (already
documented as the cross-cutting wedge holder, already read by
heartbeat). Adapters opt-in by calling runtime_wedge.mark_wedged()
from their executor's wedge catch arm; the smoke now consults
runtime_wedge.is_wedged() at the end of every result path and
upgrades a provisional PASS to FAIL when the flag is set. Non-opt-in
adapters keep working as before — the check is additive.

CI uses MOLECULE_SMOKE_TIMEOUT_SECS=90 to outlast the SDK's 60s
initialize() handshake so the wedge marks before our outer wait_for
fires. Module + helper docstrings call out the calibration so a
future contributor doesn't lower it without thinking through what
that wins back vs. what it loses.

Tests: 7 new cases pinning the wedge-aware paths — mark+raise (PR-25
shape), mark+block (still-running execute that wait_for cuts short),
clean+clean (additive contract), import-resilience (fail-open when
runtime_wedge unimportable). Regression-injection-checked: silencing
the new check fails both wedge-shape tests at unit-test time.
2026-05-01 17:46:43 -07:00
Hongming Wang
0fec3d6fe4 fix(test): anchor envelope-enrichment TTL test to monotonic baseline
Setting fetched_at = 0.0 assumed wall-clock semantics, but
time.monotonic() returns process uptime — when this test ran
early in the pytest run, current was <300s and the entry was
treated as fresh, silently skipping the re-fetch the assertion
expects. Anchor to time.monotonic() - TTL - 60 so the entry is
unambiguously past the freshness window regardless of when
in the run the test fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:45:05 -07:00
Hongming Wang
050aa33fc1 feat(a2a-mcp): enrich channel envelope with peer name/role/agent_card_url
The bare envelope only carried `peer_id` for peer_agent inbound, so a
receiving agent had to round-trip to /registry to find out who's
talking. Surface the sender's display name, role, and an agent-card
URL alongside the routing fields so the agent can render
"ops-agent (sre): ping" in one shot without an extra lookup.

a2a_client.py:
- Add _peer_metadata cache `dict[peer_id → (fetched_at, record)]`
- Add enrich_peer_metadata(peer_id) — sync, hits cache or registry
  with a tight 2s timeout, returns None on validation/network/non-200
  so callers can degrade gracefully
- TTL = 5 min so a busy multi-peer chat doesn't hit registry on every
  push, but role/name renames propagate within a session
- Add _agent_card_url_for(peer_id) — deterministic from peer_id alone

a2a_mcp_server.py:
- _build_channel_notification calls enrich_peer_metadata when peer_id
  is non-empty; meta carries peer_name + peer_role + agent_card_url
  alongside the existing routing fields
- agent_card_url surfaces unconditionally (constructable from peer_id);
  peer_name/role only when registry lookup succeeds — never blocks the
  push on a registry stall

Tests: 6 new branches (canvas_user no enrichment / cache hit no GET /
cache miss fetches once / registry-fail graceful degrade / TTL expiry
re-fetches / invalid peer_id skips lookup). Mutation-verified: 6/6
fail without prod code, 39/39 pass with.

Tracks the broader RFC at #2469 (workspace-server activity_type rename
to break the echo loop). Independent of PR #2470 — this is the
metadata-enrichment half of the same UX improvement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:40:09 -07:00
Hongming Wang
2d8c45989a fix(inbox): skip self-notify rows in poller to break echo loop
The workspace-server's `/notify` handler writes the agent's own
send_message_to_user POSTs to activity_logs as activity_type=
'a2a_receive', method='notify', source_id=NULL so the canvas
chat-history loader can restore those bubbles after a page reload.
The activity API exposes the row to /workspaces/:id/activity?
type=a2a_receive, so the inbox poller picks it up and pushes the
agent's own outbound back as an inbound `← molecule: Agent
message: ...` — confirmed live 2026-05-01.

Add `_is_self_notify_row` predicate matched on (method='notify' AND
no source_id) and call it from `_poll_once` before enqueue. The
predicate combines BOTH discriminators so a future caller using
method='notify' with a real peer_id still passes through. Cursor
advances past skipped rows so we don't re-poll the same self-notify
on every iteration.

Belt-and-braces: long-term fix lives in workspace-server (rename
the misclassified activity_type to 'agent_outbound' — RFC at
#2469). This guard stays regardless because it only excludes rows
we never want.

Tests: 7 new — predicate true/false matrix + integrated _poll_once
behavior (skip, cursor advance, notification suppression).
Mutation-verified: reverting inbox.py to the prior shape fails 7/7;
applied state passes 48/48.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:35:49 -07:00
Hongming Wang
f96bb9f860 docs(mcp): tagged server:NAME form in dev-channels reference
Claude Code 2.1.x's --dangerously-load-development-channels takes an
allowlist of tagged entries (`server:<name>` or
`plugin:<name>@<marketplace>`), not a bare switch. The instructions
field's push-only-mode message and the inline comment in
`_poll_timeout_secs` both referenced the old bare form. Update both
so an agent or operator reading them lands on the right invocation —
matched against the docs change in [molecule-docs PR #110](https://github.com/Molecule-AI/docs/pull/110).

No behavior change (string-only edits in instructions text + comment).
33/33 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:11:03 -07:00
Hongming Wang
f80e054a95
Merge pull request #2466 from Molecule-AI/feat/universal-push-via-instructions
feat(mcp): universal inbound delivery — instructions-driven polling + optional push
2026-05-01 23:13:05 +00:00
Hongming Wang
c61a6ff9bd chore(mcp): drop unused module-level _CHANNEL_INSTRUCTIONS
The frozen copy was a self-justification — the comment claimed "tests +
tooling rely on import-time identity" but no test or tooling code path
actually references the binding. _build_initialize_result() calls
_build_channel_instructions() fresh per call so env changes take effect,
which is the documented runtime contract.

github-code-quality flagged it; resolving the unused-variable thread so
the staging branch protection's all-conversations-resolved gate clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:11:09 -07:00
Hongming Wang
dbd086c7ad test(mcp): comment empty except in bridge test cleanup
Address github-code-quality review on PR #2465: explain why the
OSError swallow in pipe teardown is intentional (best-effort
cleanup of a possibly-already-closed fd).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:07:33 -07:00
Hongming Wang
ea206043d8 feat(mcp): universal inbound delivery — instructions-driven polling + optional push
Why this exists
---------------
Live evidence on 2026-05-01 caught a regression latent in #46's
"push-feel inbound" closure: standard `claude` launches without
`--dangerously-load-development-channels` silently drop our
`notifications/claude/channel` emissions, so canvas/peer messages sat
in the wheel inbox and never reached the agent loop until manual
`inbox_peek`. The flag is research-preview-only; non-Claude-Code MCP
clients (Cursor, Cline, OpenCode, hermes-agent, codex) never receive
the notification at all because the method namespace is Claude-
specific. Push-only delivery shipped as the universal contract is
not actually universal.

What this changes
-----------------
Adds a poll path that works on every spec-compliant MCP client. The
`initialize` `instructions` field — read by every client and surfaced
to the agent's system prompt automatically — now tells the agent to
call `wait_for_message(timeout_secs=N)` at the start of every turn.
Push remains as the strictly-better delivery for hosts that opt in
(Claude Code with the dev flag or a future allowlist entry), but is
no longer load-bearing.

Both paths converge on the same `inbox_pop` ack so duplicate-delivery
on a push+poll race is impossible: whoever surfaces the message to
the agent first pops it, the other side returns empty.

Operator knob
-------------
`MOLECULE_MCP_POLL_TIMEOUT_SECS` controls per-turn poll blocking
(default 2s). 0 disables polling for push-only Claude Code with the
dev flag. Above 60 clamps to 60 — protects against an accidental
five-minute stall per turn. Resolved fresh on every `initialize` so
a relaunch with new env is enough; no wheel rebuild required.

Tests
-----
- structural pins on the new instructions: `wait_for_message` +
  `timeout_secs` named, both PUSH PATH / POLL PATH labels present
- env-resolution: default fallback, garbage fallback, negative
  fallback, 60s clamp
- operator override: `MOLECULE_MCP_POLL_TIMEOUT_SECS=7` reaches the
  agent's instructions string
- timeout=0 toggles to push-only-mode messaging (no
  wait_for_message call asked of the agent)
- existing pins on push path, reply tools, prompt-injection defense,
  meta attributes — all preserved

Successor to #46. Closure milestone for this PR (per
feedback_close_on_user_visible_not_merge.md): launched `claude`
against the published wheel, sent a canvas message, observed the
agent surfaces the message inline at the start of its next turn
without me running `inbox_peek` — verified live before declaring done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:32:57 -07:00
Hongming Wang
a3a496bced test(mcp): pin inbox→stdout bridge end-to-end with three failure-mode tests
Closes the dynamic-coverage gap on the `notifications/claude/channel`
push-UX bridge — until now we had static pins on the wire shape
(_build_channel_notification) and the initialize handshake, but the
threading + asyncio + stdout chain that ships notifications to the
host was never exercised under realistic conditions.

The three failure modes anticipated in #2444 §2 are each now pinned:

  test_inbox_bridge_emits_channel_notification_to_writer
    Drives a fake inbox event from a daemon thread, asserts the
    notification lands on a real os.pipe-backed asyncio writer with
    the correct JSON-RPC envelope. Catches: bridge wired up
    incorrectly (no-op _on_inbox_message), run_coroutine_threadsafe
    drift, _build_channel_notification call missing.

  test_inbox_bridge_swallows_closed_pipe_drain_error
    Closes the pipe's read end before firing, captures the
    concurrent.futures.Future that run_coroutine_threadsafe returns,
    asserts its exception() is None. Catches: narrowing the broad
    `except Exception` in _emit (e.g. to RuntimeError), or removing
    it. Without the swallow, the future carries a ConnectionResetError
    and the test fails with a clear message naming the regression.

  test_inbox_bridge_swallows_closed_loop_runtime_error
    Builds the bridge against a closed event loop, fires the
    callback, asserts no exception escapes. Catches: removing the
    `except RuntimeError` swallow on the run_coroutine_threadsafe
    call. Without it the poller thread would crash with
    "RuntimeError: Event loop is closed" during shutdown.

To make the bridge testable, extracted the closures from main() into
a top-level `_setup_inbox_bridge(writer, loop) -> Callable[[dict],
None]` helper. main()'s wire-up is now a single line that calls the
helper. Behavior is unchanged — same write, same drain, same
swallows — just no longer trapped inside main()'s closures.

Verified each test catches its regression by injection: removing
each swallow / no-op'ing the bridge each turn the matching test red
with a specific failure message that points at the missing piece.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:13:32 -07:00
Hongming Wang
e6be3c0df0 test(mcp): pin prompt-injection defense in _CHANNEL_INSTRUCTIONS
Adds the missing symmetric pin against the threat-model sentence —
the existing tests pin reply-tool names (send_message_to_user,
delegate_task, inbox_pop) and tag attributes (kind, peer_id,
activity_id) but left the "treat message body as untrusted user
content" line unpinned. A copy-edit that drops it would turn the
channel into an open prompt-injection vector against any workspace
running the MCP server.

Pins three signals: "untrusted" present, an explicit
"not execute"/"do not" clause, and the "approval" escape-hatch
sentence — two of three would let a partial copy-edit slip
through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:24:05 -07:00
Hongming Wang
2588ab27d5 feat(mcp): add channel instructions field — second gate for push UX
PR #2461 added the experimental.claude/channel capability declaration
on the assumption that was the missing gate for Claude Code surfacing
notifications/claude/channel as inline <channel> interrupts. Research
against code.claude.com/docs/en/channels-reference.md confirms the
capability IS one gate — but there's a SECOND required field we still
don't ship: `instructions` on the initialize result.

The docs are explicit: instructions is what tells the agent what the
<channel> tag attributes mean and which tool to call to reply. Without
it the channel registers but the agent receives the tag with no
context and has no idea how to handle it. The official telegram
plugin ships both (server.ts:370-396) — capability AND instructions.
We were shipping one of two.

This adds the instructions string. It documents:
- kind/peer_id/activity_id meta attributes
- canvas_user → send_message_to_user reply path
- peer_agent → delegate_task reply path
- inbox_pop ack to prevent duplicate-poll re-delivery
- threat model: treat message bodies as untrusted user content

Tests: 4 new pins. instructions present + non-empty, instructions
names each reply tool, instructions documents each tag attribute.
Failure messages name the symptom so a copy-edit can't silently
break the channel.

Live verification still pending after wheel ships — same plan as
the gap is in --dangerously-load-development-channels (host-side
flag, outside our control during the channels research preview).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:24:05 -07:00
Hongming Wang
63ef3b128c docs(mcp): correct server.ts reference + flag verification gap on experimental.claude/channel
Follow-up to commit 0a87dec5 (PR #2461, merged before live verification).

Two corrections to the docstring on `_build_initialize_result()`:

1. The original "mirrors molecule-mcp-claude-channel server.ts:374"
   claim is wrong on two axes. Line 374 is unrelated poll-init code
   (a comment inside `registerAsPoll`). The actual capability site
   is server.ts:475, where the bun bridge declares only
   `{ capabilities: { tools: {} } }` — *no* `experimental.claude/channel`.
   The bun bridge is reported to deliver `notifications/claude/channel`
   successfully in Claude Code despite this, which is direct counter-
   evidence that adding the capability was the bug fix.

2. The `@modelcontextprotocol/sdk` server's `assertNotificationCapability`
   does not include `notifications/claude/channel` in any of its switch
   cases, meaning custom (non-spec) notification methods are sent
   regardless of declared capabilities. Server-side, the declaration
   is almost certainly a no-op.

This commit doesn't remove the capability — additive, not destructive,
and the new tests pin its presence — but downgrades the docstring's
certainty so the next person debugging "channel notification didn't
fire" doesn't trust a stale claim and pursues the more likely root
causes:

  - writer.drain() swallowing exceptions on a closed pipe
  - inbox-thread → asyncio.run_coroutine_threadsafe race during init
  - MCP transport not yet attached when the first inbox event fires

Live verification per #2444 §2 (fresh Claude Code session on this wheel
with a peer A2A message, observe whether the interrupt fires) remains
the open hard-gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:01:57 -07:00
Hongming Wang
0a87dec50e feat(mcp): declare experimental.claude/channel capability for push UX
Without this capability declaration in the initialize handshake,
Claude Code's MCP client receives our notifications/claude/channel
emissions but silently drops them — they never become inline
<channel> tags in the conversation. The push-UX bridge added in
PR #2433 ships, fires, and is invisible.

This was anticipated as a failure mode in #2444 §2 ("Notification
arrives but Claude Code doesn't surface it — host doesn't recognize
the method"), and confirmed live in this session: a canvas chat
"hi" landed in the inbox queue (inbox_peek returned it) but never
woke the agent until inbox_peek was called by hand.

The contract matches molecule-mcp-claude-channel/server.ts:374
where the bun bridge declares the same experimental flag.

Refactor: extracted _build_initialize_result() so the handshake
shape is unit-testable. Pure function, no behavioral change beyond
adding the experimental capability to the result.

Tests: 3 new pins on the initialize result (capability presence,
tools-still-there, protocolVersion stable). Closes the live-
verification gap §2 of #2444.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:45:06 -07:00
Hongming Wang
b8fdbd9fab fix(runtime): register configs_dir in TOP_LEVEL_MODULES + drop alias
Wheel-build smoke gate detected `configs_dir` missing from
scripts/build_runtime_package.py:TOP_LEVEL_MODULES. Without it the
build would ship `import configs_dir` un-rewritten and every
external-runtime install would die on `ModuleNotFoundError` at first
import.

Two callers used `import configs_dir as _configs_dir` to belt-and-
suspenders against an imagined name collision, but the rewriter
rejects `import X as Y` because the rewrite would produce
`import molecule_runtime.X as X as Y` (invalid syntax). No actual
collision exists (only docstring/comment references). Switched to
plain `import configs_dir`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:13:57 -07:00
Hongming Wang
c636022d2f fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458)
The runtime persists per-workspace state (`.auth_token`,
`.platform_inbound_secret`, `.mcp_inbox_cursor`) under `/configs` —
the workspace-EC2 mount path. Inside a container that's writable,
agent-owned. Outside a container, `/configs` either doesn't exist or
isn't writable by an unprivileged user.

The default broke the external-runtime path (`pip install
molecule-ai-workspace-runtime` + `molecule-mcp` on a Mac/Linux
laptop). First heartbeat tries to persist `.platform_inbound_secret`
and crashes:

    [Errno 30] Read-only file system: '/configs'

The heartbeat thread logs and dies. Workspace flips offline within
a minute. Operator sees no actionable error.

Adds workspace/configs_dir.py — single resolution point with a tiered
fallback:

  1. CONFIGS_DIR env var, if set — explicit operator override
     (preserves existing tests + custom deployments verbatim).
  2. /configs — if it exists AND is writable. In-container default;
     unchanged behavior for every prod workspace.
  3. ~/.molecule-workspace — created with mode 0700 so per-file 0600
     perms aren't undermined by a world-readable parent.

Migrates the four readers (platform_auth, platform_inbound_auth,
mcp_cli, inbox) to call configs_dir.resolve() instead of
inlining `Path(os.environ.get("CONFIGS_DIR", "/configs"))`.

Existing tests that assert the old `/configs`-as-default contract
updated to assert the new contract: when CONFIGS_DIR is unset, path
resolves to a writable location — `/configs` if present, fallback
otherwise. Tests skip the fallback branch on hosts that DO have a
writable `/configs` (CI containers).

Verified the original repro is fixed: with no CONFIGS_DIR set on
macOS, configs_dir.resolve() returns ~/.molecule-workspace, the dir
exists, and writes succeed.

Test suite: 1454 passed, 3 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:07:55 -07:00
Hongming Wang
2e8892ebc4 fix(workspace): surface errno + path on chat-upload mkdir failure
Production incident on hongming.moleculesai.app 2026-05-01T18:30Z —
fresh-tenant signup chat upload returned 500 with the body
{"error":"failed to prepare uploads dir"}. Diagnosis required SSM
access to the workspace stderr to recover errno + actual path.

The root-cause fix lives in claude-code template entrypoint
(molecule-ai-workspace-template-claude-code#23 — pre-create the
.molecule subtree as root before gosu drops to agent). This change
is the diagnostic improvement: when mkdir fails for any reason in
the future (EACCES, ENOSPC, EROFS, etc.), the response carries
the errno + offending path so the operator inspecting browser
devtools sees the real cause without needing SSM.

Backwards compatible — top-level "error" key is unchanged so
existing canvas / external alert rules continue to match. New
fields are additive: path, errno, detail.

Test pins the diagnostic shape so a future struct refactor can't
silently drop these fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:47:53 -07:00
Hongming Wang
c06c4c0f56
Merge pull request #2450 from Molecule-AI/feat/observability-config-schema
feat(config): observability block schema (#119 PR-1 of 4)
2026-05-01 05:20:11 +00:00
Hongming Wang
645c1862c4 feat(a2a-client): surface 410 Gone as 'removed' error so callers can re-onboard (#2429)
Follow-up A to PR #2449 — that PR taught the platform to return 410
Gone for status='removed' workspaces; this PR teaches get_workspace_info
to consume that signal.

Before: every non-200 collapsed into {"error": "not found"}, which
made the 2026-04-30 incident impossible to diagnose — the operator
KNEW the workspace_id existed (they'd just registered it), but the
runtime kept reporting "not found" for a deleted-but-not-purged row.

After: 410 produces a distinct {"error": "removed", "id", "removed_at",
"hint"} dict so callers (heartbeat-loop, channel bridge, dashboard
tools) can surface "your workspace was deleted, re-onboard" instead
of "not found". Falls back to a default hint if the platform body
isn't parseable so the actionable signal doesn't depend on body
shape parity.

Two new tests:
  - TestGetWorkspaceInfo.test_410_returns_removed_with_hint
  - TestGetWorkspaceInfo.test_410_with_unparseable_body_falls_back_to_default_hint

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:08:08 -07:00
Hongming Wang
59902bce83 feat(config): add observability block schema (#119 PR-1 of 4)
Hermes-style declarative block grouping cadence + verbosity knobs into
one place. Schema-only in this PR — wiring into heartbeat.py and main.py
lands in PR-3 of the #119 stack.

Two fields with live consumers waiting:
- heartbeat_interval_seconds (default 30, clamped to [5, 300])
  → heartbeat.py:134 currently has hard-coded HEARTBEAT_INTERVAL = 30
- log_level (default "INFO", uppercased at parse)
  → main.py:465 currently has hard-coded log_level="info"

Clamp band [5, 300] is intentional: sub-5s flooded the platform during
IR-2026-03-11; >5min lets crashed workspaces look healthy long enough
to mask failure. Coerce at parse so adapters and heartbeat.py can read
the value without re-validating.

Tests pin defaults, explicit YAML override, partial override, and
parametrized clamp behavior (10 cases including garbage strings + None).

Part of: task #119 (adopt hermes-style architecture)
Stack:  PR-1 schema → PR-2 event_log → PR-3 wire consumers → PR-4 skill compat
2026-04-30 21:58:45 -07:00
Hongming Wang
661eec2659 chore(smoke-mode): harden module-load + drop dead except clause
Two follow-ups from the #2275 Phase 1 self-review:

1. `_SMOKE_TIMEOUT_SECS = float(os.environ.get(...))` was evaluated at
   module load. main.py imports smoke_mode unconditionally — before
   the is_smoke_mode() check — so a malformed
   MOLECULE_SMOKE_TIMEOUT_SECS env value would SystemExit every
   workspace boot, not just smoke runs. Wrapped in try/except with a
   5.0 fallback. Probability of a typo'd env var hitting production
   is low (it's a CI-only knob), but the footgun is removed entirely.
   Regression test reloads the module under a malformed env value.

2. `_real_a2a_sdk_available()` caught (ImportError, AttributeError).
   `from X import Y` raises ImportError when Y is missing on X — never
   AttributeError. Dropped the unreachable branch.

No behavior change for the happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:31:08 -07:00
Hongming Wang
aacaba024c feat(wheel-smoke): exercise executor.execute() to catch lazy imports (#2275)
The existing wheel-publish smoke (`wheel_smoke.py`) only IMPORTS
`molecule_runtime.main` at module scope. Lazy imports buried inside
`async def execute(...)` bodies (e.g. `from a2a.types import FilePart`)
NEVER evaluate at static-import time — they crash at first message
delivery in production.

The 2026-04-2x v0→v1 a2a-sdk migration shipped 5 such regressions in
templates that all looked fine at module-load smoke. This change adds
`smoke_mode.py` plus a `MOLECULE_SMOKE_MODE=1` short-circuit in
`main.py`: after `adapter.create_executor(...)`, the boot path invokes
`executor.execute(stub_ctx, stub_queue)` once with a 5s timeout
(`MOLECULE_SMOKE_TIMEOUT_SECS`). Healthy import tree → execution
proceeds far enough to hit a network boundary and times out (exit 0).
Broken lazy import → `ImportError` / `ModuleNotFoundError` from inside
the executor body (exit 1). Other downstream errors (auth, validation)
pass — those are caught by adapter-level tests, not this gate.

Stub `(RequestContext, EventQueue)` is built from the real a2a-sdk so
SendMessageRequest/RequestContext constructor changes also surface as
import-tree failures (the regression class also includes "SDK
refactored mid-publish"). The stub-build itself is wrapped — if it
raises, that's a smoke fail too.

Phase 2 (separate PR, molecule-ci) wires this into
publish-template-image.yml so the publish gate runs the boot smoke
against every template image before pushing the tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:21:18 -07:00
Hongming Wang
0b2ea0a50f
Merge pull request #2441 from Molecule-AI/feat/explicit-provider-field
feat(config): add explicit `provider:` field alongside `model:` (PR-1 of stack)
2026-05-01 03:54:27 +00:00
Hongming Wang
067ad83ce5 feat(config): add explicit provider: field alongside model:
Adds a top-level `provider` slug to WorkspaceConfig and RuntimeConfig so
adapters can route to a specific gateway without re-implementing
slug-prefix parsing across hermes / claude-code / codex.

Resolution chain in load_config (mirrors how `model` resolves):

  1. ``LLM_PROVIDER`` env var — what canvas Save+Restart sets so the
     operator's Provider dropdown choice survives a CP-driven restart
     (the regenerated /configs/config.yaml drops most user fields).
  2. Explicit YAML ``provider:`` — operator pinned it in the file.
  3. Derive from the model slug prefix for backward compat:
       ``anthropic:claude-opus-4-7`` → ``anthropic``
       ``minimax/abab7-chat-preview`` → ``minimax``
       bare model names → ``""`` (let the adapter decide).

`runtime_config.provider` falls back to the top-level resolved
provider, the same shape PR #2438 added for `runtime_config.model`.

Why a separate field at all (we already parse the slug):
  - Custom model aliases without a recognizable prefix need an
    explicit signal — the canvas Provider dropdown writes it.
  - Adapters were each rolling their own slug-parse (hermes's
    derive-provider.sh, claude-code's adapter-default branch, etc.);
    one resolution point in load_config kills that drift class.
  - Canvas needs a stable storage field that doesn't get clobbered
    every time the user picks a new model.

Backward-compatible: when `provider:` is absent, slug derivation
keeps every existing config.yaml working without a migration.

PR-1 of a multi-PR stack (Option B from RFC discussion). Subsequent
PRs plumb the field through workspace-server env, CP user-data,
adapters (hermes prefers explicit over derive-provider.sh), and
canvas Provider dropdown UI.

Tests cover all four resolution paths + runtime_config inheritance:
  - test_provider_default_empty_when_bare_model
  - test_provider_derived_from_colon_slug
  - test_provider_derived_from_slash_slug
  - test_provider_yaml_explicit_wins_over_derived
  - test_provider_env_override_beats_yaml_and_derived
  - test_runtime_config_provider_yaml_wins_over_top_level
  - test_provider_default_from_default_model

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:47:09 -07:00
Hongming Wang
6e92fe0a08 chore: rewriter unit tests + drop misleading noqa on import inbox
Two small follow-ups to the PR #2433#2436#2439 incident chain.

1) `import inbox  # noqa: F401` in workspace/a2a_mcp_server.py was
   misleading — `inbox` IS used (at the bridge wiring inside main()).
   F401 means "imported but unused", which would mask a real future
   F401 if the usage is removed. Drop the noqa, keep the explanatory
   block comment about the rewriter's `import X` → `import mr.X as X`
   expansion (and the `import X as Y` → `import mr.X as X as Y` trap
   the comment exists to prevent re-introducing).

2) scripts/test_build_runtime_package.py — 17 unit tests covering
   `rewrite_imports()` and `build_import_rewriter()` in
   scripts/build_runtime_package.py. Until now the function had zero
   coverage despite the entire wheel build depending on it. Tests
   pin: bare-import aliasing, dotted-import preservation, indented
   imports, from-imports (simple + dotted + multi-symbol + block),
   the `import X as Y` rejection added in PR #2436 (with comment-
   stripping + indented + comma-not-alias edge cases), allowlist
   anchoring (`a2a` ≠ `a2a_tools`), and end-to-end reproduction
   of the PR #2433 failing pattern + the #2436 fix pattern.

3) Wire scripts/test_*.py into CI by adding a second discover pass
   to test-ops-scripts.yml. Top-level scripts/ tests live alongside
   their target file (parallels the scripts/ops/ test layout); the
   existing scripts/ops/ pass keeps running because scripts/ops/
   has no __init__.py so a single discover from scripts/ root
   doesn't recurse. Two passes is simpler than retrofitting
   namespace packages. Path filter widened from `scripts/ops/**`
   to `scripts/**` so PRs touching the build script trigger the
   new tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:45:32 -07:00
Hongming Wang
5ad41f63ce
Merge pull request #2438 from Molecule-AI/fix/runtime-config-model-fallback-clean
fix(config): runtime_config.model falls back to top-level model
2026-05-01 03:31:20 +00:00
Hongming Wang
0070d0bd59 fix(config): runtime_config.model falls back to top-level model
External feedback (2026-04-30): "Provisioner doesn't read model from
config.yaml and doesn't set MODEL env var. Without MODEL, the adapter
defaults to sonnet and bypasses the mimo routing." Confirmed accurate
for SaaS workspaces.

Trace: claude-code-default/adapter.py reads `runtime_config.model or
"sonnet"` (and hermes reads HERMES_DEFAULT_MODEL via install.sh, which
IS plumbed). For claude-code there's nothing — workspace/config.py
loaded `runtime_config.model` only from YAML, ignoring MODEL_PROVIDER
env. The CP user-data script regenerates /configs/config.yaml at every
boot with only `name`, `runtime`, `a2a` keys (intentionally minimal so
it doesn't carry stale state) — so any user-set runtime_config.model
is wiped on every restart, and the adapter falls back to "sonnet" even
when the user picked Opus in the canvas Config tab.

Fix: when YAML omits runtime_config.model, fall back to the top-level
resolved `model`, which already honors MODEL_PROVIDER env override.
One-line in workspace/config.py. Now MODEL_PROVIDER → top-level model
→ runtime_config.model → adapter sees the user's selection. Sticky
across CP-driven restarts; the canvas Save+Restart loop works as
intended for every runtime, not just hermes.

Tests:
  test_runtime_config_model_falls_back_to_top_level — top-level set, runtime_config empty → fallback wins
  test_runtime_config_model_yaml_wins_over_top_level — YAML explicit → fallback skipped (precedence)
  test_runtime_config_model_picks_up_env_via_top_level — full canvas Save+Restart simulation: env → top-level → runtime_config.model

Negative-control verified: removing the `or model` flips both fallback
tests red with the expected "" vs expected-model mismatch; restoring
flips them green. The yaml-wins test passes either way (correctly,
because precedence is preserved).

Replaces closed PR #2435 — that PR's commit was on a contaminated
branch and accidentally captured unrelated WIP changes (build script
+ a2a_mcp_server refactor) instead of this fix. Self-review caught it
and closed the PR. This branch is clean off main + diff verified
before push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:28:50 -07:00
Hongming Wang
0acdf3bb56 fix(wheel): import inbox without alias to dodge rewriter collision
PR #2433 (notifications/claude/channel) shipped 'import inbox as
_inbox_module' inside a2a_mcp_server.py:main(). The build script's
import rewriter expands plain 'import inbox' to
'import molecule_runtime.inbox as inbox', so the original source
became 'import molecule_runtime.inbox as inbox as _inbox_module',
which is invalid Python.

Caught at the publish-runtime + PR-built-wheel-smoke gate (the
SyntaxError trace is in run 25200422679). The wheel didn't ship to
PyPI because publish-runtime's smoke-import step refused to install
it, but staging is currently sitting on a broken-build commit until
this fix-forward lands.

Changes:
- a2a_mcp_server.py: lift `import inbox` to top of file (rewriter
  produces clean `import molecule_runtime.inbox as inbox`), call
  inbox.set_notification_callback directly in main()
- build_runtime_package.py: rewrite_imports() now raises ValueError
  when it sees 'import X as Y' for any X in the workspace allowlist,
  instead of silently producing a syntax-error wheel. Operator gets
  a clear actionable error at build time pointing at the offending
  line + suggested rewrites ('from X import …' or plain 'import X').

The build-time gate (this PR's rewriter check) catches the regression
class earlier than the smoke-time gate (PR #2433's failure). Adding
'PR-built wheel + import smoke' to staging branch protection's
required checks is filed separately so this class doesn't merge again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:21:54 -07:00
Hongming Wang
0a3ec53f34 feat(mcp): notifications/claude/channel for push-feel inbox UX
Adds a notification seam to the universal molecule-mcp wheel so push-
notification-capable MCP hosts (Claude Code today; any compliant
client tomorrow) get inbound A2A messages as conversation interrupts
instead of having to poll wait_for_message / inbox_peek.

Wire-up:
- inbox.py: module-level _NOTIFICATION_CALLBACK + set_notification_callback()
  Fires from InboxState.record() AFTER lock release, with same dict
  shape inbox_peek returns. Best-effort — a raising callback never
  prevents the message from landing in the queue.
- a2a_mcp_server.py: _build_channel_notification() pure helper +
  bridge wiring in main() that schedules notifications via
  asyncio.run_coroutine_threadsafe (poller is a daemon thread, MCP
  loop is asyncio).
- Method name 'notifications/claude/channel' matches the contract
  documented in molecule-mcp-claude-channel/server.ts:509.
- wheel_smoke.py: pin set_notification_callback as a published name,
  same regression class as the 0.1.16 main_sync incident.

Pollers (wait_for_message / inbox_peek) keep working unchanged for
runtimes without notification support.

Tests: 6 new in test_inbox.py (callback fires once on record, dedupe
short-circuits before fire, raising cb doesn't break inbox, set/clear
semantics), 5 new in test_a2a_mcp_server.py (method name pin, content
mapping, meta routing, no-id JSON-RPC notification spec, missing-
field tolerance). All 59 combined tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:10:01 -07:00
Hongming Wang
c4bb803329 feat(mcp_cli): agent_card from env vars (capability discovery)
External molecule-mcp runtimes register with hardcoded agent_card.name
= molecule-mcp-{id[:8]} and skills=[]. That made every external
workspace look identical on the canvas and gave peer agents calling
list_peers no signal beyond name — they had to guess capabilities.

Three new env vars let the operator declare identity + capabilities
without code changes:

  * MOLECULE_AGENT_NAME — display name on canvas (default unchanged)
  * MOLECULE_AGENT_DESCRIPTION — one-line description (default empty)
  * MOLECULE_AGENT_SKILLS — comma-separated skill names

Comma-separated skills get expanded to {"name": "..."} objects — the
minimum shape that satisfies both shared_runtime.summarize_peers
(reads s["name"]) AND canvas SkillsTab.tsx (id falls back to name).

Strict-superset behaviour: when no env vars are set, agent_card
matches the previous hardcoded value exactly. No regression for
operators who haven't migrated.

Why this matters end-to-end:
  * Canvas Skills tab now shows each declared skill as a chip
  * Peer agents calling list_peers see {name, skills} per peer and
    can route delegations to the right specialist
  * Same applies to the canvas Details tab + workspace card hover

Tests cover: defaults match prior behaviour; name override; CSV →
skill objects; whitespace stripping + empty entries dropped;
description omitted when unset (keeps wire payload minimal);
whitespace-only name falls back to default; end-to-end through
_platform_register's payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:57:39 -07:00
Hongming Wang
210f6e066a
Merge pull request #2424 from Molecule-AI/fix/in-container-heartbeat-persists-inbound-secret
fix(workspace): in-container heartbeat persists platform_inbound_secret
2026-05-01 01:36:52 +00:00
Hongming Wang
d887ce8e96 fix(mcp_cli): escalate consecutive heartbeat 401s with re-onboard guidance
The universal molecule-mcp wheel runs in a daemon thread, posting
/registry/heartbeat every 20s. When the workspace gets deleted
server-side (DELETE /workspaces/:id), the platform revokes all tokens
for that workspace. Previous behaviour: heartbeat would 401 forever,
log at WARNING per tick, no actionable signal anywhere.

Failure mode hit on hongmingwang tenant 2026-04-30: workspace
a1771dba was deleted at some prior time, the channel-bridge .env
still pointed at it, MCP tools 401-ed silently with the operator
having no idea why. The register-time path at mcp_cli.py:104-111
already does loud + actionable for 401 (sys.exit(3) with regenerate-
from-canvas-Tokens text) — extend the same pattern to the heartbeat.

Behaviour:
  * count < 3: WARNING per tick (could be transient blip)
  * count == 3: ERROR with re-onboard instructions, names the dead
    workspace_id, points at the canvas Tokens tab
  * count > 3 and every 20 ticks (~7 min): re-log ERROR so a session
    that started after the first ERROR still catches it

5xx and other non-auth HTTP errors do NOT increment the auth-failure
counter — that would mislead the operator (e.g. a server blip would
trigger "token revoked" when the token is fine).

Tests cover: single 401 stays at WARNING; 3 consecutive 401s escalate
to ERROR with the right keywords; 403 treated identically; recovery
via 200 resets the counter; 5xx never triggers the auth path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:26:35 -07:00
Hongming Wang
98845c8f42 fix(workspace): in-container heartbeat persists platform_inbound_secret
Follow-up to PR #2421. The standalone wrapper (mcp_cli.py) got
heartbeat-time secret persistence in #2421, but the in-container
heartbeat (workspace/heartbeat.py) was missed — and that's the path
every workspace EC2 actually runs. Result: hongmingwang Claude Code
agent stayed 401-forever on chat upload after this morning's deploy
because the workspace's runtime never picked up the lazy-healed
secret.

The in-container _loop now captures the heartbeat response and calls
the same _persist_inbound_secret_from_heartbeat helper used by the
standalone path, on both the first POST and the 401-retry POST.
Defensive on every error (non-JSON, non-dict, empty, save failure) —
liveness contract trumps secret persistence.

Tests pin: happy path, absent secret, empty string, non-JSON body,
non-dict body, save_inbound_secret OSError, end-to-end loop.
2026-04-30 18:18:10 -07:00
Hongming Wang
c733454a56
Merge pull request #2421 from Molecule-AI/fix/heartbeat-delivers-inbound-secret
fix(workspace): deliver platform_inbound_secret on every heartbeat
2026-05-01 00:54:00 +00:00
Hongming Wang
993f8c494e refactor(workspace-runtime): send_a2a_message takes peer_id, validates UUID
Two cleanups stacked on PR #2418:

1. Refactor `send_a2a_message(target_url, msg)` →
   `send_a2a_message(peer_id, msg)`. After #2418 every caller passes
   `${PLATFORM_URL}/workspaces/{peer_id}/a2a` — the function's
   parameter pretended to accept arbitrary URLs but in practice only
   one shape is meaningful. Owning URL construction inside the
   function makes the contract honest and centralises the peer-id
   validation introduced below.

2. Add `_validate_peer_id` UUID-shape check at the trust boundary.
   `discover_peer` and `send_a2a_message` are the entry points where
   agent-controlled strings flow into URL paths; rejecting non-UUID
   input at this layer eliminates the URL-interpolation class of
   bug (`workspace_id="../admin"` etc.) regardless of how the rest
   of the codebase interpolates ids elsewhere. Auth was already
   gating malicious access — this is consistency + clear failure
   over silent platform 4xx.

In-container tests cover positive UUIDs, malformed input
(``"ws-abc"``, ``"../admin"``, empty), and the contract that
``tool_delegate_task`` hands the peer_id to ``send_a2a_message``
without building URLs itself.

Live-verified: external delegation 8dad3e29 → 97ac32e9 returned
"refactor verified" from Claude Code Agent through the refactored
code; ``_validate_peer_id`` rejects ``"ws-abc"`` and ``"../admin"``
and accepts canonical UUIDs.

Stacked on PR #2418 (proxy-routing fix). Will rebase onto staging
once #2418 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:43:01 -07:00
Hongming Wang
a5c5139e3a fix(workspace): deliver platform_inbound_secret on every heartbeat
Heartbeat now echoes the workspace's platform_inbound_secret on every
beat (mirroring /registry/register), and the molecule-mcp client
persists it to /configs/.platform_inbound_secret on receipt.

Symptom (2026-04-30, hongmingwang tenant): chat upload returned 503
"workspace will pick it up on its next heartbeat" and then 401 on
retry — permanent until workspace restart. The 503 message was a lie:
heartbeat used to discard the platform_inbound_secret entirely; only
register delivered it, and register fires once at startup.

Server (Go):
  - Heartbeat handler reuses readOrLazyHealInboundSecret (the same
    helper chat_files + register use), so heartbeat-time recovery
    covers the rotate / mid-life NULL-column case the existing
    register-time heal can't reach.
  - Failure is non-fatal: liveness contract trumps secret delivery,
    chat_files retries lazy-heal on its own next request.

Client (Python):
  - _persist_inbound_secret_from_heartbeat parses the heartbeat 200
    response and persists via platform_inbound_auth.save_inbound_secret.
  - All exceptions swallowed — heartbeat liveness > secret persistence;
    next tick (≤20s) retries.

Tests:
  - Server: pin secret-present, lazy-heal-mint-on-NULL, and heal-
    failure-omits-field branches.
  - Client: pin persist-on-200, skip-on-empty, skip-on-non-dict-body,
    skip-on-401, swallow-save-OSError.
2026-04-30 17:36:33 -07:00
Hongming Wang
aefb44aff2 fix(workspace-runtime): route delegate_task through platform A2A proxy
tool_delegate_task was POSTing directly to peer["url"], which is
the Docker-internal hostname (e.g. http://ws-X-Y:8000) for in-
container peers. External callers — the standalone molecule-mcp
wrapper running on an operator's laptop — get [Errno 8] nodename
nor servname every single delegation, breaking the universal-MCP
path's last "ride the same code as in-container" claim.

The platform's /workspaces/:peer-id/a2a proxy endpoint already
handles internal forwarding for in-container peers AND is the only
path external runtimes can use. Unify on it: in-container callers
pay one extra HTTP hop on the same Docker bridge (microseconds);
external callers get a working delegation path for the first time.

discover_peer is still called for access-control + online-status
detection — only the routing target changes. Verified live on
2026-04-30 against workspace 8dad3e29 (external mac runtime) →
97ac32e9 (Claude Code Agent in-container): direct POST returned
ConnectError, proxy POST returned "acknowledged from claude code
agent" as requested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:13:50 -07:00
Hongming Wang
d061642cfc test(inbox): bind side-effecting pop() before assert
CodeQL flagged the bare `assert state.pop(...) is None` — under
`python -O` asserts are stripped, which would skip the call entirely
and the test would silently pass without exercising the code. Bind
the result first so the call always runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:39:45 -07:00
Hongming Wang
b47d4ceb00 feat(workspace-runtime): add inbox polling for standalone molecule-mcp path
The universal MCP server (a2a_mcp_server.py) was outbound-only — agents
in standalone runtimes (Claude Code, hermes, codex, etc.) could
delegate, list peers, and write memories, but never observed the
canvas-user or peer-agent messages addressed to them. This blocked
"constantly responding" loops without forcing operators back onto a
runtime-specific channel plugin.

This PR closes the inbound gap with a poller-fed in-memory queue and
three new MCP tools:

  - wait_for_message(timeout_secs?) — block until next message arrives
  - inbox_peek(limit?)              — list pending messages (non-destructive)
  - inbox_pop(activity_id)          — drop a handled message

A daemon thread polls /workspaces/:id/activity?type=a2a_receive every
5s, fills the queue from the cursor (since_id), and persists the cursor
to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog.
On 410 (cursor pruned) we fall back to since_secs=600 for a bounded
recovery window. Activity-row → InboxMessage extraction mirrors the
molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3
+ summary fallback).

mcp_cli.main starts the poller alongside the existing register +
heartbeat threads. In-container runtimes (which have push delivery via
canvas WebSocket) skip activation, so inbox tools return an
informational "(inbox not enabled)" message instead of double-delivery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:32:48 -07:00
Hongming Wang
b54ceb799f fix: address 5-axis review findings on PR #2413
Critical:
- ExternalConnectModal.tsx: filledUniversalMcp substitution searched
  for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now
  MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit
  876c0bfc). Operators copy-pasting the MCP tab would have gotten a
  literal "<paste from create response>" instead of the token. Fix
  the substitution to match the new placeholder name.

Important:
- mcp_cli._platform_register: 401/403 from initial register now hard-
  exits with code 3 + an actionable stderr message pointing the
  operator at the canvas Tokens tab. Pre-fix: warning log + continue,
  which made a bad-token startup silently fail (heartbeat 401's
  forever, every tool call also 401's, no clear surfacing in the
  operator's MCP client). 500/503 still log + continue (transient
  platform blips shouldn't abort the MCP loop).
- a2a_mcp_server.cli_main docstring: removed stale claim that this is
  the wheel's console-script entry-point target. The actual target is
  mcp_cli.main since 2026-04-30. Wheel-smoke pins both names so the
  functionality was correct, but the doc was lying.

Test coverage: 3 new mcp_cli tests:
  - register 401 exits code=3 + stderr mentions canvas Tokens tab
  - register 403 (C18 hijack rejection) takes same path
  - register 500/503 does NOT exit — only auth errors hard-fail

Findings deferred to follow-up (acceptable per review rubric):
  - Code dedup across mcp_cli / heartbeat.py / molecule_agent SDK
  - Pooled httpx.Client for connection reuse
  - Heartbeat exponential backoff
  - Token-resolution ordering parity (env-first vs file-first)
    between mcp_cli.main and platform_auth.get_token

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:06:59 -07:00
Hongming Wang
427300f3a4 feat: make molecule-mcp standalone (built-in register + heartbeat) + recover awaiting_agent on heartbeat
Two paired fixes that together let an external operator run a single
process (molecule-mcp) and see their workspace come up online in the
canvas — the bug surfaced live when status stuck at "awaiting_agent /
OFFLINE" despite an active MCP server.

Platform side (workspace-server/internal/handlers/registry.go):
  Heartbeat handler already auto-recovers offline → online and
  provisioning → online, but NOT awaiting_agent → online. Healthsweep
  flips stale-heartbeat external workspaces TO awaiting_agent, and
  with no recovery path the workspace stays "OFFLINE — Restart" in the
  canvas forever. Add the symmetric branch: if currentStatus ==
  "awaiting_agent" and a heartbeat arrives, flip to online + broadcast
  WORKSPACE_ONLINE. Mirrors the existing offline/provisioning patterns
  exactly. Test: TestHeartbeatHandler_AwaitingAgentToOnline asserts
  the SQL UPDATE fires with the awaiting_agent guard clause.

Wheel side (workspace/mcp_cli.py):
  molecule-mcp was outbound-only — operators had to run a separate
  SDK process to register + heartbeat. Now mcp_cli.main():
    1. Calls /registry/register at startup (idempotent upsert flips
       status awaiting_agent → online via the existing register path).
    2. Spawns a daemon thread that POSTs /registry/heartbeat every
       20s. 20s is comfortably under the healthsweep stale window so
       a single missed beat doesn't cause status churn.
    3. Runs the MCP stdio loop in the foreground.

  Both calls set Origin: ${PLATFORM_URL} so the SaaS edge WAF accepts
  them. Threaded heartbeat (not asyncio) chosen because it doesn't
  need to share an event loop with the MCP stdio server — daemon=True
  cleanly dies when the operator's runtime exits.

  MOLECULE_MCP_DISABLE_HEARTBEAT=1 escape hatch lets in-container
  callers (which have heartbeat.py running already) reuse the entry
  point without double-heartbeating. Default is enabled.

End-to-end verification (live, against
hongmingwang.moleculesai.app, workspace 8dad3e29-...):
  pre-fix:  status=awaiting_agent → canvas shows OFFLINE forever
  post-fix: ran `molecule-mcp` for 5s standalone → canvas state:
            status=online runtime=external agent=molecule-mcp-8dad3e29

Test coverage: 7 new mcp_cli tests (register-at-startup, heartbeat-
thread-spawned, disable-env-skips-both, env-and-file token resolution,
register payload shape, heartbeat endpoint + headers); 1 new platform
test (awaiting_agent → online recovery). Full workspace + handlers
suites green: 1355 Python, full Go handlers passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:42:44 -07:00
Hongming Wang
74c5e0d7a8 fix(workspace-runtime): add Origin header so SaaS edge WAF accepts MCP tool calls
Discovered while smoke-testing the molecule-mcp external-runtime path
against a live tenant (hongmingwang.moleculesai.app). Every tool call
that hit /workspaces/* or /registry/*/peers returned 404 — but
/registry/register and /registry/heartbeat returned 200. Diagnosis:
the tenant's edge WAF requires a same-origin header. Without it,
unhandled paths get silently rewritten to the canvas Next.js app,
which has no /workspaces or /registry/:id/peers route and returns an
empty 404. The molecule-mcp-claude-channel plugin already sets this
header (server.ts:271-276); the workspace runtime never did because
in-container PLATFORM_URLs (Docker network) aren't behind the WAF.

Fix: extend platform_auth.auth_headers() to include
Origin: ${PLATFORM_URL} whenever PLATFORM_URL is set. Inside-container
behavior is unchanged (the WAF is path-irrelevant for the internal
hostnames). External-runtime calls now thread the WAF correctly.

Verification (live, against a freshly-registered external workspace):
  pre-fix:  get_workspace_info → "not found", list_peers → 404
  post-fix: get_workspace_info → full workspace JSON,
            list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)"

This is the kind of bug unit tests can never catch — caught only by
running the wheel against the real tenant. Memory:
feedback_always_run_e2e.md.

Test coverage: 4 new tests in test_platform_auth.py — Origin alone
when no token + Origin + Authorization both, no-PLATFORM_URL falls
through to original empty-dict behavior, env-token path with Origin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:30:15 -07:00
Hongming Wang
169e284d57 feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.

Three additive changes:

1. **`platform_auth.get_token()` env-var fallback** — adds
   `MOLECULE_WORKSPACE_TOKEN` as a fallback when no
   `${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
   in-container behavior unchanged. External operators (no /configs
   volume) now have a way to supply the token without faking the
   filesystem layout.

2. **`molecule-mcp` console script** — adds a new entry point in the
   published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
   `pip install molecule-ai-workspace-runtime`, set 3 env vars
   (WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
   the binary in their agent's MCP config. `mcp_cli.main` is a thin
   validator wrapper — it checks env BEFORE importing the heavy
   `a2a_mcp_server` module so a misconfigured first-run gets a friendly
   3-line error instead of a 20-line module-level RuntimeError
   traceback.

3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
   `cli_main` and `mcp_cli.main` are importable. Same regression class
   as the 0.1.16 main_sync incident: a silent rename or unrewritten
   import here would break every external operator on the next wheel
   publish (memory: feedback_runtime_publish_pipeline_gates.md).

Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
  file-priority, env-fallback, whitespace handling, cache, header
  construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
  required var separately, file-or-env satisfies token requirement,
  whitespace-only env treated as missing, help mentions canvas Tokens
  tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
  with no env → friendly error; with env → MCP server starts.

Why now / why this shape: user redirect was "support the baseline
first so all runtimes can use, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:20:19 -07:00
Hongming Wang
3b34dfefbc feat(workspace): surface peer-discovery failure reason instead of "may be isolated"
Closes #2397. Today, every empty-peer condition (true empty, 401/403, 404,
5xx, network) collapses to a single message: "No peers available (this
workspace may be isolated)". The user has no way to tell whether they need
to provision more workspaces (true isolation), restart the workspace
(auth), re-register (404), page on-call (5xx), or check network (timeout) —
five different operator actions, one ambiguous string.

Wire:
  - new helper get_peers_with_diagnostic() in a2a_client.py returns
    (peers, error_summary). error_summary is None on 200; a short
    actionable string on every other branch.
  - get_peers() now shims through it so non-tool callers (system-prompt
    formatters) keep the bare-list contract.
  - tool_list_peers() switches to the diagnostic helper and surfaces the
    actual reason. The "may be isolated" string is removed; true empty
    now reads "no peers in the platform registry."

Tests:
  - TestGetPeersWithDiagnostic: 200, 200-empty, 401, 403, 404, 5xx,
    network exception, 200-but-non-list-body, and the bare-list-shim
    regression guard.
  - TestToolListPeers: each diagnostic branch surfaces its reason +
    explicit assertion that "may be isolated" is gone.

Coverage 91.53% (floor 86%). 122 a2a tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:09:26 -07:00
Hongming Wang
899a2231d6 test(platform_auth): module-functions signature snapshot drift gate
Pin the 5 public functions adapters and the runtime hot-path import
through ``from platform_auth import``:

  - ``auth_headers`` — every outbound httpx call merges this in
  - ``self_source_headers`` — A2A peer + self-message header builder
  - ``get_token`` — main.py reads on boot to decide register-vs-resume
  - ``save_token`` — main.py persists the platform-issued token
  - ``refresh_cache`` — 401-retry path drops in-process cache (#1877)

A grep across workspace/ shows 14+ runtime modules import these:
main.py, heartbeat.py, a2a_client.py, a2a_tools.py, consolidation.py,
events.py, executor_helpers.py (3 sites), molecule_ai_status.py,
builtin_tools/memory.py (3 sites), builtin_tools/temporal_workflow.py
(2 sites). Renaming any of the five (e.g. ``auth_headers`` →
``bearer_headers``) makes every one of those imports raise ImportError
at workspace boot — the failure surface is deep in heartbeat init,
nowhere near the rename site.

Same drift class as the BaseAdapter signature snapshot (#2378, #2380),
skill_loader gate (#2381), runtime_wedge gate (#2383). Reuses the
``_signature_snapshot.py`` helpers shipped in #2381.

Defense-in-depth: ``test_snapshot_has_required_functions`` asserts
the five names are still present, so removing one even with a
synchronized snapshot edit forces an explicit edit here with a
justification.

``clear_cache`` is intentionally NOT in the snapshot — it's a
test-only helper. Production code MUST NOT depend on it.

Verified red on deliberate rename: ``auth_headers`` →
``bearer_headers`` produces a clean diff of the missing function in
the failure message. Restored before commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:41:42 -07:00
Hongming Wang
70176e6c8f test(runtime_wedge): module-functions signature snapshot drift gate
BaseAdapter docstring tells adapter authors:

> ``runtime_wedge.mark_wedged()`` / ``clear_wedge()`` — flip the
> workspace to ``degraded`` + auto-recover when your SDK hits a
> non-recoverable error class. Import directly from ``runtime_wedge``;
> the heartbeat forwards the state to the platform automatically.

That's a contract — adapter templates depend on the four module-level
functions (``is_wedged``, ``wedge_reason``, ``mark_wedged``,
``clear_wedge``) being importable by those exact names with those
exact signatures. Renaming any silently breaks every adapter that
calls them: the import resolves the module fine, the
``AttributeError`` only surfaces when the adapter actually hits its
first SDK error — long after the rename merges.

Same drift class as #2378 / #2380 / #2381 (BaseAdapter, skill_loader)
applied to the module-level function surface.

Changes:

  - tests/_signature_snapshot.py gains build_module_functions_record.
    Walks a module's public top-level functions, optionally filtered
    to a specific name list (used here — runtime_wedge has internal
    helpers like reset_for_test that intentionally aren't part of
    the contract). Skips re-exports via __module__ check so a
    `from foo import bar` doesn't pollute the snapshot.
  - tests/test_runtime_wedge_signature.py snapshots the four
    contract functions. Plus a defense-in-depth required-functions
    test that catches removal even when source + snapshot are
    updated together.

Verified: deliberately renaming `mark_wedged` → `mark_wedged_RENAMED`
trips the gate with full snapshot diff in the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:01:10 -07:00
Hongming Wang
e336688278 test: extract shared signature-snapshot helpers + skill_loader gate
Two changes in one PR (tightly coupled — the second wouldn't make
sense without the first):

1. Hoist the inspect-based snapshot helpers out of
   test_adapter_base_signature.py into tests/_signature_snapshot.py
   so future surfaces don't copy-paste introspection logic.

   - build_class_signature_record(cls): walks public methods,
     unwraps static/class/abstract methods, returns a stable
     {class, methods: [...]} dict.
   - build_dataclass_record(cls): walks dataclass fields via
     dataclasses.fields(), returns {name, frozen, fields: [...]}.
   - compare_against_snapshot(actual, path): writes-on-first-run +
     diff-on-drift, with both expected and actual JSON in failure
     message.

   test_adapter_base_signature.py is rewritten to use the helpers;
   the existing snapshot file is byte-identical (no behavior change).

2. New gate: tests/test_skill_loader_signature.py covers the
   public dataclasses exported from skill_loader/loader.py:

   - SkillMetadata: every adapter pattern-matches on .runtime for
     skill-compat filtering. Renaming this field would silently
     break per-adapter skill loading — the loader still returns
     objects, but adapters' `if "*" in skill.metadata.runtime`
     raises AttributeError at workspace boot.
   - LoadedSkill: returned in SetupResult.loaded_skills.

   Includes test_snapshot_has_required_skill_metadata_fields
   defense-in-depth: ensures the runtime / id / name / description
   fields stay even if both source and snapshot are updated together.

Verified: deliberately renaming SkillMetadata.runtime trips the
gate with full snapshot diff in the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 06:27:20 -07:00
Hongming Wang
12e39c7311 test(adapter_base): extend signature snapshot to public dataclasses (#2364 item 2 followup)
Follows up #2378. The BaseAdapter snapshot covers method signatures
but `adapter_base.py` also exports three public dataclasses that
form the call/return contract between the platform and every
adapter:

  - SetupResult — returned by adapter._common_setup()
  - AdapterConfig — passed into adapter setup hooks
  - RuntimeCapabilities — returned by adapter.capabilities();
    drives platform-side dispatch routing (#117)

Renaming a RuntimeCapabilities flag silently disables every
adapter's capability declaration (the platform fallback runs)
without an AttributeError to surface the breakage. That's exactly
the drift class the snapshot pattern is meant to catch.

Changes:

  - _build_dataclass_snapshot walks SetupResult, AdapterConfig,
    RuntimeCapabilities via dataclasses.fields(), capturing field
    name + type annotation + has_default per field, plus the
    @dataclass(frozen=...) flag.
  - _build_full_snapshot composes method + dataclass records into
    one stable JSON snapshot.
  - test_snapshot_has_required_dataclass_fields — defense-in-depth
    test parallel to test_snapshot_has_required_methods. Catches
    field removal even when both source AND snapshot are updated
    together. Required field set is intentionally short (the flags
    that drive platform dispatch + the adapter-level config knobs).

Verified: deliberately renaming `provides_native_heartbeat` →
`provides_native_heartbeat_RENAMED` trips
test_base_adapter_signature_matches_snapshot with a full diff in
the failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:53:10 -07:00
Hongming Wang
8488a188c2 test(adapter_base): signature snapshot — drift gate for adapter public surface (#2364 item 2)
Every workspace template (langgraph, claude-code, hermes, etc.)
subclasses BaseAdapter. Renaming, removing, or re-typing a method
on the base class silently breaks templates: the override stops
being recognized as an override; the old method-name's caller
silently invokes the default no-op; the new method-name is
unimplemented in templates that haven't migrated.

Recent #87 universal-runtime + #1957 recordResource refactor both
renamed/added methods. Without a frozen snapshot, the next rename
ships quietly and surfaces only when a template's CI catches the
AttributeError days later — long after the merge window for an
easy revert.

This snapshot pins BaseAdapter's public method surface against a
checked-in JSON file. Same-shape pattern as PR #2363's A2A
protocol-compat replay gate, applied to a Python public-API
surface instead of JSON message shapes. Both close drift classes
by snapshotting the structural surface that consumers depend on.

Two tests:

  1. test_base_adapter_signature_matches_snapshot — full
     introspection diff against tests/snapshots/adapter_base_signature.json.
     Drift = test failure with both expected + actual JSON in
     the message so the reviewer sees what changed.
  2. test_snapshot_has_required_methods — defense-in-depth: even
     if both the source AND snapshot are updated together
     (intentional API removal), this catches removal of the
     short list of methods that EVERY template depends on (name,
     display_name, description, capabilities, memory_filename).
     Removing one of these requires explicit edit to the
     `required` set with a justification.

Verified the gate fires red on a deliberate rename
(memory_filename → memory_filename_RENAMED) — failure message
shows the full snapshot diff including parameter shapes and
return annotations.

Updating the snapshot is the explicit acknowledgment that a
template-affecting API change is intentional. Reviewer of the
introducing PR sees the snapshot diff and decides whether
template repos need coordinated updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 05:18:39 -07:00
Hongming Wang
4299475746 feat(prompt): Platform Capabilities preamble at top of system prompt
Closes #2332 item 1 (workspace awareness — agents don't surface
platform-native tools up front).

The dogfooding session surfaced that agents weren't using A2A
delegation, persistent memory, or send_message_to_user. The tools
were registered AND documented in the system prompt — but only in
sections #8 (Inter-Agent Communication) and #9 (Hierarchical Memory),
which agents read AFTER they've already started reasoning about a
plan from earlier sections.

This adds a tight inventory at section #1.5 (immediately after
Platform Instructions, before role-specific prompt files) — every
tool name + its short description in a bulleted block. Detailed
when_to_use docs in sections #8/#9 stay; this preamble is the
elevator pitch ("you have these"), the later sections are the
manual ("here's when and how").

Generated from `platform_tools.registry` ToolSpecs — every tool's
`name` + `short` flow through automatically, no manual sync. A new
`get_capabilities_preamble(mcp: bool)` helper in executor_helpers
mirrors the existing get_a2a_instructions / get_hma_instructions
pattern.

CLI-runtime agents (mcp=False) get an empty preamble — they see
_A2A_INSTRUCTIONS_CLI's hand-written subcommand vocabulary further
down, and the registry's MCP tool names would conflict.

Tests:
  - test_capabilities_preamble_appears_in_mcp_prompt: header present
  - test_capabilities_preamble_lists_every_registry_tool: every
    a2a + memory tool from registry shows up (drift catches at test
    time — adding a new tool to registry surfaces here automatically)
  - test_capabilities_preamble_precedes_prompt_files: ordering
    invariant (toolkit before role docs)
  - test_capabilities_preamble_skipped_for_cli_runtime: empty when
    mcp=False

All 40 prompt + platform_tools tests pass.
2026-04-29 21:31:13 -07:00
Hongming Wang
e955597a98 feat(chat_files): rewrite Download as HTTP-forward (RFC #2312, PR-D)
Mirrors PR-C's Upload migration: replaces the docker-cp tar-stream
extraction with a streaming HTTP GET to the workspace's own
/internal/file/read endpoint. Closes the SaaS gap for downloads —
without this PR, GET /workspaces/:id/chat/download still returns 503
on Railway-hosted SaaS even after A+B+C+F land.

Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → PR-F #2319 → this PR.

Why a single broad /internal/file/read instead of /internal/chat/download:

  Today's chat_files.go::Download already accepts paths under any of the
  four allowed roots {/configs, /workspace, /home, /plugins} — it's not
  strictly chat. Future PRs (template export, etc.) will reuse this
  endpoint via the same forward pattern; reusing avoids three near-
  identical handlers (one per domain) with duplicated path-safety logic.

Path safety is duplicated on platform + workspace sides — defence in
depth via two parallel checks, not "trust the workspace."

Changes:
  * workspace/internal_file_read.py — Starlette handler. Validates path
    (must be absolute, under allowed roots, no traversal, canonicalises
    cleanly). lstat (not stat) so a symlink at the path doesn't redirect
    the read. Streams via FileResponse (no buffering). Mirrors Go's
    contentDispositionAttachment for Content-Disposition header.
  * workspace/main.py — registers GET /internal/file/read alongside the
    POST /internal/chat/uploads/ingest from PR-B.
  * scripts/build_runtime_package.py — adds internal_file_read to
    TOP_LEVEL_MODULES so the publish-runtime cascade rewrites its
    imports correctly. Also includes the PR-B additions
    (internal_chat_uploads, platform_inbound_auth) since this branch
    was rooted before PR-B's drift-gate fix; merge-clean alphabetic
    additions.
  * workspace-server/internal/handlers/chat_files.go — Download
    rewritten as streaming HTTP GET forward. Resolves workspace URL +
    platform_inbound_secret (same shape as Upload), builds GET request
    with path query param, propagates response headers (Content-Type /
    Content-Length / Content-Disposition) + body. Drops archive/tar
    + mime imports (no longer needed). Drops Docker-exec branch entirely
    — Download is now uniform across self-hosted Docker and SaaS EC2.
  * workspace-server/internal/handlers/chat_files_test.go — replaces
    TestChatDownload_DockerUnavailable (stale post-rewrite) with 4
    new tests:
      - TestChatDownload_WorkspaceNotInDB → 404 on missing row
      - TestChatDownload_NoInboundSecret → 503 on NULL column
        (with RFC #2312 detail in body)
      - TestChatDownload_ForwardsToWorkspace_HappyPath → forward shape
        (auth header, GET method, /internal/file/read path) + headers
        propagated + body byte-for-byte
      - TestChatDownload_404FromWorkspacePropagated → 404 from
        workspace propagates (NOT remapped to 500)
    Existing TestChatDownload_InvalidPath path-safety tests preserved.
  * workspace/tests/test_internal_file_read.py — 21 tests covering
    _validate_path matrix (absolute, allowed roots, traversal, double-
    slash, exact-match-on-root), 401 on missing/wrong/no-secret-file
    bearer, 400 on missing path/outside-root/traversal, 404 on missing
    file, happy-path streaming with correct Content-Type +
    Content-Disposition, special-char escaping in Content-Disposition,
    symlink-redirect-rejection (lstat-not-stat protection).

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — green
  * pytest workspace/tests/ — 1292 passed (was 1272 before PR-D)

Refs #2312 (parent RFC), #2308 (chat upload+download 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:19:02 -07:00
Hongming Wang
055e447355 feat(saas): deliver platform_inbound_secret via /registry/register (RFC #2312, PR-F)
Closes the SaaS-side gap that PR-A acknowledged but didn't fix: SaaS
workspaces have no persistent /configs volume, so the platform_inbound_secret
that PR-A's provisioner wrote at workspace creation never reaches the
runtime. Without this, even after the entire RFC #2312 stack lands,
SaaS chat upload would 401 (workspace fails-closed when /configs/.platform_inbound_secret
is missing).

Solution: return the secret in the /registry/register response body
on every register call. The runtime extracts it and persists to
/configs/.platform_inbound_secret at mode 0600. Idempotent — Docker-
mode workspaces also receive it and overwrite the value the provisioner
already wrote (same value until rotation).

Why on every register, not just first-register:
  * SaaS containers can be restarted (deploys, drains, EBS detach/
    re-attach) — /configs is rebuilt empty on each fresh start.
  * The auth_token is "issue once" because re-issuing rotates and
    invalidates the previous one. The inbound secret has no rotation
    flow yet (#2318) so re-sending the same value is harmless.
  * Eliminates the bootstrap window where a restarted SaaS workspace
    has no inbound secret on disk and would 401 every platform call.

Changes:
  * workspace-server/internal/handlers/registry.go — Register handler
    reads workspaces.platform_inbound_secret via wsauth.ReadPlatformInboundSecret
    and includes it in the response body. Legacy workspaces (NULL
    column) get a successful registration with the field omitted.
  * workspace-server/internal/handlers/registry_test.go — two new tests:
      - TestRegister_ReturnsPlatformInboundSecret_RFC2312_PRF: secret
        present in DB → secret in response, alongside auth_token.
      - TestRegister_NoInboundSecret_OmitsField: NULL column → field
        omitted, registration still 200.
  * workspace/platform_inbound_auth.py — adds save_inbound_secret(secret).
    Atomic write via tmp + os.replace, mode 0600 from os.open(O_CREAT,
    0o600) so a concurrent reader never sees 0644-default. Resets the
    in-process cache after write so the next get_inbound_secret() returns
    the freshly-written value (rotation-safe when it lands).
  * workspace/main.py — register-response handler extracts
    platform_inbound_secret alongside auth_token and persists via
    save_inbound_secret. Mirrors the existing save_token pattern.
  * workspace/tests/test_platform_inbound_auth.py — 6 new tests for
    save_inbound_secret: writes file, mode 0600, overwrite-existing,
    cache invalidation after save, empty-input no-op, parent-dir creation
    for fresh installs.

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — all green
  * pytest workspace/tests/ — 1272 passed (was 1266 before this PR)

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:12:34 -07:00
Hongming Wang
d1de330152 feat(workspace): /internal/chat/uploads/ingest endpoint (RFC #2312, PR-B)
Stacked on PR-A (#2313). The platform-side rewrite that actually calls
this endpoint lands in PR-C; this PR adds the workspace-side consumer
+ hardening so PR-C is a small Go-only diff.

What this adds:

  * platform_inbound_auth.py — auth gate mirroring transcript_auth.py.
    Reads /configs/.platform_inbound_secret (delivered by the PR-A
    provisioner). Fail-closed when the file is missing/empty/unreadable.
    Constant-time compare via hmac.compare_digest.

  * internal_chat_uploads.py — POST /internal/chat/uploads/ingest.
    Multipart parse → sanitize each filename → write to
    /workspace/.molecule/chat-uploads/<random>-<name> with
    O_CREAT|O_EXCL|O_NOFOLLOW. Same response shape (uri/name/mimeType/
    size + workspace: URI scheme) as the legacy Go handler — canvas /
    agent code that resolves "workspace:..." paths keeps working.

  * Wired into workspace/main.py via starlette_app.add_route alongside
    the existing /transcript route.

  * python-multipart>=0.0.18 added to requirements.txt (Starlette's
    Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981).

Test coverage (36 tests, all green; full workspace suite 1266 passed):

  * test_platform_inbound_auth.py — 14 tests:
      happy path, fail-closed on missing file, empty file, whitespace-
      only file, missing/case-wrong/empty Bearer prefix, in-process
      cache, default CONFIGS_DIR fallback, end-to-end file → authorized.

  * test_internal_chat_uploads.py — 22 tests:
      sanitize_filename matrix (incl. ../traversal, CJK chars, length
      truncation), 401 on missing/wrong/no-secret-file bearer, single +
      batch upload happy paths, unique random prefix on duplicate names,
      mimetype guess fallback, 400 on missing files field, 413 on per-
      file + total-body oversize, symlink-at-target refusal (with
      sentinel-content unchanged assertion).

Why this is safe to ship before PR-C:

  * No platform-side caller yet → no behavior change visible to users.
  * Auth fails closed; nothing on the network can hit a write path
    until the platform forwards with the matching bearer.
  * Workspace's existing routes (/health, /transcript, /handle/*) are
    unchanged.

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:16:32 -07:00
Hongming Wang
59d65ba557 fix: resolve git/gh from PATH instead of hardcoded /usr/local/bin
Closes #2289.

Some workspace template images ship `/usr/local/bin/{git,gh}` wrappers
that bake `GH_TOKEN` into argv handling (preferred — auto-PR creation
authenticates without explicit token plumbing); other templates have
plain `/usr/bin/git` installed via apt with no wrapper. The hardcoded
`_GIT = "/usr/local/bin/git"` crashed every auto-push attempt on the
latter image class:

    FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/git'
      File "/app/molecule_runtime/executor_helpers.py", line 524, in _auto_push_and_pr_sync
        subprocess.run(['/usr/local/bin/git', 'rev-parse', '--is-inside-work-tree'], ...)

`shutil.which("git")` walks PATH in order — finds the `/usr/local/bin/`
wrapper first when it exists, falls back to `/usr/bin/git` otherwise.
GH_TOKEN injection still wins on wrapper-equipped images; auto-push
no longer crashes on bare-apt images.

Verified locally: `shutil.which("git")` resolves to `/usr/bin/git` on
the bug-reporter's image; `shutil.which("gh")` resolves to the
homebrew path on dev. Both paths exist + are executable on respective
hosts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:07:19 -07:00
Hongming Wang
a57382e918 feat(runtime): add new_response_message helper for adapter A2A responses
Surfaced via cross-template review of the a2a-sdk v0→v1 migration:
every adapter executor (claude-code, gemini-cli, crewai, openclaw,
autogen) builds A2A response Messages independently using
`new_text_message(text)` from the SDK, which omits `task_id` and
`context_id`. The runtime's own canonical pattern in
`workspace/a2a_executor.py:466-475` correctly threads both:

    Message(
        message_id=uuid.uuid4().hex,
        role=Role.ROLE_AGENT,
        parts=_parts,
        task_id=task_id,           # ← canonical
        context_id=context_id,     # ← canonical
    )

Adapters skipping these correlation fields means the platform's a2a
proxy can't reliably tie the response back to the originating task.
This is a divergence from canonical, not necessarily a strict bug
(task_id may be optional with a default) — but it's enough of a
correlation/observability gap that the canonical pattern bothers to
thread it.

Add `new_response_message(context, text, files=None)` to
executor_helpers.py — single home for response Message construction.
Templates can migrate from `new_text_message(text)` to this helper
in stacked PRs once the runtime publishes to PyPI.

The helper:
  - Reads `context.task_id`/`context.context_id` from the inbound
    RequestContext, falling back to fresh UUIDs (RequestContextBuilder
    always sets them in production; fallback is for unit tests).
  - Sets `role=Role.ROLE_AGENT` (the v1 enum value).
  - Builds text Parts via `Part(text=...)` and file Parts via
    `Part(url="workspace:<path>", filename=..., media_type=...)`.
  - Returns a v1 protobuf Message ready for
    `event_queue.enqueue_event(...)`.

Why "files=None" with the workspace: URI scheme as the file Part
shape: matches the canonical pattern in a2a_executor.py exactly so
the platform's chat-attachment download path (executor_helpers.py
`resolve_attachment_uri`) interprets responses uniformly across all
adapters.

Tests (5, all pass with --no-cov against the live runtime image):
  - test_new_response_message_text_only
  - test_new_response_message_with_files
  - test_new_response_message_files_only_no_text
  - test_new_response_message_falls_back_when_context_ids_unset
  - test_new_response_message_handles_missing_attrs

The conftest's a2a stubs needed an extension for Message + Role +
Part with kwargs preservation. Strictly additive — no existing tests
affected. (The 19 pre-existing failures in test_executor_helpers.py
are unrelated debt from the commit_memory/recall_memory rewrite,
visible on staging baseline before this change.)

Per-template migration is the follow-up: claude-code, gemini-cli,
crewai, openclaw, autogen all call `new_text_message(text)` today;
each gets a per-repo PR replacing it with
`new_response_message(context, text)`. This PR ships the helper
first so the templates have something to import.

Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix),
gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 01:13:34 -07:00
a18d116606
Merge pull request #2261 from Molecule-AI/fix/harness-cleanup-failed-event
harness: SaaS routing + provider-agnostic config for RFC #2251 measurement
2026-04-29 05:35:43 +00:00
Hongming Wang
00e4766046 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:40 -07:00
9bc3d6e352
Potential fix for pull request finding 'Unused global variable'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-04-28 20:45:53 -07:00
Hongming Wang
ddf6720498 chore(registry): snapshot tests + CLI-block alignment for #2240
Two follow-ups from the #2240 code review:

1. Snapshot tests for the rendered tool-instruction blocks. The
   structural tests added in #2240 guarantee tool NAMES are present;
   these new tests pin the SHAPE — bullet ordering, heading style,
   footer placement — so a future contributor who reorders fields in
   `_render_section` or rewrites a `when_to_use` paragraph sees the
   diff in CI rather than shipping a silently-different system prompt.
   Golden files live under workspace/tests/snapshots/.

2. CLI-block alignment test + corrected source-of-truth comment.
   `_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for
   ollama and other non-MCP runtimes — the registry can't auto-generate
   it because the CLI subprocess interface uses different command
   shapes (`peers` vs `list_peers`, etc.). A new
   `_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool →
   CLI-keyword correspondence (or explicit `None` for tools not
   exposed via subprocess). Two tests enforce coverage:

     - every a2a tool in the registry is keyed in the mapping
     - every non-None subcommand keyword literally appears in
       `_A2A_INSTRUCTIONS_CLI`

   Caught one real gap: `send_message_to_user` is in the registry but
   has no CLI subcommand. Mapped to `None` with an explanatory comment.

   The "no other source of truth" claim in registry.py's docstring
   was wrong post-#2240 (the CLI block survived) — corrected to
   describe the two surfaces explicitly and point at the alignment
   tests as the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:42:15 -07:00
Hongming Wang
5fe52b08e7 feat(harness): coordinator phase-boundary instrumentation for RFC #2251
Adds structured `rfc2251_phase=...` log lines at the deterministic phase
boundaries inside route_task_to_team and check_task_status, so an
operator running scripts/measure-coordinator-task-bounds.sh against
staging can correlate the harness's external timing trace with what
phase the coordinator was in at any given second.

The harness already exists in staging and measures end-to-end response
time + heartbeat trace. What it CAN'T do without this PR is answer
"the coordinator response took 7 minutes — was it stuck delegating, or
stuck polling children, or stuck synthesizing after all children
returned?" The phase logs answer that question.

Phases instrumented (deterministic Python boundaries, no agent prompt
involvement):

  route_start             → enter route_task_to_team
  children_fetched        → after get_children() returns
  routing_decided         → after build_team_routing_payload
  delegate_invoked        → just before delegate_task_async.ainvoke
  delegate_returned       → after delegate_task_async returns
  check_status            → every check_task_status poll (per-poll)
  route_returning_decision_only → fall-through path

Each line includes elapsed_ms from route_start so per-phase durations
are extractable via:

  grep rfc2251_phase= <container.log> \
    | awk '{...}' to compute deltas between consecutive phases

The synthesis phase (after all children return, before agent emits
final A2A response) is NOT instrumented here because it's
agent-driven (no deterministic Python boundary). The harness operator
infers synthesis_secs = total_response_secs − max(check_status_ts).

This is reproduction-harness scaffolding; it adds zero behavior. Strip
the rfc2251_phase log lines when V1.0 ships and the phase data lands
in the structured heartbeat payload instead.

Refs:
  - RFC: molecule-core#2251
  - Harness: scripts/measure-coordinator-task-bounds.sh (shipped earlier)
  - V1.0 gate: this is deliverable #2 of the four pre-V1.0 gates
2026-04-28 20:11:46 -07:00
Hongming Wang
f323def18f chore(build): include platform_tools in runtime wheel SUBPACKAGES
The PR-built wheel + import smoke gate refused the platform_tools
package because it's a new subdirectory under workspace/ that wasn't
in scripts/build_runtime_package.py:SUBPACKAGES. The drift gate (which
exists for exactly this reason) caught it cleanly:

  error: SUBPACKAGES drifted from workspace/ subdirectories:
    in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or
    be excluded): ['platform_tools']

Adding platform_tools to SUBPACKAGES wires the package into the
runtime wheel + applies the canonical
  from platform_tools.<x> -> from molecule_runtime.platform_tools.<x>
import-rewrite step that every other subpackage uses.

Verified locally: scripts/build_runtime_package.py succeeds, the
rewritten a2a_mcp_server.py reads
  from molecule_runtime.platform_tools.registry import TOOLS
which matches the package layout in the wheel.
2026-04-28 17:19:00 -07:00
Hongming Wang
e9a59cda3b feat(platform): single-source-of-truth tool registry — adapters consume, no drift
Establishes workspace/platform_tools/registry.py as THE place tool
naming and docs live. Every consumer reads from it; nothing duplicates
the source. Closes the architectural gap behind the doc/tool drift
discussion 2026-04-28 — adding hundreds of future runtime SDK adapters
should not require touching tool names anywhere except the registry.

What the registry owns

  ToolSpec dataclass with: name, short (one-line description), when_to_use
  (multi-paragraph agent-facing usage guidance), input_schema (JSON Schema),
  impl (the actual coroutine in a2a_tools.py), section ('a2a' | 'memory').

  TOOLS list with 8 entries — delegate_task, delegate_task_async,
  check_task_status, list_peers, get_workspace_info, send_message_to_user,
  commit_memory, recall_memory.

What now reads from the registry

  - workspace/a2a_mcp_server.py
      The hardcoded TOOLS list (167 lines of hand-maintained dicts) is
      gone. Replaced with a 6-line list comprehension over the registry.
      MCP description = spec.short. inputSchema = spec.input_schema.

  - workspace/executor_helpers.py
      get_a2a_instructions(mcp=True) and get_hma_instructions() now
      GENERATE the agent-facing system-prompt text from the registry.
      Heading + per-tool bullet (spec.short) + per-tool when_to_use +
      a section-specific footer. No more hand-maintained instruction
      blocks that drift from reality.

  - workspace/builtin_tools/delegation.py
      Renamed delegate_to_workspace -> delegate_task_async to match
      registry. check_delegation_status -> check_task_status. Added
      sync delegate_task @tool wrapping a2a_tools.tool_delegate_task
      (was missing for LangChain runtimes — CP review Issue 3).

  - workspace/builtin_tools/memory.py
      Renamed search_memory -> recall_memory to match registry.

  - workspace/adapter_base.py, workspace/main.py
      Bundle all 7 core tools (was 6) into all_tools / base_tools.

  - workspace/coordinator.py, shared_runtime.py, policies/routing.py
      Updated system-prompt-text references to use the registry names.

Structural alignment tests

  workspace/tests/test_platform_tools.py — 9 tests pin every
  registry-to-adapter mapping:
    - registry names are unique
    - a2a + memory partition is complete (no orphans)
    - by_name lookup works
    - MCP server registers exactly the registry's tool set
    - MCP description equals registry.short for every tool
    - MCP inputSchema equals registry.input_schema for every tool
    - get_a2a_instructions text contains every a2a tool name
    - get_hma_instructions text contains every memory tool name
    - pre-rename names (delegate_to_workspace, search_memory,
      check_delegation_status) cannot leak back

  Adding a future tool means adding one ToolSpec; the test failure
  list tells the author exactly which adapter to update.

Adapter pattern for future SDK support

  When (e.g.) AutoGen or Pydantic AI gets adapters, the only work
  needed for tool surfacing is "wrap registry.TOOLS in your SDK's
  tool format." Names, descriptions, schemas, impl come from the
  registry — adapter author writes zero strings.

Why this needed to ship now

  PR #2237 (already in staging) injected MCP-world docs as the
  default system-prompt content. Without the registry, those docs
  said "delegate_task" while LangChain runtimes only had
  "delegate_to_workspace" — workers see docs for tools that don't
  exist (CP review Issue 1+3). PR #2239 was a tactical rename;
  this PR is the structural fix that prevents the same class of
  drift from recurring as new adapters ship.

  PR #2239 was closed in favor of this — same renames, plus the
  registry, plus structural tests. Single coherent change.

Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in
test_platform_tools.py; 4 alignment tests in test_prompt.py from
#2237 still pass; original test_executor_helpers tests adapted to
the registry-driven world.

Refs: CP review Issues 1, 2, 3, 5; project memory
project_runtime_native_pluggable.md (platform owns A2A);
project memory feedback_doc_tool_alignment.md (this is the structural
fix for the tactical lesson).
2026-04-28 17:11:36 -07:00
Hongming Wang
3f99fede5a
Merge pull request #2237 from Molecule-AI/fix/inject-a2a-hma-tool-instructions
fix(prompt): inject A2A and HMA tool instructions into system prompt
2026-04-28 23:47:15 +00:00
Hongming Wang
448709f4b4 fix(prompt): inject A2A and HMA tool instructions into system prompt
Workers were registering platform tools (delegate_task, delegate_task_async,
list_peers, check_task_status, send_message_to_user, commit_memory,
recall_memory) but the build_system_prompt assembly never included
documentation for any of them. The instruction-text functions
get_a2a_instructions() and get_hma_instructions() exist in
executor_helpers.py and have unit tests, but were not called from any
production code path — workers received system-prompt.md content only
and saw the tools as bare names with no usage guidance.

Symptom: agents called commit_memory and delegate_task without knowing
they were platform tools. They worked when the agent guessed the API
correctly and silently failed when the agent didn't.

Fix: build_system_prompt() now appends both instruction sets between
the Skills section and the Peers section. The placement is intentional —
A2A docs explain how to call delegate_task; the peer list is the data
that delegate_task operates over, so the docs precede the peer table.

New parameter `a2a_mcp: bool = True` lets adapters opt into the CLI
subprocess variant of the A2A instructions for runtimes without MCP
support (ollama, custom CLI runtimes). Default True covers the
MCP-capable majority (claude-code, hermes, langchain, crewai). Adapter
callers don't need to change unless they specifically need CLI mode.

Tests: 4 new regression tests in test_prompt.py pin
  - A2A MCP variant injection (default)
  - A2A CLI variant injection (a2a_mcp=False, with MCP-only fields absent)
  - HMA instruction injection
  - A2A docs precede peer list ordering

Full suite green: 1223 passed, 2 xfailed.
2026-04-28 16:43:36 -07:00
Hongming Wang
0cdbc2c4f6 chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave
Consolidates 11 of the 17 open Dependabot PRs (#2215, #2217, #2219-#2225,
#2227, #2229) into one PR. Every entry is a patch / minor / floor bump
where the impact surface is small and CI carries the proof.

Same pattern as the 2026-04-15 batch.

Go (workspace-server/go.mod + go.sum, regenerated via `go mod tidy`):
  - golang.org/x/crypto                    0.49.0  → 0.50.0   (#2225)
  - github.com/golang-jwt/jwt/v5           5.2.2   → 5.3.1    (#2222)
  - github.com/gin-contrib/cors            1.7.2   → 1.7.7    (#2220)
  - github.com/docker/go-connections       0.6.0   → 0.7.0    (#2223)
  - github.com/redis/go-redis/v9           9.7.3   → 9.19.0   (#2217)

Python floor bumps (workspace/requirements.txt; current pip-resolved
versions don't change unless they happen to be below the new floor):
  - httpx                                  >=0.27  → >=0.28.1 (#2221)
  - uvicorn                                >=0.30  → >=0.46   (#2229)
  - temporalio                             >=1.7   → >=1.26   (#2227)
  - websockets                             >=12    → >=16     (#2224)
  - opentelemetry-sdk                      >=1.24  → >=1.41.1 (#2219)

GitHub Actions (SHA-pinned per existing convention):
  - dorny/paths-filter@d1c1ffe (v3) → @fbd0ab8 (v4.0.1)        (#2215)

REMOVED from this batch (lockfile platform mismatch):
  - #2231 @types/node ^22 → ^25.6   (npm install on macOS strips
    Linux-only @emnapi/* entries from package-lock.json that CI's
    `npm ci` then refuses; needs a Linux-side install to land cleanly)
  - #2230 jsdom ^25 → ^29.1          (same)

NOT included in this batch (deferred to per-PR human review):
  - #2228 github/codeql-action     v3 → v4   (CodeQL CLI alignment risk)
  - #2218 actions/setup-node       v4 → v6   (default Node version drift)
  - #2216 actions/upload-artifact  v4 → v7   (3 major versions)
  - #2214 actions/setup-python     v5 → v6   (action major)

NOT merged (CI failing on dependabot's own PR):
  - #2233 next 15 → 16
  - #2232 tailwindcss 3 → 4
  - #2226 typescript 5 → 6

Verified:
  - workspace-server: `go mod tidy && go build ./... && go test ./...` — green
  - workspace requirements.txt: floor bumps only
2026-04-28 16:25:46 -07:00
Hongming Wang
96acbd719b test: update test_peer_capabilities_format for fallback behavior
The previous assertion `'Silent Agent' not in result` was pinning
the buggy behavior — peers without an agent_card were silently
dropped from the prompt. With the fallback to DB name+role those
peers are correctly visible. Flip the assertion so the test pins
the new (correct) rendering and would catch a regression to the
silent-drop behavior.
2026-04-28 14:15:42 -07:00
Hongming Wang
8ff0748ab9 fix(workspace): keep peers visible in coordinator prompt when agent_card is null
Bug: a Design Director coordinator with 6 freshly-created worker peers
rendered an empty `## Your Peers` section in its system prompt — the
hosting registry endpoint correctly returned all 6 peers, but
`summarize_peer_cards()` silently dropped every entry whose
`agent_card` column was null (the default until A2A discovery has
run end-to-end against the worker). The coordinator then refused to
delegate any task because "no peers exist".

Fix: fall back to the registry row's `name` and `role` columns when
`agent_card` is missing, malformed, or wrong-typed, instead of
skipping the peer. The registry endpoint
(`workspace-server/internal/handlers/discovery.go:queryPeerMaps`) has
always returned both fields — they were just being thrown away on
the consumer side. `build_peer_section()` now renders `Role: …` when
the agent_card-derived skill list is empty so the coordinator's
prompt still has something concrete to delegate against.

Also hoists `import json` out of the per-peer loop body to module
level (was previously imported once per iteration).

Tests: new `test_shared_runtime_peer_summary.py` pins all four
fallback cases (null / malformed string / wrong type / null + no
DB name) plus the agent-card-present happy path and the mixed-list
case the coordinator actually consumes. First peer-summary test
coverage `shared_runtime.py` has had — no prior tests existed.

Refs: 2026-04-27 Design Director discovery report from infra team.
2026-04-28 14:10:29 -07:00
Hongming Wang
3eb599bbb6 fix(workspace): use SDK constant for agent-card readiness probe
The initial-prompt readiness probe in workspace/main.py hardcoded the
pre-1.x well-known path. After the a2a-sdk 1.x bump the SDK started
mounting the agent card at the new canonical path (the value of
`a2a.utils.constants.AGENT_CARD_WELL_KNOWN_PATH`), so the probe
returned 404 every attempt and silently fell through to "server not
ready after 30s, skipping". Net effect: every workspace silently
dropped its `initial_prompt` from config.yaml — the agent never sent
the kickoff self-message, and users hit a fresh chat with no context.

Reported by an external user as "/.well-known/agent.json 404 — the
a2a-sdk agent card route was not being mounted at the expected path".
The route IS mounted; the probe was looking at the wrong place.

Fix imports `AGENT_CARD_WELL_KNOWN_PATH` from `a2a.utils.constants`
and uses it directly in the probe URL — the SDK constant is now the
single source of truth, so any future rename travels through
automatically.

Adds two static regression tests pinning the invariant:
  1. No hardcoded `/.well-known/agent.json` literal anywhere in
     main.py.
  2. The probe URL fstring interpolates AGENT_CARD_WELL_KNOWN_PATH
     (catches a "fix" that imports the constant for show but reverts
     to a literal in the actual GET).

Verified manually inside ghcr.io/molecule-ai/workspace-template-langgraph
that AGENT_CARD_WELL_KNOWN_PATH == '/.well-known/agent-card.json' and
that `create_agent_card_routes(card)` mounts at exactly that path —
constant + mount are aligned in the runtime image, so the probe will
now find the server.

Full workspace test suite: 1209 passed, 2 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:43:32 -07:00
Hongming Wang
e87a9c3858 fix(a2a): auto-retry transient transport errors in send_a2a_message
Three different intermittent failures observed during a single
manual-test session — RemoteProtocolError, ReadTimeout, ConnectError —
each surfaced as a "Failed to deliver to <peer>" error chip in the
canvas Agent Comms panel even though the next attempt would have
succeeded (verified by direct probes from the same source workspace
to the same peer). The error message even told the user "Usually a
transient network blip — retry once," but it left the retry to a
human reading the error message.

Auto-retry inside send_a2a_message itself: up to 5 attempts (1
initial + 4 retries) with exponential backoff (1s, 2s, 4s, 8s,
16s-capped), each backoff jittered ±25% to break sync across
siblings. Cumulative wall-clock capped at 600s by
_DELEGATE_TOTAL_BUDGET_S so a string of 5×300s ReadTimeouts can't
make the caller wait 25 minutes — once the deadline elapses, retries
stop even if attempts remain.

Retry only on transport-layer transients:
  - ConnectError / ConnectTimeout (peer's listening socket not ready)
  - RemoteProtocolError (peer closed TCP without writing — observed
    when a peer's prior in-flight Claude SDK session aborted)
  - ReadError / WriteError (network blip on Docker bridge)
  - ReadTimeout (peer wrote no response in 300s)

Application-level errors are NOT retried — they're deterministic and
retrying just wastes wall-clock:
  - HTTP 4xx (peer rejected the request format)
  - JSON parse failures (peer returned garbage)
  - JSON-RPC error in response body (peer's runtime errored cleanly)
  - Programmer-bug exceptions (ValueError, etc.)

8 new tests pin the contract:
  - retry succeeds after 2 RemoteProtocolErrors
  - retry succeeds after 1 ConnectError
  - all 5 attempts fail → returns formatted last-error
  - capped at exactly _DELEGATE_MAX_ATTEMPTS (regression cover for
    "did someone bump the constant accidentally?")
  - JSON-RPC error response NOT retried (1 attempt only)
  - non-httpx exception NOT retried (programmer bugs stay loud)
  - total budget caps the loop even if attempts remain
  - backoff schedule grows exponentially with ±25% jitter

Refactor: extracted _format_a2a_error() so the success and exhausted
paths share one error-formatting routine. _delegate_backoff_seconds()
is a pure function so the schedule is unit-testable without monkey-
patching asyncio.sleep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:52:01 -07:00
Hongming Wang
81c4c1321c fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer
Manual-test failure surfaced what was hidden behind the MCP-path bug:
once delegate_task could actually fire, every cross-workspace call
came back as JSON-RPC -32600 "Invalid Request" with the underlying
pydantic ValidationError:

    params.message.role
      Input should be 'agent' or 'user' [type=enum,
      input_value='ROLE_USER', input_type=str]

PR #2184's a2a-sdk 1.x migration sweep over-corrected: it changed
every `"role": "user"` literal in JSON-RPC payload construction to
`"role": "ROLE_USER"` to match the protobuf enum names of the 1.x
native types (a2a.types.Role.ROLE_USER / ROLE_AGENT). That was
correct for in-process Message construction (which the SDK
serialises before wire transmission) but WRONG for the 8 sites that
hand-build JSON-RPC payloads. The workspace's own a2a-sdk runs
inbound requests through the v0.3 compat adapter
(/usr/local/lib/python3.11/site-packages/a2a/compat/v0_3/) because
main.py sets enable_v0_3_compat=True for backwards compatibility,
and that adapter validates against the v0.3 Pydantic Role enum
(`agent` | `user` lowercase). The protobuf-style names blow it up.

Reverted the 8 wire-payload sites to lowercase:
  - workspace/a2a_client.py:74
  - workspace/a2a_cli.py:74, 111
  - workspace/heartbeat.py:378
  - workspace/main.py:464, 563
  - workspace/builtin_tools/a2a_tools.py:60
  - workspace/builtin_tools/delegation.py:272

Native-type usage at workspace/a2a_executor.py:471 (`Role.ROLE_AGENT`)
stays — that's an in-process Message construction; the SDK handles
wire serialisation correctly.

Updated the misleading comment at main.py:255-257 (which said
"outbound payloads are now 1.x-shaped (ROLE_USER)") to spell out
the actual rule: outbound JSON-RPC wire payloads MUST use v0.3
shape, native types are only for in-process construction.

New regression test test_jsonrpc_wire_role_format.py greps the 6
wire-payload-emitting files for any "ROLE_USER" / "ROLE_AGENT"
string literal and fails loud — cheapest possible drift detector.

Why E2E missed it: the priority-runtimes harness sends a single
message canvas → workspace, but the canvas already used lowercase
"user" (it never went through the migration sweep). The bug only
surfaces on workspace → workspace delegation, which the harness
doesn't exercise. Same gap as #131 (extend smoke to call main()
against a stub).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:40:11 -07:00
Hongming Wang
9c3695df6d test(runtime): update molecule_ai_status test for renamed error prefix
Pre-existing test_set_status_exception_prints_to_stderr asserted on the
legacy "molecule-monorepo-status: failed to update" prefix string. The
prior commit renamed it to "molecule_ai_status: failed to update" so
the printed label matches the canonical module-form invocation
(`python3 -m molecule_runtime.molecule_ai_status`) instead of a shell
alias that only ever existed in the dev-only base image. Updating the
expected substring in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:48:05 -07:00
Hongming Wang
28fc7a8cbd fix(runtime): replace remaining /app/ legacy paths in agent prompts + docstrings
Comprehensive sweep follow-up to the MCP server path fix. Audited every
/app/ reference in the runtime source against the live claude-code
template image and confirmed the actual /app/ contents post-#87 are
ONLY: __init__.py, adapter.py, claude_sdk_executor.py, requirements.txt
— every other workspace module ships in the wheel under
site-packages/molecule_runtime/. Two more leaks found:

1. executor_helpers.py:_A2A_INSTRUCTIONS_CLI — inter-agent system prompt
   for non-MCP runtimes (Ollama, custom) had 5 lines telling the model
   `python3 /app/a2a_cli.py X`. Models copy these examples verbatim, so
   every CLI-runtime delegation would fail at the shell layer (no such
   file). Replaced with `python3 -m molecule_runtime.a2a_cli` form,
   which works regardless of where the wheel is installed.

2. molecule_ai_status.py docstring — usage examples invoked
   `python3 /app/molecule_ai_status.py` and claimed a
   `molecule-monorepo-status` shell alias. Both broken in current
   templates: the file's at site-packages, and `which
   molecule-monorepo-status` errors (the legacy symlink only existed
   in the dev-only workspace/Dockerfile base image, not in the
   standalone template Dockerfiles that ship to production).
   Updated docstring + the __main__ usage banner + the stderr error
   prefix to use the same `python3 -m molecule_runtime.X` form.

Plugins audited and clean: WORKSPACE_PLUGINS_DIR=/configs/plugins,
SHARED_PLUGINS_DIR=$PLUGINS_DIR fallback /plugins. No /app/
assumptions.

Regression test: `test_a2a_cli_instructions_use_module_invocation_not_legacy_app_path`
asserts the legacy /app/a2a_cli.py path can't drift back into the CLI
system prompt and that the canonical module form is present.

The legacy workspace/Dockerfile + workspace/entrypoint.sh + workspace/scripts/
still contain /app/-shaped paths but are dev-only base-image scaffolding
(per workspace/build-all.sh's own header comment) — not shipped to the
standalone template images. Out of scope here; can be cleaned up in a
separate dead-code pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:22:00 -07:00
Hongming Wang
203a4f0f91 fix(runtime): resolve a2a_mcp_server.py path from wheel install location
DEFAULT_MCP_SERVER_PATH was hardcoded to /app/a2a_mcp_server.py, which
was correct under the pre-#87 monolithic-template Docker layout where
the workspace/ tree was COPY'd into /app/. After the universal-runtime
refactor (#87, #117), workspace modules ship inside the
molecule-ai-workspace-runtime wheel under
site-packages/molecule_runtime/, while /app/ now holds only
template-specific files (adapter.py + the runtime-native executor for
that template).

Net effect: in every workspace built since the wheel cutover, Claude
Code SDK's mcp_servers={"a2a": {"command": python, "args":
["/app/a2a_mcp_server.py"]}} pointed at a missing file. The subprocess
launch failed silently, the SDK registered zero MCP tools, and the
agent's list_peers / delegate_task / a2a_send_message / a2a_send_signal
all disappeared. Symptom observed today: Design Director said
"I tried to reach the perf auditor via the inter-agent MCP tools
(list_peers, delegate_task) but those tools didn't resolve in this
environment" and fell back to running the audit itself with WebFetch.

Why this slipped through E2E: the priority-runtimes harness sends a
single message and verifies a reply — it does not exercise inter-agent
delegation, so the missing MCP tools are invisible at that layer.

Fix: resolve the path relative to executor_helpers.py via __file__,
which tracks wherever the wheel is installed (site-packages today,
anywhere else tomorrow). The A2A_MCP_SERVER_PATH env override is
preserved for tests / non-default layouts.

Regression test: assert os.path.exists(DEFAULT_MCP_SERVER_PATH) so
any future move of a2a_mcp_server.py out of the package directory
fails at unit-test time instead of silently disabling delegation in
production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:15:06 -07:00
Hongming Wang
dd57a840b6 fix: comprehensive a2a-sdk 1.x migration sweep across workspace/
Audited every a2a-sdk surface in workspace/ against the installed
1.0.2 wheel. Found and fixed:

main.py (the live workspace startup path):
  • create_jsonrpc_routes(rpc_url='/', enable_v0_3_compat=True) —
    rpc_url required in 1.x; v0.3 compat enables inbound legacy
    clients (`"role": "user"` lowercase) without forcing them to
    upgrade. Pairs with the outbound rename below.

a2a_executor.py:
  • TextPart/FilePart/FileWithUri removed in 1.x. Part is now a
    flat proto message: Part(text=…) / Part(url=…, filename=…,
    media_type=…). Updated the file-attachment branch (only
    reachable when an agent emits files; the harness's PONG path
    didn't exercise this, but it's a latent crash).
  • Message field names: messageId/taskId/contextId →
    message_id/task_id/context_id (proto3 snake_case).
  • Role enum: Role.agent → Role.ROLE_AGENT (proto enum).

Outbound JSON-RPC payloads (8 files):
  • "role": "user" → "role": "ROLE_USER" — proto3 JSON serialization
    is strict about enum values. Sites: a2a_client, a2a_cli, main
    (initial+idle prompts), heartbeat, builtin_tools/a2a_tools,
    builtin_tools/delegation. Wire JSON keys stay camelCase
    (proto3 default), only the role enum value changed.

google-adk/adapter.py:
  • new_agent_text_message → new_text_message (4 sites). This
    adapter's directory has a hyphen, so it can't be imported as a
    Python module — effectively dead code, but the wheel ships the
    file and a future fix should keep it correct against 1.x.

Why one PR instead of seven: every previous a2a-sdk migration find
landed as its own publish → cascade → harness → next-bug cycle.
Today's audit ran every a2a-sdk symbol/type/method in workspace/
against the installed 1.0.2 wheel in a single sweep + tested the
critical paths (Message construction, Part construction, Role enum
parsing) against the actual SDK. Should be the last migration PR.

Verified locally:
  python3 scripts/build_runtime_package.py --version 0.1.99 \
      --out /tmp/build-final
  pip install /tmp/build-final
  python -c "import molecule_runtime.main; \
             from molecule_runtime.a2a_executor import LangGraphA2AExecutor"
  → ✓ all imports clean against a2a-sdk 1.0.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:42:57 -07:00
Hongming Wang
c80b3ff0eb fix: pass rpc_url='/' to create_jsonrpc_routes (a2a-sdk 1.x requirement)
7th a2a-sdk migration find from the v0 → v1 transition.
create_jsonrpc_routes() now requires rpc_url as a positional arg
(was implicit at root in 0.x). Pass '/' to match
a2a.utils.constants.DEFAULT_RPC_URL — that's also what
workspace-server's a2a_proxy.go forwards to (POSTs to workspace URL
without appending a path).

Symptom before fix: every workspace startup crashed with
  TypeError: create_jsonrpc_routes() missing 1 required positional
  argument: 'rpc_url'

Caught by harness 9 phase 4 (claude-code + langgraph both on
0.1.24). The user's "use langgraph for fast iteration" call cut
the diagnose cycle from 15min to ~30s — without that, this would
have taken another hermes round-trip to surface.

Updated reference_a2a_sdk_v0_to_v1_migration.md memory with this
entry alongside the previous 6 finds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:33:23 -07:00
Hongming Wang
6859099a08 fix: pass agent_card to DefaultRequestHandler (a2a-sdk 1.x requirement)
a2a-sdk 1.x added agent_card as a required argument to
DefaultRequestHandler.__init__. main.py constructed it with only
agent_executor + task_store, so every workspace startup that reached
the handler init step crashed with:

  TypeError: DefaultRequestHandlerV2.__init__() missing 1 required
  positional argument: 'agent_card'

This is the 6th a2a-sdk migration find from the v0 → v1 transition
(see reference_a2a_sdk_v0_to_v1_migration memory). Pattern is the
same: SDK exposes a new required arg, our call site needs to pass
the existing object we already construct upstream.

Why the import-only smoke gates didn't catch this: it's a call-time
constructor error inside `async def main()`, not a module load
error. The runtime-pin-compat smoke imports main_sync but doesn't
invoke main() against a real config. Worth filing a follow-up to
extend the smoke to a "construct + dispose" cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 08:53:47 -07:00
Hongming Wang
851fd21fb1 fix(workspace): rename supported_protocols → supported_interfaces (a2a-sdk 1.0)
CRITICAL: every workspace boot since the a2a-sdk 1.0 migration (#1974)
has been crashing at AgentCard construction with:
  ValueError: Protocol message AgentCard has no "supported_protocols" field

The protobuf field is `supported_interfaces` (plural, interfaces — see
a2a-sdk types/a2a_pb2.pyi:189). The 0.3→1.0 migration left the kwarg
as `supported_protocols`, which doesn't exist in the 1.0 schema, so
the constructor raises before any subsequent line of main runs.

Why this hid for so long:
  - publish-runtime.yml's smoke step only IMPORTED molecule_runtime.main;
    importing the module is fine, only CONSTRUCTING the AgentCard fails
  - The user-visible symptom is "Workspace failed: " with empty
    last_sample_error, indistinguishable from generic boot timeouts
  - The state_transition_history=True bug (fixed in #2179) was a
    sibling of this — same migration, same class, just caught first

Fix is symmetric with #2179:
  1. workspace/main.py: rename the kwarg + comment explaining why
  2. .github/workflows/publish-runtime.yml: extend the smoke block to
     instantiate AgentCard with the exact production call shape, so
     the next field-rename of this class fails at publish time
     instead of breaking every workspace startup

Verification:
  - Constructed AgentCard against fresh a2a-sdk 1.0.2 in a clean
    venv with the corrected kwarg → succeeds
  - Constructed it with the original `supported_protocols` kwarg →
    fails immediately with the exact error production sees
  - Smoke test pinned to mirror main.py's exact call shape; main.py
    + smoke must stay in lockstep going forward

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:54:23 -07:00
Hongming Wang
12d446bc8e docs: explain why state_transition_history is gone (research-backed)
Adds a comment block citing a2a-sdk's own
a2a/compat/v0_3/conversions.py, which says verbatim:

  state_transition_history=None,  # No longer supported in v1.0

So a future reader who notices the missing kwarg won't try to add it
back. The capability is now universal: every v1.x Task carries a
history list and tasks/get supports historyLength via the
apply_history_length helper. No flag because nothing's optional.

Confirmed by reading the SDK source directly:
- a2a/types.py AgentCapabilities exposes only: streaming,
  push_notifications, extensions, extended_agent_card.
- a2a/compat/v0_3/conversions.py explicitly maps None when
  down-converting v1 → v0.3 (deliberate removal, not rename).
- a2a/server/request_handlers/default_request_handler_v2.py uses
  apply_history_length(task, params) — agent doesn't opt in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:20:05 -07:00
Hongming Wang
f531fe1367 fix: drop state_transition_history field — removed in a2a-sdk 1.x
a2a-sdk 1.x's AgentCapabilities only exposes 4 fields:
streaming, push_notifications, extensions, extended_agent_card.
The state_transition_history field was removed in the v1 protobuf
schema. main.py still passed it as a kwarg, so every workspace
that reached the AgentCard construction step (line 188) crashed:

  ValueError: Protocol message AgentCapabilities has no
  "state_transition_history" field

Symptom: every claude-code + hermes workspace stuck in `provisioning`
forever — caught when the user provisioned a Design Director crew
manually via the canvas while harness 5 was running.

Why every prior smoke gate missed it:
- runtime-pin-compat.yml smokes `from molecule_runtime.main import
  main_sync` — only imports the module. AgentCapabilities() runs
  inside `async def main()`, not at module load.
- Template image boot smoke does `import every /app/*.py` — same
  story. main.py imports fine; the field error only fires at call.

The fix is one line — drop the kwarg. Fields we actually need
(streaming + push_notifications) are still passed.

Follow-up worth filing: smoke step that instantiates Adapter() +
calls a no-op setup() against a stub config. That would have
caught this before publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:16:16 -07:00
Hongming Wang
5b05d663ee test: update a2a.helpers mock to export new_text_message
The conftest mock only exposed `new_agent_text_message`, the pre-v1
name. After fixing a2a_executor.py to use the v1 name
`new_text_message`, the mock didn't satisfy the import → CI red.

Mock both names (aliased to the same lambda) so any in-flight test
that still references the old name keeps working until the next
sweep removes those references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:34:28 -07:00
Hongming Wang
722e1fd175 fix(a2a_executor): migrate to a2a-sdk 1.x API — new_agent_text_message → new_text_message
a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message`
(role=Role.agent is now the default). Same fix landed in the hermes
template earlier today; this is the runtime-side equivalent.

NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by
the langgraph + deepagents templates. Both templates currently import
it via bare `from a2a_executor import LangGraphA2AExecutor` — which is
a separate bug in those templates, filed/fixed separately.

Symptom in a2a_executor.py form: any langgraph or deepagents workspace
that calls create_executor crashes with `ImportError: cannot import
name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for
claude-code or hermes (their templates use their own executors and
don't load a2a_executor).

Five call sites updated, one import line, one comment. Test suite
already passes against the new symbol — `python -c "from
molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves
cleanly after this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:29:59 -07:00
Hongming Wang
3df5867b56 fix: restore main_sync entry point in workspace/main.py
The wheel's pyproject.toml has declared
`molecule-runtime = "molecule_runtime.main:main_sync"` since the
publish pipeline was created on 2026-04-26, but the function
itself was never present in workspace/main.py — it lived in the
pre-monorepo molecule-ai-workspace-runtime repo and was lost
during the consolidation that made workspace/ the source of truth.

The 0.1.15 wheel still had main_sync from a leftover snapshot,
so the regression went unnoticed until 0.1.16 (the first wheel
built from the new source-of-truth) shipped. Symptom: every
workspace container restart loops with

  ImportError: cannot import name 'main_sync' from 'molecule_runtime.main'

— the molecule-runtime CLI script's first line tries to import
the missing symbol. Workspaces stay in `provisioning` until the
10-min sweep marks them failed.

Caught by .github/workflows/runtime-pin-compat.yml, which already
imports the symbol by name as its smoke test. (That check kept
failing red on every recent merge_group run; this PR fixes the
underlying symbol-not-found instead of the smoke step.)

Also strengthens publish-runtime.yml's wheel smoke from
`import molecule_runtime.main` (loads the module — passes even
when entry-point target is missing) to `from molecule_runtime.main
import main_sync` (the actual contract the CLI script needs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:35:49 -07:00
Hongming Wang
d19d35f6b3 test(skills): make watcher test fakes accept current_runtime kwarg
The runtime-compat change in this branch added a `current_runtime`
kwarg to load_skills(); the watcher passes it through. Test mocks
that pre-date the kwarg signature broke with TypeError, which the
watcher's reload-error try/except swallowed — the symptom was empty
callback lists, not a clear failure.

Switching the fakes to accept **kwargs keeps them forward-compat for
future load_skills additions without another test churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:04:26 -07:00
Hongming Wang
d0057912d2 feat(skills): per-skill runtime compatibility (#119, hermes pattern)
SKILL.md frontmatter can now declare `runtime: [claude-code]` or
`runtime: [hermes, claude-code]` to opt out of incompatible adapters
instead of failing at first invocation. Default `["*"]` means universal —
existing skill libraries need zero migration.

Borrowed from hermes' declarative skill-compat pattern surfaced in the
hermes architecture survey. The remaining two patterns (event-log
layer, observability config block) stay open under #119.

Wiring:
- SkillMetadata.runtime: list[str] = ["*"]
- _normalize_runtime_field accepts list, string-sugar, missing -> ["*"];
  malformed warns and falls back to universal so a typo never silently
  drops a skill.
- load_skills(..., current_runtime=...) filters out skills whose runtime
  list lacks "*" or current_runtime, with an INFO log line.
- BaseAdapter.start passes type(self).name() so the live adapter drives
  the filter; SkillsWatcher takes the same kwarg so hot-reload honors it.

8 new tests cover default universal, no-field universal, explicit
match/mismatch, string sugar, wildcard short-circuit, current_runtime=None
(preserves old behavior), and malformed-warns-not-drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:57:43 -07:00
Hongming Wang
e99f937630
Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime
chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]
2026-04-27 08:24:30 +00:00
Hongming Wang
98ca5c50fa chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild)
DRAFT — do NOT merge until gemini-cli template image rebuilds with
its local cli_executor.py copy (template PR #9 just merged at
07:59 UTC; image build kicks off now).

Final adapter-specific deletion from molecule-runtime, completing #87
for the priority adapters (claude-code via PR #2156, plus gemini-cli
via this PR + template #9).

Deletes:
  - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the
    RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file
    moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged).
  - workspace/tests/test_agent_base_urls.py — only consumer of
    CLIAgentExecutor in the test suite. Tests for the executor
    behavior live in the template repo now.

Updates:
  - workspace/tests/test_executor_helpers.py — docstring refresh:
    executor_helpers.py is the runtime-agnostic shared helpers; the
    executor classes themselves live in template repos post-#87.

Codex / ollama presets disappear naturally with the file. They never
had template repos, so no production path could invoke them anyway —
this is dead-code removal as a side effect of the move.

Verified-safe-to-delete:
  - heartbeat.py: doesn't import cli_executor
  - claude_sdk_executor.py: deleted by PR #2156 (in flight)
  - preflight.py: only references runtime names by string; no import
  - main.py: doesn't import cli_executor (uses adapter discovery via
    ADAPTER_MODULE; the template's adapter constructs the executor)
  - Only test_agent_base_urls.py + test_executor_helpers.py docstring
    referenced cli_executor

Verification:
  - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py
    cases — exact match)
  - No live import of cli_executor anywhere in molecule-core after deletion
    (grep verified)

Sequencing:
  1.  Template PR #9 (gemini-cli local copy) — MERGED
  2.  Template image rebuild — running
  3. THIS PR — wait until image is published, then mark ready-for-review

Closes #87 for the priority adapters: workspace/ is now adapter-
agnostic except for adapter discovery (ADAPTER_MODULE) + the
runtime_wedge primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:22:39 -07:00
Hongming Wang
7504aba934 feat(tools): tighten send_message_to_user description to forbid pasting URLs in body
Root-cause fix for #118 (chat attachments rendering as plain text links
instead of download chips). User flagged with screenshot 2026-04-26
showing the Design Director agent pasting https://files.catbox.moe/…
in the message body — chat rendered the URL as plain markdown text,
unclickable in the canvas's bubble layout, and unreachable in any SaaS
deployment where the user's browser can't egress to catbox.

The structured `attachments` field already exists, the canvas's
AttachmentChip already renders well, the WebSocket broadcast already
carries attachments verbatim — the missing piece was the LLM choosing
the body over the structured field. Tighten the tool description so it
trains the right behavior.

Three targeted strengthenings:

  1. Top-level tool description: enumerated use case (4) now reads
     "via the `attachments` field (NEVER paste file URLs in `message`)".
     The all-caps NEVER + the explicit field name move the LLM toward
     the structured path on first read.

  2. `message` param: adds an explicit DO NOT rule with rationale.
     Includes the SaaS-reachability reason so operators can grep for
     "SaaS" and find this design constraint instead of re-discovering it
     after a tenant complaint. Calls out catbox.moe + file:// by name as
     concrete examples of forbidden hosts (those are the two we've seen
     in production).

  3. `attachments` param: leads with REQUIRED, lists the bad
     alternatives explicitly (pasting URLs, base64-encoding, telling
     user to look at a path). LLMs handle "use X, NOT Y" framings
     better than "use X" alone — observed during prompt-engineering
     iteration on hermes' tool descriptions.

Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py)
so a future doc edit that softens or drops them fails CI. Brittle by
design — these are prompt-engineering invariants, not implementation
details.

This is the root-cause fix. A defensive canvas-side backstop (auto-
detect download-shaped URLs in body and convert to chips) is a
follow-up that could land separately if the steering proves
insufficient in practice.

Verification:
  - 1190/1190 workspace pytest pass
  - 4 new test_a2a_mcp_server.py cases all green

Closes the steering half of #118. The structured-attachments-only
contract was already enforced server-side (PR #2130 added per-attachment
validation); this PR closes the prompt-side gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:13:11 -07:00
Hongming Wang
4e6030d783
Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime
chore(workspace): drop claude_sdk_executor — Phase 2 of #87
2026-04-27 08:02:51 +00:00
Hongming Wang
4b5ac2ebc2 chore(workspace): drop claude_sdk_executor — Phase 2 of #87
Phase 2 of the universal-runtime refactor (task #87). Now that the
claude-code template repo ships its own claude_sdk_executor.py
(template PR #13 merged + image rebuilt at 07:36 UTC) the
molecule-runtime no longer needs to ship the file.

Deletes:
  - workspace/claude_sdk_executor.py (704 LOC)
  - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC)

Updates:
  - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring
    section. The shim was time-bounded ("removed once #87 Phase 2 lands");
    this is that PR.
  - workspace/tests/test_runtime_wedge.py — drops the
    TestClaudeSdkExecutorReExportShim test class (the shim doesn't
    exist anymore so the identity assertions would fail at import).
  - workspace/tests/conftest.py — drops the claude_agent_sdk stub.
    Its only consumer was test_claude_sdk_executor.py which is gone;
    no other test imports the SDK.
  - workspace/cli_executor.py — comment refresh: claude-code template
    repo (not workspace/) is now the home for ClaudeSDKExecutor.

Verified-safe-to-delete:
  - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer
    imports from claude_sdk_executor)
  - cli_executor.py: only comments referenced claude_sdk_executor;
    its line-117 ValueError defends against accidental routing
  - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's
    shim class consumed the deleted module; both removed in this PR

Verification:
  - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the
    deleted test cases — zero unexpected regressions)
  - No live import of claude_sdk_executor anywhere in molecule-core
    after deletion (grep verified)

Closes #87 for the claude-code adapter. Hermes is already template-only.
The remaining adapter-specific code in workspace/ is cli_executor.py
(codex/ollama/gemini-cli) tracked by task #122. preflight.py's
SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in
flight).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:52:55 -07:00
Hongming Wang
7dba700ac3 feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery
Closes task #123 — last piece of #87 cleanup.

Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized — direct violation of the universal-runtime principle
(#87) where adapters declare themselves and the runtime stays generic.

Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Distinguished failure
modes so operator messages are concrete:

  - ADAPTER_MODULE unset → "no adapter installed; set the env var"
  - ADAPTER_MODULE set but module won't import → import error type +
    message
  - module imports but no Adapter class → "convention violation, add
    `Adapter = YourClass`"
  - Adapter.name() raises → caught with operator message
  - Adapter.name() returns non-string → contract violation message
  - Adapter.name() doesn't match config.runtime → drift WARNING (not
    fatal; the adapter wins in production, config.yaml is just
    documentation)

The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just a documentation drift — the adapter that's
actually installed runs regardless. Operator gets a warning naming
both the configured and installed names so they can fix whichever
is stale.

Tests:
  - Replaces the obsolete "static list pass/fail" tests with 6 new
    cases covering each distinguished failure mode, plus a positive
    test for the adapter-matches-config happy path
  - Adds an autouse `_default_langgraph_adapter` fixture that
    pre-installs a fake adapter via sys.modules monkey-patching, so
    existing tests building default WorkspaceConfig (runtime="langgraph")
    inherit a valid adapter without each test setting ADAPTER_MODULE
  - Failure-mode tests opt out of the default fixture via
    @pytest.mark.no_default_adapter (registered in pytest.ini)
  - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
    is a passable test value (otherwise `is not None` would skip the
    None branch — exact bug the sentinel avoids)

Verification:
  - 22/22 preflight tests pass (was 16; +6 new failure-path tests)
  - 1256/1256 workspace pytest pass (was 1251; +5 net)
  - No production code path other than preflight changed

Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:44:51 -07:00
Hongming Wang
5e049244d6 refactor(wedge): mark re-exports explicit via __all__
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim.  Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:20:23 -07:00
Hongming Wang
feb544938b refactor(wedge): address review feedback — class wrap + import-path doc + dedupe shim rationale
Three changes from /code-review-and-quality on PR #2154:

1. Optional (architecture): wrap state in a private _WedgeState class
   instead of bare module-level globals. Public API (mark_wedged /
   clear_wedge / is_wedged / wedge_reason / reset_for_test) is
   unchanged — adapters never see the class. The class is forward-cover
   for any future per-scope variant (multiple executors per process, a
   keyed registry, etc.) without churning the call sites. Today there's
   exactly one instance (_DEFAULT) so behavior is identical.

2. Optional (readability): clarify the import path in the integration
   recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
   (PyPI package); in molecule-core itself it's `from runtime_wedge`
   (top-level module). Removes the trap where a contributor reading the
   docstring while editing in-repo copies the template-style import and
   gets ImportError.

3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
   re-export comment now points to runtime_wedge's "Compatibility shim"
   section as the source of truth instead of restating the same content.
   Avoids docs-in-two-places drift risk.

Verification:
  - 1251/1251 workspace pytest pass (no behavior change — class wrap
    is pure plumbing; module-level helpers delegate to the singleton)
  - All shim re-export identity tests still pass (the shim's
    `is_wedged is runtime_wedge.is_wedged` assertion holds because we
    re-export the SAME function object that delegates to _DEFAULT)

No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:16:33 -07:00
Hongming Wang
cd899c969f docs(wedge): integration recipe for adapters that want to flip-to-degraded
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.

Two additions:

  - workspace/runtime_wedge.py — new "How to use from a NEW adapter"
    section in the module docstring with the minimum viable
    integration recipe, what-you-get-for-free list, and explicit
    DON'TS (don't store local wedge state, don't mark for transient
    errors, don't write your own clear logic). Plus a "when wedge is
    the WRONG primitive" note to keep adopters from over-using it.

  - workspace/adapter_base.py — adds runtime_wedge to the
    "Cross-cutting capabilities your adapter can opt into" list in
    BaseAdapter's docstring (alongside capabilities() and
    idle_timeout_override()). Discoverability path: adapter author
    reads BaseAdapter docstring → sees runtime_wedge mention → reads
    runtime_wedge module docstring → has the recipe.

Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.

Zero code change. Tests untouched (1251/1251 still pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:12:14 -07:00
Hongming Wang
1d231ed295 refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module
Prerequisite for the universal-runtime refactor (task #87) to move
claude_sdk_executor.py out of molecule-runtime into the claude-code
template repo. heartbeat.py had a hard import:

    from claude_sdk_executor import is_wedged, wedge_reason

which would break the moment the executor moves out of the runtime
package — the heartbeat would lose access to the wedge state used to
flip workspace status to degraded.

Extract the wedge state to a runtime-side module that the heartbeat
can keep importing regardless of which adapter executor is wedged:

  - workspace/runtime_wedge.py — single-flag state + mark_wedged /
    clear_wedge / is_wedged / wedge_reason / reset_for_test. Same
    semantics as the original claude_sdk_executor implementation
    (sticky first-write-wins, auto-clear on observed success). 100
    LOC of pure stateless helpers; lock-free ok because there's one
    executor per workspace process today.

  - workspace/claude_sdk_executor.py — drops the in-file definitions;
    re-exports the same names from runtime_wedge as a backwards-compat
    shim. Any third-party adapter that imported is_wedged / wedge_reason
    / _mark_sdk_wedged from claude_sdk_executor keeps working for one
    release cycle while they migrate to runtime_wedge.

  - workspace/heartbeat.py — _runtime_state_payload() now imports
    from runtime_wedge instead of claude_sdk_executor. Lazy-import
    pattern preserved; the docstring updated to explain the new
    cross-cutting source-of-truth.

Tests (10 new in test_runtime_wedge.py):
  - Default state (unwedged), mark sets flag, first-write-wins,
    clear restores healthy, clear-when-not-wedged is no-op,
    re-marking after clear is allowed
  - Re-export shim: each old name in claude_sdk_executor IS the
    runtime_wedge function (identity check), state is shared
    (marking via the executor shim is observable via runtime_wedge
    and vice versa)

Verification:
  - 1251/1251 workspace pytest pass (was 1241 after orphan deletion;
    +10 = exactly the new test_runtime_wedge.py cases)
  - All existing test_claude_sdk_executor.py cases (which call
    _mark_sdk_wedged via the shim) still pass

After this lands + the claude-code template image rebuilds with the
local claude_sdk_executor.py copy (template PR #13), the molecule-
core deletion of workspace/claude_sdk_executor.py becomes safe (the
shim deletion comes alongside the file deletion, since runtime_wedge
is the new public API).

See project memory `project_runtime_native_pluggable.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:08:53 -07:00
Hongming Wang
fa8deb9d16 chore(workspace): delete orphan HermesA2AExecutor (dead code, 1.8K LOC)
Removes:
  - workspace/hermes_executor.py (545 LOC) — HermesA2AExecutor, an
    OpenAI-compat direct-call executor that was the original hermes
    integration before the template was rewritten to bridge to
    hermes-agent's sidecar API server.
  - workspace/tests/test_hermes_executor.py (1307 LOC) — its test file.

Verified-dead-code analysis:
  - Zero `from hermes_executor` / `import hermes_executor` imports
    anywhere in workspace/, workspace-server/, or
    workspace-configs-templates/ (excluding the file itself + its test).
  - The hermes template (workspace-configs-templates/hermes/executor.py)
    uses HermesAgentProxyExecutor, NOT HermesA2AExecutor — they're
    independent implementations. The executor.py file imports from
    `executor` (local), not from molecule_runtime.
  - Last touched in PR #1974 (2026 a2a-sdk migration to 1.0.0) for SDK
    compatibility — kept compiling but never wired into any code path.
  - Older than that, only the 2026 open-source restructure rename.

Why now: starting task #87 (universal-runtime violation, move adapter-
specific code out of workspace/). Dead-code deletion is the safest
first step and motivates the broader refactor by clearing the
landscape — no risk of someone defending HermesA2AExecutor as
"actually used somewhere."

Verification:
  - 1241/1241 workspace pytest pass (was 1312; the 71 dropped tests
    are exactly test_hermes_executor.py's coverage)
  - No new failures, no broken imports anywhere

The remaining adapter-specific executors in workspace/ that #87 will
eventually relocate (per the user's scope: claude-code + hermes priority,
others later):
  - workspace/claude_sdk_executor.py (757 LOC) → claude-code template repo
  - workspace/cli_executor.py (461 LOC) → defer (codex/ollama/etc still
    use the runtime presets here; comes back later when those bump versions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:52:10 -07:00
Hongming Wang
af664e3e87 feat(tools): borrow hermes-style discipline — error/summary caps + sharper MCP descriptions
Three small wins from the hermes-agent design survey, bundled because
each is too small for its own PR but they all improve the priority
adapters (claude-code + hermes) immediately.

1. Hermes-style cap on telemetry fields, applied INSIDE report_activity
   so every caller benefits without remembering. error_detail capped at
   4096 (hermes' value); summary capped at 256 (one-liner ceiling). The
   existing call site in tool_delegate_task already truncated error_detail
   at 4096, but moving the cap into the helper closes the door on a
   future caller pasting a giant traceback. response_text is NOT capped
   (it's the agent's user-visible reply; truncating would silently drop
   content). Pinned by 4 new tests including a negative-pin that
   response_text MUST stay untruncated.

2. Sharper MCP tool descriptions for commit_memory + recall_memory —
   hermes' delegate_task description literally says "WAIT for the response"
   and delegate_task_async says "Returns immediately." LLMs pick the
   right tool variant from descriptions; ambiguity costs accuracy.
   - commit_memory now states it APPENDS (each call creates a row, no
     overwrite) and that GLOBAL requires tier 0.
   - recall_memory now states it's case-insensitive substring search
     with no pagination, returns all matches, and that empty-query is
     cheap and safer than a narrow keyword.

3. (no code change) Filed task #120 for the bigger user-flow win — a
   per-workspace tool enable/disable menu in Canvas Config — and task
   #121 for model-string passthrough (depends on #87 universal-runtime
   refactor).

Verification:
  - 1312/1312 Python pytest pass (was 1308, +4 new)

See task #119 for the architectural follow-ups (event-log layer,
declarative skill compat, observability config block) and project
memory `project_runtime_native_pluggable.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:25:54 -07:00
Hongming Wang
aa70727ab9 fix(test): drop unused MagicMock import in test_heartbeat_runtime_metadata
Reviewer bot flagged: import was leftover from earlier scaffolding —
all test fixtures use sys.modules monkey-patching with SimpleNamespace
instead. Drop to unblock merge. Tests still 5/5 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:58:21 -07:00
Hongming Wang
0d3058585b feat(runtime): adapter-declared idle_timeout_override end-to-end
Capability primitive #2 (task #117). The first cross-cutting capability
where the adapter actually displaces platform behavior — claude-code's
streaming session can legitimately go silent for 8+ minutes during
synthesis + slow tool calls; the platform's hardcoded 5min idle timer
in a2a_proxy.go cancels it mid-flight (the bug PR #2128 patched at
the env-var layer). This PR fixes it at the right layer: the adapter
declares "I need 600s" and the platform's dispatch path honors it.

Wire shape (Python → Go):

  POST /registry/heartbeat
  {
    "workspace_id": "...",
    ...
    "runtime_metadata": {
      "capabilities": {"heartbeat": false, "scheduler": false, ...},
      "idle_timeout_seconds": 600    // optional, omitted = use default
    }
  }

Default behavior preserved: any adapter that doesn't override
BaseAdapter.idle_timeout_override() (returns None by default) sends
no idle_timeout_seconds field; the Go side falls through to
idleTimeoutDuration (env A2A_IDLE_TIMEOUT_SECONDS, default 5min).
Existing langgraph / crewai / deepagents workspaces are unaffected.

Components:

  Python:
  - adapter_base.py: idle_timeout_override() method on BaseAdapter
    returning None (the platform-default sentinel).
  - heartbeat.py: _runtime_metadata_payload() lazy-imports the active
    adapter and assembles the capability + override block. Try/except
    swallows ANY error so heartbeat never breaks because of capability
    discovery — observability outranks capability accuracy.

  Go:
  - models.HeartbeatPayload.RuntimeMetadata (pointer so absent =
    "old runtime, didn't say"; explicit zero-cap = "new runtime,
    declared no native ownership").
  - handlers.runtimeOverrides: in-memory sync.Map cache keyed by
    workspaceID. Populated by the heartbeat handler, consulted on
    every dispatchA2A. Reset on platform restart (worst-case 30s of
    platform-default behavior — acceptable; nothing about overrides
    is correctness-critical).
  - a2a_proxy.dispatchA2A: looks up the override before applyIdle
    Timeout; falls through to global default when absent.

Tests:
  Python (17, all new):
    - RuntimeCapabilities dataclass shape (frozen, defaults, wire keys)
    - BaseAdapter.capabilities() default + override + sibling isolation
    - idle_timeout_override default, positive override, dropped-override
    - Heartbeat metadata producer: default adapter emits all-False,
      native adapter emits flag + override, missing ADAPTER_MODULE
      returns {} (graceful), zero/negative override is omitted from
      wire, exception inside adapter swallowed
  Go (6, all new):
    - SetIdleTimeout + IdleTimeout round-trip
    - Zero/negative duration clears the override
    - Empty workspace_id ignored
    - Replacement (heartbeat overwrites prior value)
    - Reset clears entire cache
    - Concurrent reads + writes (sync.Map invariant)

Verification:
  - 1308 / 1308 workspace pytest pass (was 1300, +8)
  - All Go handlers tests pass (6 new + existing)
  - go vet clean

See project memory `project_runtime_native_pluggable.md` for the
architecture principle this implements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:38:01 -07:00
Hongming Wang
205a454c09 feat(runtime): RuntimeCapabilities dataclass + BaseAdapter.capabilities()
Foundation primitive for the native+pluggable runtime principle (task
#117, blocks #87). Lets each adapter declare which cross-cutting
capabilities it owns natively (heartbeat, scheduler, durable session,
status mgmt, retry, activity decoration, channel dispatch) versus
delegates to the platform's fallback implementation.

Pure additive: every existing adapter inherits BaseAdapter.capabilities()
which returns RuntimeCapabilities() — every flag False — so today's
"platform owns everything" behavior is preserved exactly. Subsequent
PRs land platform-side consumers (idle-timeout override, scheduler
skip, status-transition hook, etc.) one capability at a time.

Why a frozen dataclass instead of class attributes: capabilities are
declared at class-load time and read by the platform on every heartbeat.
A mutable value would let a runtime change capabilities mid-flight,
creating impossible-to-debug state where the platform's idea of who-
owns-heartbeat drifts from the adapter's actual code.

Why a `to_dict()` with explicit short keys: the Go side will read these
from the heartbeat payload by string key. The dict's wire names are
pinned independently of Python field names so a Python-side rename
doesn't silently break the Go consumer (test pins this).

Tests (9 new):
  - is a frozen dataclass (mutation rejected)
  - all 7 default flags are False (load-bearing — flipping any default
    silently moves ownership for langgraph/crewai/deepagents)
  - to_dict() keys are stable wire names (Go contract)
  - BaseAdapter.capabilities() default returns all-False
  - subclass override mechanism works
  - sibling adapters' defaults aren't affected by an override

Verification:
  - 1300/1300 workspace pytest pass (was 1291, +9)
  - Zero behavior change for any existing code path

See project memory `project_runtime_native_pluggable.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:17:49 -07:00
Hongming Wang
6eaacf175b fix(notify): review-flagged Critical + Required findings on PR #2130
Two Critical bugs caught in code review of the agent→user attachments PR:

1. **Empty-URI attachments slipped past validation.** Gin's
   go-playground/validator does NOT iterate slice elements without
   `dive` — verified zero `dive` usage anywhere in workspace-server —
   so the inner `binding:"required"` tags on NotifyAttachment.URI/Name
   were never enforced. `attachments: [{"uri":"","name":""}]` would
   pass validation, broadcast empty-URI chips that render blank in
   canvas, AND persist them in activity_logs for every page reload to
   re-render. Added explicit per-element validation in Notify (returns
   400 with `attachment[i]: uri and name are required`) plus
   defence-in-depth in the canvas filter (rejects empty strings, not
   just non-strings).
   3-case regression test pins the rejection.

2. **Hardcoded application/octet-stream stripped real mime types.**
   `_upload_chat_files` always passed octet-stream as the multipart
   Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")`
   FIRST and only falls back to extension-sniffing when the header is
   empty, so every agent-attached file lost its real type forever —
   broke the canvas's MIME-based icon/preview logic. Now sniff via
   `mimetypes.guess_type(path)` and only fall back to octet-stream
   when sniffing returns None.

Plus three Required nits:

- `sqlmockArgMatcher` was misleading — the closure always returned
  true after capture, identical to `sqlmock.AnyArg()` semantics, but
  named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)`
  so the intent (capture for post-call inspection, not validate via
  driver-callback) is unambiguous.
- Test asserted notify call by `await_args_list[1]` index — fragile
  to any future _upload_chat_files refactor that adds a pre-flight
  POST. Now filter call list by URL suffix `/notify` and assert
  exactly one match.
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases)
  covering empty-uri, empty-name, both-empty so the Critical fix
  stays defended.

Deferred to follow-up:

- ORDER BY tiebreaker for same-millisecond notifies — pre-existing
  risk, not regression.
- Streaming multipart upload — bounded by the platform's 50MB total
  cap so RAM ceiling is fixed; switch to streaming if cap rises.
- Symlink rejection — agent UID can already read whatever its
  filesystem perms allow via the shell tool; rejecting symlinks
  doesn't materially shrink the attack surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:47:31 -07:00
Hongming Wang
d028fe19ff feat(notify): agent → user file attachments via send_message_to_user
Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip"
in plain text instead of attaching a download chip — the runtime literally
had no API for outbound file attachments. The canvas + platform's
chat-uploads infrastructure already supported the inbound (user → agent)
direction (commit 94d9331c); this PR wires the outbound side.

End-to-end shape:

  agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"])
   ↓ runtime
  POST /workspaces/<self>/chat/uploads (multipart)
   ↓ platform
  /workspace/.molecule/chat-uploads/<uuid>-build.zip
   → returns {uri: workspace:/...build.zip, name, mimeType, size}
   ↓ runtime
  POST /workspaces/<self>/notify
   {message: "Done!", attachments: [{uri, name, mimeType, size}]}
   ↓ platform
  Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs
  with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]}
   ↓ canvas
  WS push: canvas-events.ts adds attachments to agentMessages queue
  Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[]
  Either path → ChatTab renders download chip via existing path

Files changed:

  workspace-server/internal/handlers/activity.go
    - NotifyAttachment struct {URI, Name, MimeType, Size}
    - Notify body accepts attachments[], broadcasts in payload,
      persists as response_body.parts[].kind="file"

  canvas/src/store/canvas-events.ts
    - AGENT_MESSAGE handler reads payload.attachments, type-validates
      each entry, attaches to agentMessages queue
    - Skips empty events (was: skipped only when content empty)

  workspace/a2a_tools.py
    - tool_send_message_to_user(message, attachments=[paths])
    - New _upload_chat_files helper: opens each path, multipart POSTs
      to /chat/uploads, returns the platform's metadata
    - Fail-fast on missing file / upload error — never sends a notify
      with a half-rendered attachment chip

  workspace/a2a_mcp_server.py
    - inputSchema declares attachments param so claude-code SDK
      surfaces it to the model
    - Defensive filter on the dispatch path (drops non-string entries
      if the model sends a malformed payload)

  Tests:
    - 4 new Python: success path, missing file, upload 5xx, no-attach
      backwards compat
    - 1 new Go: Notify-with-attachments persists parts[] in
      response_body so chat reload reconstructs the chip

Why /tmp paths work even though they're outside the canvas's allowed
roots: the runtime tool reads the bytes locally and re-uploads through
/chat/uploads, which lands the file under /workspace (an allowed root).
The agent can specify any readable path.

Does NOT include: agent → agent file transfer. Different design problem
(cross-workspace download auth: peer would need a credential to call
sender's /chat/download). Tracked as a follow-up under task #114.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:35:58 -07:00
Hongming Wang
5071454074 fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events
Critical follow-up to PR #2126's review. Two real bugs:

1. **Runtime QUEUED never resolved.** Platform's drain stitch updates
   the platform's delegate_result row when a queued delegation finally
   completes, but never pushes back to the runtime. The LLM polling
   check_delegation_status saw status="queued" forever — combined with
   the new docstring guidance ("queued → wait, peer will reply"), the
   model would wait indefinitely on a state that never resolves.
   Strictly worse than pre-PR behavior where it would have at least
   bypassed.

2. **Live updates dead code.** delegation.go writes activity rows by
   direct INSERT INTO activity_logs, bypassing the LogActivity helper
   that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's
   ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial
   GET worked, live updates did not.

Fix:

(1) Runtime side, workspace/builtin_tools/delegation.py:
  - New `_refresh_queued_from_platform(task_id)` async helper that
    pulls /workspaces/<self>/delegations and finds the platform-side
    delegate_result row for our task_id.
  - check_delegation_status calls _refresh when local status is
    QUEUED, so the LLM's poll itself drives state convergence.
  - Best-effort: GET failure leaves local state untouched, next
    poll retries.
  - Docstring updated to reflect the actual behavior ("polls
    transparently — keep polling and you'll see the flip").
  - 4 new tests cover: QUEUED → completed via refresh; QUEUED →
    failed via refresh; refresh keeps QUEUED when platform hasn't
    resolved; refresh swallows network errors safely.

(2) Canvas side, AgentCommsPanel.tsx WS push handler:
  - Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE
    / DELEGATION_FAILED in addition to ACTIVITY_LOGGED.
  - Each event's payload synthesized into an ActivityEntry shape
    so toCommMessage's existing delegation branch maps it. Status
    derived: STATUS uses payload.status, COMPLETE → "completed",
    FAILED → "failed", SENT → "pending".
  - The ACTIVITY_LOGGED branch keeps the "delegation" type accepted
    as a no-op-today / future-proof path: if delegation handlers
    are ever refactored to call LogActivity, this lights up
    automatically without another canvas change.

Doesn't change: the docstring guidance ("queued → wait, don't bypass")
is now actually load-bearing because the refresh path will deliver
the eventual outcome. Without the refresh, the guidance was a trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:05:04 -07:00
Hongming Wang
057876cb0c fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows
Two bugs that compounded into the "Director does the work itself" UX:

1. workspace/builtin_tools/delegation.py: _execute_delegation only
   handled HTTP 200 in the response branch. When the peer's a2a-proxy
   returned HTTP 202 + {queued: true} (single-SDK-session bottleneck
   on the peer), the loop fell through. Two iterations later the
   `if "error" in result` check tried to access an unbound `result`,
   the goroutine ended quietly, and the delegation stayed at FAILED
   with error="None". The LLM checking status saw "failed" + the
   platform's "Delegation queued — target at capacity" log line in
   chat context, concluded the peer was permanently unavailable, and
   bypassed delegation to do the work itself.

   Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED,
   marks the local delegation as QUEUED, mirrors to the platform,
   and returns cleanly without retrying. The retry loop is for
   transient transport errors — queueing is a real ack, not a failure
   to retry against (retrying would just re-queue the same task).

   check_delegation_status docstring extended with explicit per-status
   guidance: pending/in_progress → wait, queued → wait (peer busy on
   prior task, reply WILL arrive), completed → use result, failed →
   real error in error field; only fall back on failed, never queued.

2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped
   every delegation row because it whitelisted only a2a_send /
   a2a_receive. activity_type='delegation' rows (written by the
   platform's /delegate handler with method='delegate' or
   'delegate_result') never reached toCommMessage. User saw "No
   agent-to-agent communications yet" while 6+ delegations existed
   in the DB.

   Fix: include "delegation" in the both the initial filter and the
   WS push filter, plus a delegation branch in toCommMessage that
   maps the row as outbound (always — platform proxies on our behalf)
   and uses summary as the primary text source.

Tests:
  - 3 new Python tests cover the 202+queued path: status becomes
    QUEUED not FAILED; no retry on queued (counted by URL match
    against the A2A target since the mock is shared across all
    AsyncClient calls); bare 202 without {queued:true} still
    falls through to the existing retry-then-FAILED path.
  - 3 new TS tests cover the delegation mapper: 'delegate' row
    maps as outbound to target with summary text; queued
    'delegate_result' preserves status='queued' (load-bearing for
    the LLM's wait-vs-bypass decision); missing target_id returns
    null instead of rendering a ghost.

Does NOT solve: the underlying single-SDK-session bottleneck that
causes peers to queue in the first place. Tracked as task #102
(parallel SDK sessions per workspace) — real architectural work.
This PR makes the runtime handle the queueing correctly so the LLM
doesn't bail out, and makes the delegations visible in Agent Comms
so operators can see what's happening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:01:50 -07:00
Hongming Wang
09bfd9bdce fix(tests): hoist _executor_mod alias so async wedge tests pass under --cov
The Copilot Auto-fix in 5a8f42b4 addressed the duplicate-import lint by
removing 'import claude_sdk_executor as _executor_mod' entirely, but the
async wedge tests (test_execute_marks_wedge_*, test_execute_clears_wedge_*)
still call _executor_mod._reset_sdk_wedge_for_test() etc. — so they failed
with NameError once that line was removed.

Restore the alias, but at the top of the file (alongside the other module-
level imports) rather than at line 1248. The late-file binding was the
proximate cause of the original CI failure: with --cov enabled (#1817),
sys.settrace + the @pytest.mark.asyncio wrapper combination caused the
late module-level binding to not be visible from inside the async test
bodies, even though the binding existed at module-load time. Hoisting
fixes that scope-resolution issue.

Verified locally with the exact CI config (--cov-fail-under=86):
  1280 passed, 2 xfailed — total coverage 90.25%

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:57:21 -07:00
Hongming Wang
5a8f42b405
Potential fix for pull request finding 'Module is imported with 'import' and 'import from''
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-04-26 10:45:37 -07:00
Hongming Wang
d0f198b24f merge: resolve staging conflicts (a2a_proxy + workspace_crud)
Three files conflicted with staging changes that landed while this PR
sat open. Resolved each by combining both intents (not picking one side):

- a2a_proxy.go: keep the branch's idle-timeout signature
  (workspaceID parameter + comment) AND apply staging's #1483 SSRF
  defense-in-depth check at the top of dispatchA2A. Type-assert
  h.broadcaster (now an EventEmitter interface per staging) back to
  *Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through
  to no-op when the assertion fails (test-mock case).

- a2a_proxy_test.go: keep both new test suites — branch's
  TestApplyIdleTimeout_* (3 cases for the idle-timeout helper) AND
  staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated
  the staging test's dispatchA2A call to pass the workspaceID arg
  introduced by the branch's signature change.

- workspace_crud.go: combine both Delete-cleanup intents:
  * Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas
    hang-up doesn't cancel mid-Docker-call (the container-leak fix)
  * Branch's stopAndRemove helper that skips RemoveVolume when Stop
    fails (orphan sweeper handles)
  * Staging's #1843 stopErrs aggregation so Stop failures bubble up
    as 500 to the client (the EC2 orphan-instance prevention)
  Both concerns satisfied: cleanup runs to completion past canvas
  hangup AND failed Stop calls surface to caller.

Build clean, all platform tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:43:22 -07:00
Hongming Wang
fc2720c1fe fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552)
The token-cache helper had three #1552 findings, all in the
mode-600-after-the-fact pattern:

1. _write_cache writes .tmp with default umask (typically 022 → 644
   on disk) and then chmod 600's after the mv. A concurrent reader
   in that microsecond-wide window sees the token at mode 644.
2. Each chmod was swallowed via `|| true` — if it ever fails, the
   tokens stay world-readable with no operator signal.
3. _refresh_gh's gh_token_file write has the same shape and same
   two issues.

Hardening:

- Wrap the .tmp creates in a `umask 077` block so the files are 600
  from creation. Restore the previous umask before return so callers
  aren't perturbed.
- Replace `chmod ... 2>/dev/null || true` with `if ! chmod ...; then
  echo WARN ...; fi`. A chmod failure is a real signal worth grep'ing.
- Apply the same pattern to the _refresh_gh gh_token_file path.
  `local` is illegal in a top-level case branch, so use a uniquely-
  named global (_gh_prev_umask) and unset it after.

Verified `bash -n` clean and `shellcheck` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:22:29 -07:00
Hongming Wang
5d294936b3 fixup: lower coverage floor 92→86 to match post-omit measurement
The 97% number from CI run 24956647701 was measured WITHOUT a
.coveragerc omit list. Once this PR's prescribed omit set is in
effect (`*/__init__.py`, `*/tests/*`, `plugins_registry/*` — files
that don't carry behavior), the actual measurement of behavior-bearing
code on the same staging snapshot is 91.11% (run 24957664272).

86% sits at the issue's prescribed `current − 5pp` margin and
unblocks CI without lowering the bar in real terms.
2026-04-26 06:24:36 -07:00
Hongming Wang
355355a80a test(workspace): centralize pytest-cov config + 92% floor (closes #1817)
The Python workspace already runs pytest-cov in CI but with no
threshold and inline-flagged config. CI run 24956647701 (2026-04-26
staging) reports 97% coverage on the package — well above the issue's
75% target. The actionable gap is locking in a floor so a regression
can't sneak past, and centralizing config so local `pytest` matches CI.

Changes:

- workspace/pytest.ini — coverage flags moved into addopts (-q,
  --cov=., --cov-report=term-missing, --cov-fail-under=92).
  92% = current 97% measurement minus the 5pp safety margin
  the issue's Step 3 prescribes.

- workspace/.coveragerc (new) — [run] omit list and [report]
  skip_covered. coverage.py doesn't read pytest.ini sections, so
  the omit config has to live here.

- .github/workflows/ci.yml — removed the inline --cov flags from the
  Python Lint & Test step; now reads from pytest.ini. Workflow stays
  the same single-command shape, just simpler.

Result: any PR that drops coverage below 92% fails CI loudly. Floor
ratchets up by replacing 92 with current measurement on a future
test-writing pass — same shape as Go coverage gates landed elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:21:22 -07:00
Hongming Wang
76d0f8d004 fix(a2a): document the metadata-attach except-pass in a2a_executor (closes #1787)
GitHub Code Quality bot flagged the empty `except (AttributeError,
TypeError): pass` at workspace/a2a_executor.py:424 as a nit on PR #1783.
The suppression IS intentional — `new_agent_text_message()` returns
a plain string in MagicMock paths in tests where assignment to
`.metadata` raises despite hasattr being true.

This:
  - Adds a why-comment citing the test-mock motivation, commit
    dcbcf19 (the original guard), and issue #1787 so the next
    code-quality pass doesn't re-flag it.
  - Adds `logger.debug("metadata attach skipped (non-Message ...")`
    for observability — debug-level so production logs stay quiet
    but ops can flip the level if metadata loss is ever suspected.

Behavior unchanged. 43 existing a2a_executor tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 05:23:00 -07:00
rabbitblood
4a4a740804 refactor(test_config): parametrize the 3 yaml-default cases (simplify on #2085)
Collapses test_compliance_default_when_yaml_omits_block,
_when_yaml_block_is_empty, _explicit_optout_still_works into one
parametrized test_compliance_default_via_load_config with three
ids (yaml_omits_block, yaml_block_empty, yaml_explicit_optout).

The dataclass-default test stays separate (no tmp_path needed).

Coverage and assertions identical; net -19 lines, same 4 logical cases.
prompt_injection check moves out of per-case to a single tail-assert
since no payload overrode it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:03:59 -07:00
rabbitblood
577294b8f4 test(config): lock ComplianceConfig default to owasp_agentic (#2059)
PR #2056 flipped ComplianceConfig.mode default from "" to "owasp_agentic"
so every shipped template gets prompt-injection detection + PII redaction
by default. The flip is correct + already shipping, but no test asserts
the new default — a silent revert (or a refactor that reintroduces the
old "" default) would pass workspace/tests/ and ship a workspace with
compliance silently off.

Add 4 regression tests:

- test_compliance_dataclass_default — ComplianceConfig() with no args
  returns mode='owasp_agentic' + prompt_injection='detect'
- test_compliance_default_when_yaml_omits_block — load_config on a yaml
  without `compliance:` key still produces owasp_agentic
- test_compliance_default_when_yaml_block_is_empty — load_config on
  `compliance: {}` (a common shape during template editing) still
  produces owasp_agentic; covers the load_config()
  `.get("mode", "owasp_agentic")` default-fill path
- test_compliance_explicit_optout_still_works — `mode: ""` in yaml
  must disable compliance (the documented opt-out path)

23/23 tests pass locally (4 new + 19 existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 02:01:57 -07:00
Hongming Wang
2ee4b67cab chore: third-pass review polish — empty-stream gate test + Callable type
Pass 3 review came back Approve with two optional polish items.
Both taken to fully converge the loop:

1. Regression test for the empty-stream wedge-clear gate (added in
   3c4eef49). A degenerate stream that iterates without raising but
   emits NEITHER an AssistantMessage NOR a ResultMessage must NOT
   clear the wedge flag — pre-set wedge persists, the next heartbeat
   still reports runtime_state="wedged". Pins the gate against
   future regression.

2. Replaced the type annotation `"dict[str, callable[[dict], str]]"`
   (lowercase `callable`, string-quoted) with the proper
   `dict[str, Callable[[dict], str]]` using `Callable` from
   `collections.abc`. Benign before (`from __future__ import
   annotations` makes the annotation a string Python never
   evaluates), but pyright/mypy may flag the lowercase form.

65 Python tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:52:32 -07:00
Hongming Wang
3c4eef49aa chore: second-pass review polish — symmetry + clearer test fixtures
Round-2 review of the wedge/idle/progress bundle came back Approve
with 4 optional polish items. All taken:

1. Migration 043 down file gained `SET LOCAL lock_timeout = '5s'`
   matching the up file. A rollback under the same load that
   motivated the up-file guard would otherwise stall writers.

2. _clear_sdk_wedge_on_success now gates on actual stream content
   (result_text or assistant_chunks). A degenerate "iterator
   returned without raising but emitted nothing" case (possible
   from a partial stream or stub SDK) no longer falsely advertises
   recovery — only a real successful query (≥1 ResultMessage or
   AssistantMessage TextBlock) clears the wedge.

3. isUpstreamBusyError dropped the redundant
   `strings.Contains(msg, "context deadline exceeded")` fallback.
   *url.Error.Unwrap propagates the typed sentinel since Go 1.13;
   errors.Is(err, context.DeadlineExceeded) catches the real
   net/http shape. The substring was a foot-gun (would also match
   user-content with that phrase). Test fixture updated to use
   `fmt.Errorf("Post: %w", context.DeadlineExceeded)` which
   reflects what net/http actually returns.

4. TestIsUpstreamBusyError added a context.Canceled case (both
   typed and wrapped via %w) — pins the new applyIdleTimeout
   classification.

No critical/required findings on second pass; reviewer verdict was
Approve. Items above are polish for symmetry and test clarity.

1010 canvas + 64 Python + full Go suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:48:30 -07:00
Hongming Wang
892de784b3 fix: review-driven hardening of wedge detector + idle timeout + progress feed
Bundle review of pieces 1/2/3 surfaced two critical issues plus a
handful of required + optional fixes. All addressed.

Critical:

1. Migration 043 was missing 'paused' and 'hibernated' from the
   workspace_status enum. Both are real production statuses written
   by workspace_restart.go (lines 283 and 406), introduced by
   migration 029_workspace_hibernation. The original `USING
   status::workspace_status` cast would have errored mid-transaction
   on any production DB containing those values. Added both. Also
   added `SET LOCAL lock_timeout = '5s'` so the migration aborts
   instead of stalling the workspace fleet behind a slow SELECT.

2. The chat activity-feed window kept only 8 lines, and a single
   multi-tool turn (Read 5 files + Grep + Bash + Edit + delegate)
   easily flushed older context before the user could read it.
   Extracted appendActivityLine to chat/activityLog.ts with a
   20-line window AND consecutive-duplicate collapse (same tool
   on the same target twice in a row is noise, not new progress).
   5 unit tests pin the behavior.

Required:

3. The SDK wedge flag was sticky-only — a single transient
   Control-request-timeout from a flaky network blip locked the
   workspace into degraded for the whole process lifetime, even
   when the next query() would have succeeded. Added
   _clear_sdk_wedge_on_success(), called from _run_query's success
   path. The next heartbeat after a working query reports
   runtime_state empty and the platform recovers the workspace to
   online without a manual restart. New regression test.

4. _report_tool_use now sets target_id = WORKSPACE_ID for self-
   actions, matching the convention other self-logged activity
   rows use. DB consumers joining on target_id see a well-defined
   value instead of NULL.

Optional taken:

5. Tightened _WEDGE_ERROR_PATTERNS from "control request timeout"
   to "control request timeout: initialize" — suffix-anchored so a
   future SDK error on an in-flight tool-call control message
   doesn't get misclassified as the unrecoverable post-init wedge.

6. Dropped the redundant "context canceled" substring fallback in
   isUpstreamBusyError. errors.Is(err, context.Canceled) is the
   typed check; the substring would also match healthy client-side
   aborts, which we don't want classified as upstream-busy.

Verified: 1010 canvas tests + 64 Python tests + full Go suite pass;
migration applies cleanly on dev DB with all 8 enum values; reverse
migration restores TEXT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:43:10 -07:00
Hongming Wang
166c7f77af feat(chat): stream per-tool progress into MyChat live feed
Two halves of the same UX win — the user wants to see what Claude is
doing while a chat reply is in flight instead of staring at "0s" for
minutes.

Workspace side (claude_sdk_executor.py):
  - The executor's _run_query message loop already iterated the SDK
    stream for AssistantMessage.TextBlock content. Now also detects
    ToolUseBlock / ServerToolUseBlock entries (by class name, since
    the conftest stub doesn't define them) and fires-and-forgets a
    POST /workspaces/:id/activity row of type agent_log per tool use.
  - _summarize_tool_use maps the common tools (Read, Write, Edit,
    Bash, Glob, Grep, WebFetch, WebSearch, Task, TodoWrite) to a
    one-line summary with the file path / pattern / command, falling
    back to "🛠 <tool>(…)" for anything else. Truncated at 200 chars.
  - Posts directly to /workspaces/:id/activity rather than going
    through a2a_tools.report_activity, which would also push a
    /registry/heartbeat current_task and double-log as a TASK_UPDATED
    line in the same chat feed.
  - All failures swallowed silently — telemetry must not break
    the conversation.

Canvas side (ChatTab.tsx):
  - The existing ACTIVITY_LOGGED handler streams a2a_send /
    a2a_receive / task_update events into a sliding-window
    activityLog state. Two issues fixed:
      1. No `msg.workspace_id === workspaceId` filter — a sibling
         workspace's a2a_send was leaking into the wrong chat
         panel as "→ Delegating to X...". Added an early return.
      2. No agent_log render branch. Added one that renders the
         summary verbatim (the workspace already prefixed its
         own emoji icon, so no double-icon).
  - Existing 8-line sliding window keeps the UI scoped; older
    progress lines naturally roll off as new ones arrive.

Result: when DD is delegating to Visual Designer + reading
config files + running Bash to lint, the spinner area shows:
  📄 Read /configs/system-prompt.md
   Bash: pnpm test
  → Delegating to Visual Designer...
  ← Visual Designer responded (47s)

instead of bare "0s · Processing with Claude Code..." for minutes.

63 Python tests + 58 canvas chat tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:28:55 -07:00
Hongming Wang
4eb09e2146 feat(platform,workspace): SDK-wedge detection + workspace_status ENUM
Heartbeat lies. The asyncio task that POSTs /registry/heartbeat lives
in its own process slot, so a workspace whose claude_agent_sdk has
wedged on `Control request timeout: initialize` keeps reporting
"online" — every chat send hangs the full 5-min platform deadline
even though the runtime is dead in the water. This commit teaches
the workspace to admit it's wedged and the platform to honor that
admission by flipping status → degraded.

Five layers, all in one commit because they share a contract:

1. Migration 043 — convert workspaces.status from free-form TEXT to
   a real `workspace_status` Postgres ENUM with the 6 values
   production code actually writes (provisioning, online, offline,
   degraded, failed, removed). Locks the value set; future typo
   writes error at the DB instead of silently storing rogue strings.
   Down migration reverts to TEXT and drops the type.

2. workspace-server/internal/models — `HeartbeatPayload` gains a
   `runtime_state string` field. Empty = healthy. Currently the only
   non-empty value the handler honors is "wedged"; future symptoms
   can extend without another migration.

3. workspace-server/internal/handlers/registry.go — `evaluateStatus`
   gains a wedge branch BEFORE the existing error_rate >= 0.5 path:
   if `RuntimeState=="wedged"` and currently online, flip to
   degraded and broadcast WORKSPACE_DEGRADED with the wedge sample
   error. Recovery (`degraded → online`) now requires BOTH
   error_rate < 0.1 AND runtime_state cleared, so a workspace still
   reporting wedged stays degraded even when its error count
   happens to be 0 (the wedge captures a runtime state, not an
   error count).

4. workspace/claude_sdk_executor.py — module-level `_sdk_wedged_reason`
   flag set when execute()'s catch block sees an error matching
   `_WEDGE_ERROR_PATTERNS` (currently just "control request
   timeout"). Sticky for the process lifetime; the SDK's internal
   client-process state is corrupted on this error and only a
   workspace restart (= new Python process = fresh module state)
   clears it. Helpers `is_wedged()` / `wedge_reason()` /
   `_reset_sdk_wedge_for_test()` exposed.

5. workspace/heartbeat.py — heartbeat body now layers on
   `_runtime_state_payload()` for both the happy path and the
   401-retry path. Lazy-imports claude_sdk_executor so non-Claude
   runtimes (where the module may not even be importable) keep
   working unchanged.

Canvas required no changes — `STATUS_CONFIG.degraded` was already
defined in design-tokens.ts (amber dot, "Degraded" label) and
WorkspaceNode.tsx already renders `lastSampleError` underneath the
status pill when status === "degraded". The existing wiring just
never fired because nothing was writing degraded in this code path.

Tests:
- 3 Go handler tests for the new transitions (online → degraded on
  wedged, degraded stays put while still wedged, degraded → online
  after wedge clears)
- 5 Python wedge-detector tests (default clean, mark sets flag,
  sticky-first-wins, execute() flips on Control request timeout,
  execute() does NOT flip on unrelated errors)
- Migration smoke-tested against the local dev DB (3 existing rows,
  all enum-compatible; migration applied cleanly, post-state has
  the column as workspace_status type and the index preserved)

Verified: 79 Python tests pass; full Go test suite passes; migration
applies clean on a real DB; reverse migration restores the column to
TEXT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 00:59:15 -07:00
Hongming Wang
c159d85eb5 fix(a2a): review-driven hardening — prefix-anchored type check, error_detail cap, shared hint module
Three required fixes from the bundle review of 391e1872:

1. workspace/a2a_client.py: substring `type_name in msg` could miss
   the diagnostic prefix when an exception's message embedded a
   different class name mid-string (e.g. `OSError("see ConnectionError
   below")` → printed as plain msg, type lost). Switched to a
   prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so
   the type label is always added when not already at the start of
   the message.

2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded
   TEXT on the platform (handlers/activity.go does not validate
   length). A buggy or hostile peer could stream arbitrarily large
   error messages into the caller's activity log. Cap at 4096 chars
   at the producer — comfortably above any real exception traceback,
   well below an obvious-DoS threshold.

3. New regression test for JSON-RPC `code=0` — pins the
   `code is not None` semantics so the code is preserved in the
   detail rather than collapsing into the no-code path. Code=0 is
   not valid per the spec, but a malformed peer can still emit it
   and we want it visible for diagnosis.

Plus one optional taken: extracted the A2A-error → hint mapping into
canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies
(AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had
already drifted — Activity tab gained `not found`/`offline` cases the
chat panel never picked up, AgentCommsPanel handled empty-input
explicitly while Activity didn't. The shared module is the merged
superset, with 10 unit tests pinning each named pattern + the
"most specific first" ordering (Claude SDK wedge wins over generic
timeout).

Skipped (per analysis):
- Unicode-naive 120-char slice — Python str[:N] slices on code
  points, not bytes. Safe.
- Nested [A2A_ERROR] confusion — non-issue per reviewer; outer
  prefix winning still produces a structured render.
- MessagePreview + JsonBlock dual render on errors — intentional
  drilldown; raw JSON is below the fold for operators who need it.
- console.warn dedup — refetches don't happen per-event so spam
  risk is low.
- str(data)[:200] materialization — A2A response bodies aren't
  typically MB-sized.

Verified: 1005 canvas tests pass (10 new hint tests); 10 Python
send_a2a_message tests pass (1 new for code=0); tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:47:44 -07:00
Hongming Wang
391e187281 fix(a2a,canvas): make delivery failures comprehensive instead of "[A2A_ERROR] "
Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] "
(prefix + nothing) for failed delegations. Operator had no signal
to act on — no exception type, no target, no hint about what went
wrong, no next step. Fix is in three layers.

1. workspace/a2a_client.py — every error path now produces an
   actionable detail string:

   - except branch: some httpx exceptions (RemoteProtocolError,
     ConnectionReset variants) stringify to "". Pre-fix the catch
     was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back
     to `<TypeName> (no message — likely connection reset or silent
     timeout)` and always appends `[target=<url>]` for traceability
     in chained delegations.
   - JSON-RPC error branch: previously dropped error.code on the
     floor and printed "unknown" when message was missing. Now
     surfaces both, including the well-defined "JSON-RPC error
     with no message (code=N)" path.
   - "neither result nor error" branch: pre-fix returned
     str(payload) which the canvas rendered as a successful
     response block. Now tagged as A2A_ERROR with a payload
     snippet so downstream UI routes through the error path.

2. workspace/a2a_tools.py — tool_delegate_task now passes
   error_detail (the stripped error message) through to the
   activity-log POST. The platform's activity_logs.error_detail
   column is the canvas's red error chip source; populating it
   makes the failure visible in the row header without the user
   having to expand into raw response_body JSON. The summary line
   also gets a 120-char prefix of the cause so the collapsed row
   reads "React Engineer failed: ConnectionResetError: ... [target=...]"
   instead of "React Engineer failed".

3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview
   now detects [A2A_ERROR]-prefixed bodies and renders a
   structured error block (red chip, stripped detail, cause hint)
   instead of the previous gray text-block that showed the literal
   "[A2A_ERROR]" string. inferA2AErrorHint mirrors the patterns
   from AgentCommsPanel.inferCauseHint so the same symptom reads
   the same way in both surfaces (Claude SDK init wedge → restart
   workspace; timeout → busy/stuck; connection-reset → transient
   blip then check logs).

Tests: 9 send_a2a_message tests pass (including a new regression
test for the empty-stringifying-exception case that the user
reported); 995 canvas tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:40:05 -07:00
Hongming Wang
65b531acf6 fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID
Workspace runtime fired four classes of A2A request to the platform
without the X-Workspace-ID header that identifies the source
workspace: heartbeat self-messages, initial_prompt, idle-loop fires,
and peer-to-peer A2A from runtime tools. The platform's a2a_receive
logger keys source_id off that header — without it, every such row
was written with source_id=NULL, which the canvas's My Chat tab
filters as ?source=canvas (i.e. "user typed this") and rendered the
internal triggers as if the human user had sent them. The
"Delegation results are ready..." heartbeat trigger was visible to
end users in the chat history; delegate_task A2A calls between agents
were misclassified the same way.

Centralise the header construction in a new platform_auth helper
self_source_headers(workspace_id) that returns auth_headers() PLUS
{X-Workspace-ID: <id>}. Apply it to:

  - heartbeat.py self-message (refactored from inline header dict)
  - main.py initial_prompt POST
  - main.py idle_prompt POST
  - a2a_client.py send_a2a_message (peer A2A from runtime)
  - builtin_tools/a2a_tools.py delegate_task (was missing ALL headers)

Tests:
  - test_heartbeat.py asserts the X-Workspace-ID header is set on
    the self-message POST.
  - test_a2a_tools_module.py asserts the same on delegate_task POSTs;
    FakeClient.post mocks updated to accept the headers kwarg.

Production effect lands the moment workspace containers are rebuilt
with this code; existing rows in activity_logs keep their NULL
source_id (legacy data). The canvas-side filter (#follow-up)
covers the historical-rows case until backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:54:43 -07:00
Hongming Wang
a34121d451 fix(a2a_executor): remove shadowing local Part import that broke streaming
Python scoping rule: any name assigned anywhere in a function body
is local for the entire body. The outbound-files block at ~L442
had `from a2a.types import ... Part ...`, which made `Part` a local
name throughout the execute() function. The astream_events loop at
L358 — which runs BEFORE that import — then raised:

  UnboundLocalError: cannot access local variable 'Part' where it
  is not associated with a value

Every streaming A2A reply died with "Agent error: cannot access
local variable 'Part' where it is not associated with a value"
instead of the actual agent text. 5 tests caught it:
  - test_streaming_plain_string_content
  - test_streaming_anthropic_content_blocks
  - test_non_stream_events_ignored
  - test_core_execute_on_chat_model_end_captures_last_ai_message
  - test_core_execute_pii_redaction_when_pii_found

Fix: drop `Part` from the function-scope import (it is already
imported at module level on line 42) and leave a comment pinning
the rationale so a future refactor doesn't re-introduce the shadow.

All 43 test_a2a_executor tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:21:04 -07:00
Hongming Wang
425df5e5a9 merge(staging): resolve conflicts + fix 7 test regressions on top of #2061
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
  auto-merged (mostly canvas/tabs/chat and workspace-server handlers
  the earlier DIRTY marker was stale relative to current staging).

- Fix 7 test failures surfaced by the merge:

  1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
     inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
     object failed type check. Explicit return-type annotation.

  2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
     reads deletingIds.size (new multilevel-layout state). Both mock
     stores lacked deletingIds; added new Set<string>() to each.

  3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
     format WorkspaceData (snake_case, with x/y/uptime_seconds). The
     store's node.data is now WorkspaceNodeData (camelCase, no wire-
     only fields). Rewrote makeWS to produce WorkspaceNodeData and
     updated 5 call-site casts. No assertions changed.

  4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
     that the PR intentionally inverts:

       a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
          now contains only {"external"}, so the banner is no longer
          shown for hermes. Inverted assertion: now pins ABSENCE of
          the banner, with a comment noting the inversion.

       b. "config.yaml runtime wins over DB" — priority reversed:
          DB is now authoritative so the tier-on-node badge matches
          the form. Inverted scenario: DB=hermes + yaml=crewai →
          form shows hermes. Switched test's DB runtime off langgraph
          because the dropdown collapses langgraph into an empty-
          valued "default" option that would hide the win signal.

- No production code changed — this commit is staging merge + test
  realignment only. 953/953 canvas tests pass. tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:50:39 -07:00
Hongming Wang
94d9331c76 feat(canvas+platform): chat attachments, model selection, deploy/delete UX
Session's accumulated UX work across frontend and platform. Reviewable
in four logical sections — diff is large but internally cohesive
(each section fixes a gap the next one depends on).

## Chat attachments — user ↔ agent file round trip

- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
  25 MB per file, UUID-prefixed storage under
  /workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
  escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
  per-message attachment chips with fetch+blob download (anchor
  navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
  now consumes attachments via platform helpers.

## Platform attachment helpers (workspace/executor_helpers.py)

Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
  reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
  multi-modal list (text + image_url) for images. Respects
  MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
  adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
  paths, stages each into chat-uploads/ (download endpoint
  whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
  so non-root agents can write /workspace without each template
  having to chmod in its Dockerfile.

Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.

## Model selection & related platform fixes

- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
  silently lost the model choice. Stores into workspace_secrets
  (MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
  so Restart propagates the stored model to HERMES_DEFAULT_MODEL
  without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
  (stale) config.yaml — fixes "badge shows T3, form shows T2".

## ChatTab & WebSocket UX fixes

- Send button no longer locks after a dropped TASK_COMPLETE —
  `sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
  the previous default aborted fetches while the server was still
  replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
  detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
  the adaptor's purpose IS the form, banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
  bare <runtime> for template path (hermes lives at the bare name).

## Org deploy/delete animation (theme-ready CSS)

- styles/theme-tokens.css — design tokens (durations, easings,
  colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
  value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
  (deploying children AND actively-deleting ids) — RF's
  authoritative drag lock; useDragHandlers retains a belt-and-
  braces check.
- Organ cancel button (red pulse pill on root during deploy)
  cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
  sibling arrivals coalesce into one fit (previous per-event
  fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
  timestamp only when event !== null (ignores programmatic
  fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
  delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
  useCanvasViewport (DRY'd 3 copies of split/filter/join).

## Server payload change

- org_import.go WORKSPACE_PROVISIONING broadcast now includes
  parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
  renders the child at the right parent-nested slot without doing
  any absolute-position walk. createWorkspaceTree signature gains
  relX, relY alongside absX, absY; both call sites updated.

## Tests

- workspace/tests/test_executor_helpers.py — 11 new cases
  covering URI resolution (including traversal rejection),
  attached-file extraction (both Part shapes), manifest-only
  vs multi-modal content, large-image skip, outbound staging,
  dedup, and ensure_workspace_writable (chmod 777 + non-root
  tolerance).
- workspace-server chat_files_test.go — upload validation,
  Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
  clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
  a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
  plumbing check + round-trip across hermes/langgraph/claude-code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:27:51 -07:00
Hongming Wang
9af058b82d fix(compliance): flip default mode to owasp_agentic (detect-only)
Prior state: compliance.mode default was "" (fully off) and no template
in the repo set it explicitly — so prompt-injection detection, PII
redaction, and agency-limit checks were silently disabled on every
live workspace, despite the machinery being present in
workspace/builtin_tools/compliance.py.

This was surfaced during a 2026-04-24 review of the A2A inbound path:
a2a_executor.py gates three security checks on
  _compliance_cfg.mode == "owasp_agentic"
and default config never matches, so every A2A message skipped all three.

Fix: default is now owasp_agentic + prompt_injection=detect. Detect mode
logs injection attempts as audit events without blocking — no UX cost,
just visibility. Operators who want stricter enforcement set
`prompt_injection: block` per workspace. Operators who genuinely want
compliance fully off can set `mode: ""` (not recommended; documented).

Changes:
- ComplianceConfig.mode default: "" → "owasp_agentic"
- Yaml parser fallback default: "" → "owasp_agentic" (must match dataclass)
- Docstring updated with rationale + opt-out snippet

Tests: 66/66 test_compliance.py + test_a2a_executor.py pass. 19/19
test_config.py pass. The one test asserting compliance_mode == "" is
for the "config load failed" fallback path (different from the default
config path) — correctly unchanged.

Security posture improvement: prompt-injection detection is now always
on for every workspace created after this ships, with zero behavior
change for legitimate inputs. Block mode remains an opt-in when an
operator wants to actively reject injection attempts rather than just
log them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:52:09 -07:00
Molecule AI Core Platform Lead
6f24cc0961 fix(executors): move set_current_task inside try so active_tasks always decrements (#2026)
If asyncio.CancelledError arrived during the heartbeat HTTP push inside
set_current_task() (the increment call), the code raised before entering
the try/finally block in _execute_locked. The finally block never ran,
so active_tasks stayed at 1 forever. Every subsequent heartbeat reported
active_tasks=1, the server saw active_tasks < max_concurrent_tasks as
false (1 < 1), and DrainQueueForWorkspace never fired. Queued A2A
requests were permanently stuck.

Fix: move set_current_task(increment) to be the FIRST statement inside
the try block, not before it. set_current_task's synchronous portion
(heartbeat.active_tasks mutation) still runs unconditionally; only the
optional HTTP push can be cancelled. The finally block now always runs
and always decrements active_tasks back to 0.

Affected executors: claude_sdk_executor, cli_executor, a2a_executor.
hermes_executor is not affected (does not call set_current_task).

Root cause of today's "active_tasks: 1 + queue drain never triggers"
P1 pattern across three workspaces.

All 167 executor tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:03:12 +00:00
rabbitblood
7b662d2494 ci(gh-wrapper): translate --assignee @me → --label team:<role>
Fixes #1957. All agents share one PAT, so `gh issue create --assignee @me`
resolves to the CEO. Today's "6 issues @me for 7 cycles" defect signal
turned out to be CEO-load misclassified as team-stagnation.

Translation rules:
- `--assignee @me` → `--label team:<role-slug>`
- `--reviewer @me` → dropped (review-bot scans labels, not requests)
- `--assignee user` (real user) → unchanged

role-slug derived from GIT_AUTHOR_NAME ("Molecule AI Core-BE" → "core-be").
The wrapper already handled the title-prefix + body-footer transforms;
these are just two more cases in the existing arg-walk loop.

Backward compat: any agent prompt that doesn't use @me passes through
unchanged. Agents don't need prompt updates — the wrapper is transparent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:34:21 -07:00
molecule-ai[bot]
35bcad9204
feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974)
* feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009)

Migrates all workspace code from a2a-sdk v0.3.x to v1.0.0, following the
official migration guide from a2aproject/a2a-python.

Breaking changes applied:
- A2AStarletteApplication → Starlette route factory
  (create_agent_card_routes + create_jsonrpc_routes)
- AgentCard.url removed; url+protocol now in supported_protocols[].url
- AgentCapabilities fields renamed to snake_case
  (pushNotifications→push_notifications,
   stateTransitionHistory→state_transition_history)
- AgentCard.defaultInputModes/outputModes → default_input_modes/output_modes
- TaskState.canceled → TaskState.TASK_STATE_CANCELED
- a2a.utils → a2a.helpers
- Part(root=TextPart(text=t)) → Part(text=t) (TextPart removed)

Files changed:
- requirements.txt: pinned >=1.0.0,<2.0
- main.py: Starlette route factory + AgentCard restructure
- a2a_executor.py: Part() + TaskState + helpers import
- hermes_executor.py: TaskState + helpers import
- google-adk/adapter.py: TaskState + helpers import
- cli_executor.py: helpers import
- claude_sdk_executor.py: helpers import
- tests/conftest.py: a2a.helpers mock stub
- tests/test_a2a_executor.py: TaskState enum key
- adapters/google-adk/test_adapter.py: Part + helpers stub

Refs: KI-009
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): update _TaskState mock to a2a-sdk v1 enum name (TASK_STATE_CANCELED)

---------

Co-authored-by: Molecule AI Tech Researcher <tech-researcher@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-24 04:43:17 +00:00
61c5f8ad9a feat(plugin): implement MCPServerAdaptor (issue #847)
Rule-of-three threshold met: 4 plugin proposals (molecule-firecrawl
#512, molecule-github-mcp #520, molecule-browser-use #553, mcp-connector
#573) all independently shipped the same mcpServers-adapter pattern.

Adds MCPServerAdaptor to builtins.py — plugins wrapping an MCP server
now declare `from plugins_registry.builtins import MCPServerAdaptor as
Adaptor` in their per-runtime adapter file. The adaptor:

- Merges mcpServers from settings-fragment.json into
  <configs>/.claude/settings.json (deep-merge so multiple plugins'
  servers coexist).
- Optionally ships skills/rules/setup.sh via AgentskillsAdaptor
  delegation.
- On uninstall: removes skills/rules but intentionally leaves
  mcpServers entries in settings.json (users may share configs with
  other tools or have manually curated entries).

Also fixes _deep_merge_hooks: non-hook top-level keys that are dicts
(e.g. mcpServers) are now deep-merged with existing values instead of
being skipped via setdefault.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 01:42:13 +00:00
b5e2142c46 fix(#1877): close token-rotation race on restart — Option A+Option B combined
Platform side (Option B):
- provisioner.go: add WriteAuthTokenToVolume() — writes .auth_token to
  the Docker named volume BEFORE ContainerStart using a throwaway alpine
  container, eliminating the race window where a restarted container could
  read a stale token before WriteFilesToContainer writes the new one.
- workspace_provision.go: call WriteAuthTokenToVolume() in issueAndInjectToken
  as a best-effort pre-write before the container starts.

Runtime side (Option A):
- heartbeat.py: on HTTPStatusError 401 from /registry/heartbeat, call
  refresh_cache() to force re-read of /configs/.auth_token from disk,
  then retry the heartbeat once. Fall through to normal failure tracking
  if the retry also fails.
- platform_auth.py: add refresh_cache() which discards the in-process
  _cached_token and calls get_token() to re-read from disk.

Together these eliminate the >1 consecutive 401 window described in
issue #1877. Pre-write (B) is the primary fix; runtime retry (A) is the
self-healing fallback for any residual race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 17:47:18 -07:00
Hongming Wang
925a71887d
fix(workspace): credential helper security hardening (#1797)
Four findings from security audit (internal/security/credential-token-backlog.md):

1. STDERR LEAK — molecule-git-token-helper.sh:146,153 logged ${response}
   on platform errors. The response body MAY contain the token in some
   failure modes (alternate JSON key shape on partial success). Now:
   - capture curl's stderr to a tmp file (not $response) so we can log
     the curl error message without ever interpolating the response body
   - on empty-token branch, log only response size (bytes) for debug
2. CHMOD 600 — already in place at lines 116, 124, 223 (verified, no change)
3. RESPAWN SUPERVISION — entrypoint.sh wrapped daemon launch in a
   while-true bash loop with 30s back-off. Without this, a daemon crash
   silently leaves the workspace stuck on an expired token until the
   container restarts. Logs to /home/agent/.gh-token-refresh.log
   (agent-writable; /var/log is root-owned).
4. JITTER — molecule-gh-token-refresh.sh: added 0..120s random offset to
   each sleep so 39 containers don't synchronize their refresh requests
   against the platform endpoint.

Also:
- Daemon now sends helper output to /dev/null instead of merging stderr,
  belt-and-suspenders against any future helper change that might write
  the token to stdout.
- Daemon log lines include rc=$? on failure for actionable triage.

Inherent risks (org-wide token blast, prompt-injection theft, bearer
in volume, no audit log) tracked in internal/security/credential-token-backlog.md
as separate roadmap items.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
2026-04-23 18:14:55 +00:00
Molecule AI Marketing Lead
e00797ba35 fix(security): prevent cross-tenant memory contamination in commit_memory/recall_memory (GH#1610)
Two critical gaps in a2a_tools.py let any tenant workspace poison org-wide
(GLOBAL) memory and bypass all RBAC enforcement:

1. tool_commit_memory had no RBAC check — any agent could write any scope.
2. tool_commit_memory had no root-workspace enforcement for GLOBAL scope —
   Tenant A could POST scope=GLOBAL and pollute the shared memory store
   that Tenant B's agent reads as trusted context.

Fix adds:
- _ROLE_PERMISSIONS table (mirrors builtin_tools/audit.py) so a2a_tools
  has isolated RBAC logic without depending on memory.py.
- _check_memory_write_permission() / _check_memory_read_permission() helpers:
  evaluate RBAC roles from WorkspaceConfig; fail closed (deny) on errors.
- _is_root_workspace() / _get_workspace_tier(): read WorkspaceConfig.tier
  (0 = root/org, 1+ = tenant) from config.yaml; fall back to
  WORKSPACE_TIER env var.
- tool_commit_memory now (a) checks memory.write RBAC, (b) rejects
  GLOBAL scope for non-root workspaces, (c) embeds workspace_id in the
  POST body so the platform can namespace-isolate and audit cross-workspace
  writes.
- tool_recall_memory now checks memory.read RBAC before any HTTP call,
  and always sends workspace_id as a GET param for platform cross-validation.

Security regression tests added:
- GLOBAL scope denied for non-root (tier>0) workspaces.
- RBAC denial blocks all scope levels (including LOCAL) on write.
- RBAC denial blocks recall entirely.
- workspace_id present in POST body and GET params.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 10:21:34 -07:00
Hongming Wang
2885583d05 feat(workspace): 45-min gh-token refresh daemon + credential helper cache
Extracted from the now-closed PR #1664 (Molecule-AI/molecule-core).

- New scripts/molecule-gh-token-refresh.sh background daemon — every
  45 min (TOKEN_REFRESH_INTERVAL_SEC) calls the credential helper's
  _refresh_gh action to keep both gh CLI auth and the on-disk cache
  fresh through the GitHub App installation token's ~60 min TTL.
- scripts/molecule-git-token-helper.sh rewritten with a ~50 min
  on-disk cache (${CACHE_DIR}/gh_installation_token + _expiry
  companion file), a cache > API > env-var fallback chain, a new
  _refresh_gh action (invoked by the daemon above), a _invalidate_cache
  action, and path references flipped from /workspace/scripts/... to
  /app/scripts/... to match the runtime image layout.
- Dockerfile copies the new refresh daemon and extends mkdir to
  create /home/agent/.molecule-token-cache at build time.
- entrypoint.sh configures the git credential helper for github.com
  while still root (so the global gitconfig is written before the
  gosu handoff), creates + chowns the token cache dir, then as agent
  starts the refresh daemon in the background and does an initial
  gh auth login from GITHUB_TOKEN/GH_TOKEN so gh works before the
  first refresh fires.

Dropped from PR #1664: cosmetic em-dash -> ASCII hyphen rewrites
(charset-normalizer noise) that would conflict with the repo's
existing em-dash convention used elsewhere in workspace/.
2026-04-22 19:52:46 -07:00
rabbitblood
dcbcf19da1 fix(test): guard msg.metadata assignment for non-Message returns
new_agent_text_message returns a real Message object in production but
some test mocks return a plain string. Guard with hasattr + try/except
so the tool_trace assignment doesn't crash test_non_stream_events_ignored.
2026-04-22 16:24:55 -07:00
rabbitblood
ed26f2733a fix(review): address code review blockers on tool-trace + instructions
BLOCKERS fixed:
- instructions.go: Drop team-scope queries (teams/team_members tables don't
  exist in any migration). Schema column kept for future. Restored Resolve
  to /workspaces/:id/instructions/resolve under wsAuth — closes auth gap
  that allowed cross-workspace enumeration of operator policy.
- migration 040: Add CHECK constraints on title (<=200) and content (<=8192)
  to prevent token-budget DoS via oversized instructions.
- a2a_executor.py: Pair on_tool_start/on_tool_end via run_id instead of
  list-position so parallel tool calls don't drop or clobber outputs. Cap
  tool_trace at 200 entries to prevent runaway loops bloating JSONB.

HIGH fixes:
- instructions.go: Add length validation in Create + Update handlers.
  Removed dead rows_ shadow variable. Replaced string concatenation in
  Resolve with strings.Builder.
- prompt.py: Drop httpx timeout 10s -> 3s (boot hot path). Switch print
  to logger.warning. Add Authorization bearer header from
  MOLECULE_WORKSPACE_TOKEN env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 16:18:06 -07:00
rabbitblood
d7afd15e59 feat: platform instructions system with global/team/workspace scope
Adds a configurable instruction injection system that prepends rules to
every agent's system prompt. Instructions are stored in the DB and fetched
at workspace startup, supporting three scopes:

- Global: applies to all agents (e.g., "verify with tools before reporting")
- Team: applies to agents in a specific team
- Workspace: applies to a single agent (role-specific rules)

Components:
- Migration 040: platform_instructions table with scope hierarchy
- Go API: CRUD endpoints + resolve endpoint that merges scopes
- Python runtime: fetches instructions at startup via /instructions/resolve
  and prepends them to the system prompt as highest-priority context

Initial global instructions seeded:
1. Verify Before Acting (check issues/PRs/docs first)
2. Verify Output Before Reporting (second signal before reporting done)
3. Tool Usage Requirements (claims must include tool output)
4. No Hallucinated Emergencies (CRITICAL needs proof)
5. Staging-First Workflow (never push to main directly)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 15:17:14 -07:00
rabbitblood
6c618c9c3f feat: add tool_trace to activity_logs for platform-level agent observability
Every A2A response now includes a tool_trace — the list of tools/commands
the agent actually invoked during execution. This enables verifying agent
claims against what they actually did, catches hallucinated "I checked X"
responses, and provides an audit trail for the CEO to control hundreds of
agents by checking the top-level PM's trace.

Changes:
- Python runtime: collect tool name/input/output_preview on every
  on_tool_start/on_tool_end event, embed in Message.metadata.tool_trace
- Go platform: extract tool_trace from A2A response metadata, store in
  new activity_logs.tool_trace JSONB column with GIN index
- Activity API: expose tool_trace in List and broadcast endpoints
- Migration 039: adds tool_trace column + GIN index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 15:17:14 -07:00
Hongming Wang
1aea013e20 fix(ci): unblock main CI on ubuntu-latest — IPv6-safe addr + MagicMock seed
Two latent bugs the self-hosted Mac mini had been hiding. Both caught
by the newer toolchain on ubuntu-latest runners after PR #1626.

1. workspace-server/internal/handlers/terminal.go:442
   `fmt.Sprintf("%s:%d", host, port)` flagged by go vet as unsafe
   for IPv6 (it omits the required [::] brackets). Replaced with
   `net.JoinHostPort(host, strconv.Itoa(port))` which handles both
   IPv4 and IPv6 correctly. No runtime behaviour change — the only
   call site passes "127.0.0.1", so the bug would never trigger in
   practice, but vet is right to flag it as a latent correctness
   issue.

2. workspace/tests/test_a2a_executor.py::test_set_current_task_updates_heartbeat
   `MagicMock()` auto-creates attributes on first access, so
   `getattr(heartbeat, "active_tasks", 0)` in shared_runtime.py
   returned a MagicMock rather than the default 0. Adding 1 to a
   MagicMock returns another MagicMock, so the assertion
   `heartbeat.active_tasks == 1` never held. Seeding
   `heartbeat.active_tasks = 0` before the first call makes
   getattr() return a real int, matching how the real HeartbeatLoop
   class initialises itself.

Both pre-existed on main and were hidden by the older Python / Go
toolchains on the Mac mini runner. Verified locally (venv pytest
pass, `go vet ./...` + `go build ./...` clean on workspace-server).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:18:46 -07:00
molecule-ai[bot]
64ccf8e179
fix: CWE-78 rm scope, go vet failures, delegation idempotency
* refactor: split 4 oversized handler files into focused sub-files

- org.go (1099 lines) → org.go + org_import.go + org_helpers.go
- mcp.go (1001 lines) → mcp.go + mcp_tools.go
- workspace.go (934 lines) → workspace.go + workspace_crud.go
- a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go

No functional changes — same package, same exports, same tests.
All files stay under 635 lines.

Note: isSafeURL and isPrivateOrMetadataIP are duplicated between
mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue
from the original mcp.go and a2a_proxy.go, not introduced by this split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386)

* docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal

* docs: add Remote Agents feature + Phase 30 blog links to docs index

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference (#1281)

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310)

## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.

## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
   validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
   []string{"rm", "-rf", "/configs", filePath} (exec form). This
   eliminates shell injection entirely — filePath is a plain argument,
   never interpreted as shell code.

## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
  is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
  injected args

Closes #1273.

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302)

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go

Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.

- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
  agentURL before outbound calls in mcpCallTool (line ~529) and
  toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
  helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
  (blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
  file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
  172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
  isPrivateOrMetadataIP across all private/CGNAT/metadata ranges

1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go

Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.

* fix: remove orphaned commit-text lines from a2a_proxy.go

Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).

Deletion restores:
  }
  return agentURL, nil
}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>

* fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(platform): unblock SaaS workspace registration end-to-end

Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 +
   IPv6 ULA are now gated behind saasMode(); link-local (169.254/16),
   loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked
   unconditionally in both modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. Now no-ops in SaaS
   mode; the register handler already mints on first successful
   register and returns the plaintext in the response body for the
   runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

Follow-up issue for CP-sourced workspace_id attestation will be filed
separately — closes the residual intra-VPC SSRF + token-race windows
the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408)

Runtime (shared_runtime.py):
- set_current_task now increments active_tasks on task start, decrements
  on completion (was binary 0/1)
- Counter never goes below 0 (max(0, n-1))
- Pushes heartbeat immediately on BOTH increment and decrement (#1372)

Scheduler (scheduler.go):
- Reads max_concurrent_tasks from DB (default 1, backward compatible)
- Skips cron only when active_tasks >= max_concurrent_tasks (was > 0)
- Leaders can be configured with max_concurrent_tasks > 1 to accept
  A2A delegations while a cron runs

Platform:
- Added max_concurrent_tasks column to workspaces (migration 037)
- Workspace model + list/get queries include the new field
- API exposes max_concurrent_tasks in workspace JSON

Config.yaml support (future): runtime_config.max_concurrent_tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): address 3 critical issues from code review

1. BLOCKER: executor_helpers.py now uses increment/decrement too
   (was still binary 0/1, stomping the counter for CLI + SDK executors)

2. BUG: asymmetric getattr defaults fixed — both paths use default 0
   (was 0 on increment, 1 on decrement)

3. UX: current_task preserved when active_tasks > 0 on decrement
   (was clearing task description even when other tasks still running)

4. Scheduler polling loop re-reads max_concurrent_tasks on each poll
   (was using stale value from initial query)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>

* docs: workspace files API reference, skill catalog, and links

* docs: fix secrets endpoint path across docs

The workspace secrets endpoint is `/workspaces/:id/secrets`, not
`/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent)
and workspace-runtime.md (registration flow example and comparison table).
The external-agent-registration guide already had the correct path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix broken blog cross-link in skills-vs-bundled-tools post

Link path had an extra `/docs/` segment: `/docs/blog/...` instead of
`/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`,
not under `/docs/blog/`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add skill-catalog.md guide

Linked from the skills-vs-bundled-tools blog post as a reference
for TTS/image-generation/web-search skills. The blog promises
"install directly via the CLI" with a skill catalog — this page
fills that promise by documenting available skill types, install
commands, version management, custom skill authoring, and removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>

* fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter Day 2 Reddit + HN community copy

* fix(tests): supply *events.Broadcaster pointer to captureBroadcaster

Cannot use *captureBroadcaster as *events.Broadcaster when the struct
embeds events.Broadcaster as a value — must initialize as a named field.

Fixes go vet error in workspace_provision_test.go:
  cannot use broadcaster (*captureBroadcaster) as *events.Broadcaster value

* Merge pull request #1429 from fix/canvas-tooltip-clear-timer

Without this, a 400ms setTimeout from onFocus/onMouseEnter that fires
after onBlur will re-show a tooltip the user just dismissed. The
setShow(false) in onBlur closes the tooltip immediately but leaves the
timer pending — Tab-blur followed by timer-fire would re-show it.

Fix: add clearTimeout(timerRef.current) at the top of onBlur, mirroring
the pattern already used in onMouseLeave and onFocus.

Refs: PR #1367 (a11y keyboard support — this was a pre-existing gap)

Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add missing children:[] to setPendingDelete expectation (#1426)

PR #1252 (cascade-delete UX) updated setPendingDelete to pass a
children array for cascade-warning rendering. The keyboard-a11y test
assertion was not updated to match.

Test: clicking 'Delete' hoists state to the store and closes the menu

Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add children:[] to setPendingDelete + \&apos; entity fix (closes #1380) (#1427)

* ci: retry — trigger fresh runner allocation

* fix(canvas/test): add children:[] to setPendingDelete assertion

setPendingDelete now includes children:[] (PR #1383 extended the
pendingDelete type). The keyboard accessibility test at line 225 used
exact object matching which omitted the new field, causing a failure
after staging merged #1383.

Issue: #1380

* fix(canvas): replace &apos; HTML entity with straight apostrophe

JSX does not entity-decode &apos; — it renders the literal text
"&apos;" instead of "'".  Found at line 157 (payment confirmed) and
line 321 (empty org list).  Replaced with a straight apostrophe,
which JSX handles correctly.

Ref: issue #1375
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Merge pull request #1430 from fix/1421-saas-ssrf-helpers

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test

Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.

Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.

Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve 3 go vet failures + add idempotency_key to delegate_task_async

- workspace_provision_test.go: add missing mock := setupTestDB(t) to
  TestSeedInitialMemories_Truncation — mock was referenced but never
  declared, causing "undefined: mock" vet error
- orgtoken/tokens_test.go: discard unused orgID return value with _ in
  Validate call — "declared and not used" vet error
- a2a_tools.py: delegate_task_async now sends idempotency_key (SHA-256
  of workspace_id + task) to POST /workspaces/:id/delegate, fixing
  duplicate task execution when an agent restarts mid-delegation (#1456)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: airenostars <airenostars@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 18:22:30 +00:00
molecule-ai[bot]
859d676f70
fix(CI): correct BASE in detect-changes (PR/push race); catch RuntimeError in conftest (#1473)
- ci.yml: replace if/else BASE assignment with GITHUB_BASE_REF default
  + pull_request base.sha override pattern. Prevents push events from
    overwriting the correct PR base SHA when both events fire together.
- conftest.py: catch RuntimeError in addition to ImportError when
  importing coordinator.py, which raises RuntimeError at import time
  when WORKSPACE_ID is not set (before the ImportError guard).

Co-authored-by: Molecule AI Release Manager <release-manager@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:15:45 +00:00
molecule-ai[bot]
4675402e58
feat(workspace): pre-stop serialization for pause/resume (closes #1386)
Add a pre-stop hook that captures agent state before container exit and
writes a scrubbed snapshot to /configs/.agent_snapshot.json. On restart,
the snapshot is loaded and the adapter's restore_state() is called before
the A2A server starts.

- New lib/pre_stop.py: build_snapshot / write_snapshot / read_snapshot /
  delete_snapshot + _scrub_value deep-scrubber (uses lib.snapshot_scrub
  to redact API keys, tokens, and sandbox output before persisting)
- BaseAdapter.pre_stop_state(): captures _executor._session_id and recent
  transcript_lines; overridden by adapters with richer in-memory state
- BaseAdapter.restore_state(): stores snapshot fields as adapter attrs
  for create_executor() to pick up
- main.py: calls pre_stop serialization in finally block (after server
  serves) and restore_state() after adapter setup, before server starts
- Added 12 unit tests covering scrub, read/write, adapter integration

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:40:44 +00:00
molecule-ai[bot]
3bef6af241 fix: apply #1124 env-var defaults + scrub F1088 credentials from INCIDENT_LOG.md (#1347)
- PLATFORM_URL: replace unreachable http://platform:8080 mesh-only default
  with Docker-aware detection (host.docker.internal in containers,
  localhost for local dev) across all workspace Python modules and the
  git-token-helper shell script.
- WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if empty)
  consistent with coordinator.py / a2a_cli.py patterns already in place.
- INCIDENT_LOG.md: replace all 3 F1088 credential types with
  ***REDACTED*** (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x).

Fixes #1124, #1333.

Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 08:11:44 +00:00
molecule-ai[bot]
e07e22ad57 fix(orchestrator): fail-fast if WORKSPACE_ID env var is unset/empty (#1124) (#1336)
* fix(orchestrator): fail-fast if WORKSPACE_ID env var is unset/empty

Issue #1124: orchestrator GET /workspaces/{WORKSPACE_ID} returned 404
because 5 Python modules defaulted WORKSPACE_ID to "" instead of
validating the injected value. Empty string produced URLs like
/workspaces//heartbeat — route not found.

Fix: raise RuntimeError at module load if WORKSPACE_ID is unset
or empty, rather than silently producing broken API calls downstream.

Files changed (all same pattern):
- workspace/a2a_cli.py
- workspace/a2a_client.py
- workspace/coordinator.py
- workspace/consolidation.py
- workspace/molecule_ai_status.py

The platform (provisioner.go:375) correctly injects WORKSPACE_ID at
container provision time. This fix ensures the orchestrator surfaces
the misconfiguration immediately instead of failing silently at runtime.

Closes #1124.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(incidents): rebuild INCIDENT_LOG — linter reset, all sections restored

Rebuilt after linter reset. Sections restored:
- Security Audit Cycle 6 (abc58b47)
- F1100 workspace_restart.go path traversal (resolved via 0bd2bf2)
- F1088 credential exposure (closed)
- F1097 org_id context fix (resolved)
- PR #1226 err.Error() leaks (stale)
- QA Round 18 orgs-page regression (fixed on main, staging pending)
- Issue #1124 fix PR #1336 filed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core Platform Lead <core-platform-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:42:00 +00:00
molecule-ai[bot]
d3b310e895 Merge pull request #1049 from Molecule-AI/feat/platform-native-hma-instructions
feat(runtime): inject HMA memory instructions at platform level (#1047)
2026-04-20 08:47:20 -07:00
rabbitblood
b1bb5f838a fix: GitHub token refresh — add WorkspaceAuth path for credential helper (#1068)
PR #729 tightened AdminAuth to require ADMIN_TOKEN, breaking the
workspace credential helper which called /admin/github-installation-token
with a workspace bearer token. Tokens expired after 60 min with no refresh.

Fix: Add /workspaces/:id/github-installation-token under WorkspaceAuth
so any authenticated workspace can refresh its GitHub token. Keep the
admin path as backward-compatible alias.

Update molecule-git-token-helper.sh to use the workspace-scoped path
when WORKSPACE_ID is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 08:30:02 -07:00
rabbitblood
5bc3edfbdd Fix test assertions to account for HMA instructions in system prompt
Mock get_hma_instructions in exact-match tests so they don't break
when HMA content is appended. Add a dedicated test for HMA inclusion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 01:05:05 -07:00
rabbitblood
6103ca341c feat(runtime): inject HMA memory instructions at platform level (#1047)
Every agent now gets hierarchical memory instructions in their system
prompt automatically — no template configuration needed. Instructions
cover commit_memory (LOCAL/TEAM/GLOBAL scopes), recall_memory, and
when to use each proactively.

Follows the same pattern as A2A instructions: defined in
executor_helpers.py, injected by _build_system_prompt() in the
claude_sdk_executor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 00:22:47 -07:00
rabbitblood
46c20731e6 feat: event-driven cron triggers + auto-push hook for agent productivity
Three changes to boost agent throughput:

1. Event-driven cron triggers (webhooks.go): GitHub issues/opened events
   fire all "pick-up-work" schedules immediately. PR review/submitted
   events fire "PR review" and "security review" schedules. Uses
   next_run_at=now() so the scheduler picks them up on next tick.

2. Auto-push hook (executor_helpers.py): After every task completion,
   agents automatically push unpushed commits and open a PR targeting
   staging. Guards: only on non-protected branches with unpushed work.
   Uses /usr/local/bin/git and /usr/local/bin/gh wrappers with baked-in
   GH_TOKEN. Never crashes the agent — all errors logged and continued.

3. Integration (claude_sdk_executor.py): auto_push_hook() called in the
   _execute_locked finally block after commit_memory.

Closes productivity gap where agents wrote code but never pushed,
and where work crons only fired on timers instead of reacting to events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:26:35 -07:00
Hongming Wang
3976361483 feat(workspace): snapshot secret scrubber (closes #823)
Sub-issue of #799, security condition C4. Standalone module in
workspace/lib/snapshot_scrub.py with three public functions:

- scrub_content(str) → str: regex-based redaction of secret patterns
- is_sandbox_content(str) → bool: detect run_code tool output markers
- scrub_snapshot(dict) → dict: walk memories, scrub each, drop sandbox entries

Patterns covered: sk-ant-/sk-proj-, ghp_/ghs_/github_pat_, AKIA,
cfut_, mol_pk_, ctx7_, Bearer, env-var assignments, base64 blobs ≥33 chars.

21 unit tests, 100% coverage on new code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 00:32:42 -07:00
Hongming Wang
37ed319562 fix: update workspace script comments for workspace-template → workspace rename
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:48:05 -07:00