Closes#2397. Today, every empty-peer condition (true empty, 401/403, 404,
5xx, network) collapses to a single message: "No peers available (this
workspace may be isolated)". The user has no way to tell whether they need
to provision more workspaces (true isolation), restart the workspace
(auth), re-register (404), page on-call (5xx), or check network (timeout) —
five different operator actions, one ambiguous string.
Wire:
- new helper get_peers_with_diagnostic() in a2a_client.py returns
(peers, error_summary). error_summary is None on 200; a short
actionable string on every other branch.
- get_peers() now shims through it so non-tool callers (system-prompt
formatters) keep the bare-list contract.
- tool_list_peers() switches to the diagnostic helper and surfaces the
actual reason. The "may be isolated" string is removed; true empty
now reads "no peers in the platform registry."
Tests:
- TestGetPeersWithDiagnostic: 200, 200-empty, 401, 403, 404, 5xx,
network exception, 200-but-non-list-body, and the bare-list-shim
regression guard.
- TestToolListPeers: each diagnostic branch surfaces its reason +
explicit assertion that "may be isolated" is gone.
Coverage 91.53% (floor 86%). 122 a2a tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the 5 public functions adapters and the runtime hot-path import
through ``from platform_auth import``:
- ``auth_headers`` — every outbound httpx call merges this in
- ``self_source_headers`` — A2A peer + self-message header builder
- ``get_token`` — main.py reads on boot to decide register-vs-resume
- ``save_token`` — main.py persists the platform-issued token
- ``refresh_cache`` — 401-retry path drops in-process cache (#1877)
A grep across workspace/ shows 14+ runtime modules import these:
main.py, heartbeat.py, a2a_client.py, a2a_tools.py, consolidation.py,
events.py, executor_helpers.py (3 sites), molecule_ai_status.py,
builtin_tools/memory.py (3 sites), builtin_tools/temporal_workflow.py
(2 sites). Renaming any of the five (e.g. ``auth_headers`` →
``bearer_headers``) makes every one of those imports raise ImportError
at workspace boot — the failure surface is deep in heartbeat init,
nowhere near the rename site.
Same drift class as the BaseAdapter signature snapshot (#2378, #2380),
skill_loader gate (#2381), runtime_wedge gate (#2383). Reuses the
``_signature_snapshot.py`` helpers shipped in #2381.
Defense-in-depth: ``test_snapshot_has_required_functions`` asserts
the five names are still present, so removing one even with a
synchronized snapshot edit forces an explicit edit here with a
justification.
``clear_cache`` is intentionally NOT in the snapshot — it's a
test-only helper. Production code MUST NOT depend on it.
Verified red on deliberate rename: ``auth_headers`` →
``bearer_headers`` produces a clean diff of the missing function in
the failure message. Restored before commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BaseAdapter docstring tells adapter authors:
> ``runtime_wedge.mark_wedged()`` / ``clear_wedge()`` — flip the
> workspace to ``degraded`` + auto-recover when your SDK hits a
> non-recoverable error class. Import directly from ``runtime_wedge``;
> the heartbeat forwards the state to the platform automatically.
That's a contract — adapter templates depend on the four module-level
functions (``is_wedged``, ``wedge_reason``, ``mark_wedged``,
``clear_wedge``) being importable by those exact names with those
exact signatures. Renaming any silently breaks every adapter that
calls them: the import resolves the module fine, the
``AttributeError`` only surfaces when the adapter actually hits its
first SDK error — long after the rename merges.
Same drift class as #2378 / #2380 / #2381 (BaseAdapter, skill_loader)
applied to the module-level function surface.
Changes:
- tests/_signature_snapshot.py gains build_module_functions_record.
Walks a module's public top-level functions, optionally filtered
to a specific name list (used here — runtime_wedge has internal
helpers like reset_for_test that intentionally aren't part of
the contract). Skips re-exports via __module__ check so a
`from foo import bar` doesn't pollute the snapshot.
- tests/test_runtime_wedge_signature.py snapshots the four
contract functions. Plus a defense-in-depth required-functions
test that catches removal even when source + snapshot are
updated together.
Verified: deliberately renaming `mark_wedged` → `mark_wedged_RENAMED`
trips the gate with full snapshot diff in the failure message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes in one PR (tightly coupled — the second wouldn't make
sense without the first):
1. Hoist the inspect-based snapshot helpers out of
test_adapter_base_signature.py into tests/_signature_snapshot.py
so future surfaces don't copy-paste introspection logic.
- build_class_signature_record(cls): walks public methods,
unwraps static/class/abstract methods, returns a stable
{class, methods: [...]} dict.
- build_dataclass_record(cls): walks dataclass fields via
dataclasses.fields(), returns {name, frozen, fields: [...]}.
- compare_against_snapshot(actual, path): writes-on-first-run +
diff-on-drift, with both expected and actual JSON in failure
message.
test_adapter_base_signature.py is rewritten to use the helpers;
the existing snapshot file is byte-identical (no behavior change).
2. New gate: tests/test_skill_loader_signature.py covers the
public dataclasses exported from skill_loader/loader.py:
- SkillMetadata: every adapter pattern-matches on .runtime for
skill-compat filtering. Renaming this field would silently
break per-adapter skill loading — the loader still returns
objects, but adapters' `if "*" in skill.metadata.runtime`
raises AttributeError at workspace boot.
- LoadedSkill: returned in SetupResult.loaded_skills.
Includes test_snapshot_has_required_skill_metadata_fields
defense-in-depth: ensures the runtime / id / name / description
fields stay even if both source and snapshot are updated together.
Verified: deliberately renaming SkillMetadata.runtime trips the
gate with full snapshot diff in the failure message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follows up #2378. The BaseAdapter snapshot covers method signatures
but `adapter_base.py` also exports three public dataclasses that
form the call/return contract between the platform and every
adapter:
- SetupResult — returned by adapter._common_setup()
- AdapterConfig — passed into adapter setup hooks
- RuntimeCapabilities — returned by adapter.capabilities();
drives platform-side dispatch routing (#117)
Renaming a RuntimeCapabilities flag silently disables every
adapter's capability declaration (the platform fallback runs)
without an AttributeError to surface the breakage. That's exactly
the drift class the snapshot pattern is meant to catch.
Changes:
- _build_dataclass_snapshot walks SetupResult, AdapterConfig,
RuntimeCapabilities via dataclasses.fields(), capturing field
name + type annotation + has_default per field, plus the
@dataclass(frozen=...) flag.
- _build_full_snapshot composes method + dataclass records into
one stable JSON snapshot.
- test_snapshot_has_required_dataclass_fields — defense-in-depth
test parallel to test_snapshot_has_required_methods. Catches
field removal even when both source AND snapshot are updated
together. Required field set is intentionally short (the flags
that drive platform dispatch + the adapter-level config knobs).
Verified: deliberately renaming `provides_native_heartbeat` →
`provides_native_heartbeat_RENAMED` trips
test_base_adapter_signature_matches_snapshot with a full diff in
the failure message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every workspace template (langgraph, claude-code, hermes, etc.)
subclasses BaseAdapter. Renaming, removing, or re-typing a method
on the base class silently breaks templates: the override stops
being recognized as an override; the old method-name's caller
silently invokes the default no-op; the new method-name is
unimplemented in templates that haven't migrated.
Recent #87 universal-runtime + #1957 recordResource refactor both
renamed/added methods. Without a frozen snapshot, the next rename
ships quietly and surfaces only when a template's CI catches the
AttributeError days later — long after the merge window for an
easy revert.
This snapshot pins BaseAdapter's public method surface against a
checked-in JSON file. Same-shape pattern as PR #2363's A2A
protocol-compat replay gate, applied to a Python public-API
surface instead of JSON message shapes. Both close drift classes
by snapshotting the structural surface that consumers depend on.
Two tests:
1. test_base_adapter_signature_matches_snapshot — full
introspection diff against tests/snapshots/adapter_base_signature.json.
Drift = test failure with both expected + actual JSON in
the message so the reviewer sees what changed.
2. test_snapshot_has_required_methods — defense-in-depth: even
if both the source AND snapshot are updated together
(intentional API removal), this catches removal of the
short list of methods that EVERY template depends on (name,
display_name, description, capabilities, memory_filename).
Removing one of these requires explicit edit to the
`required` set with a justification.
Verified the gate fires red on a deliberate rename
(memory_filename → memory_filename_RENAMED) — failure message
shows the full snapshot diff including parameter shapes and
return annotations.
Updating the snapshot is the explicit acknowledgment that a
template-affecting API change is intentional. Reviewer of the
introducing PR sees the snapshot diff and decides whether
template repos need coordinated updates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#2332 item 1 (workspace awareness — agents don't surface
platform-native tools up front).
The dogfooding session surfaced that agents weren't using A2A
delegation, persistent memory, or send_message_to_user. The tools
were registered AND documented in the system prompt — but only in
sections #8 (Inter-Agent Communication) and #9 (Hierarchical Memory),
which agents read AFTER they've already started reasoning about a
plan from earlier sections.
This adds a tight inventory at section #1.5 (immediately after
Platform Instructions, before role-specific prompt files) — every
tool name + its short description in a bulleted block. Detailed
when_to_use docs in sections #8/#9 stay; this preamble is the
elevator pitch ("you have these"), the later sections are the
manual ("here's when and how").
Generated from `platform_tools.registry` ToolSpecs — every tool's
`name` + `short` flow through automatically, no manual sync. A new
`get_capabilities_preamble(mcp: bool)` helper in executor_helpers
mirrors the existing get_a2a_instructions / get_hma_instructions
pattern.
CLI-runtime agents (mcp=False) get an empty preamble — they see
_A2A_INSTRUCTIONS_CLI's hand-written subcommand vocabulary further
down, and the registry's MCP tool names would conflict.
Tests:
- test_capabilities_preamble_appears_in_mcp_prompt: header present
- test_capabilities_preamble_lists_every_registry_tool: every
a2a + memory tool from registry shows up (drift catches at test
time — adding a new tool to registry surfaces here automatically)
- test_capabilities_preamble_precedes_prompt_files: ordering
invariant (toolkit before role docs)
- test_capabilities_preamble_skipped_for_cli_runtime: empty when
mcp=False
All 40 prompt + platform_tools tests pass.
Mirrors PR-C's Upload migration: replaces the docker-cp tar-stream
extraction with a streaming HTTP GET to the workspace's own
/internal/file/read endpoint. Closes the SaaS gap for downloads —
without this PR, GET /workspaces/:id/chat/download still returns 503
on Railway-hosted SaaS even after A+B+C+F land.
Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → PR-F #2319 → this PR.
Why a single broad /internal/file/read instead of /internal/chat/download:
Today's chat_files.go::Download already accepts paths under any of the
four allowed roots {/configs, /workspace, /home, /plugins} — it's not
strictly chat. Future PRs (template export, etc.) will reuse this
endpoint via the same forward pattern; reusing avoids three near-
identical handlers (one per domain) with duplicated path-safety logic.
Path safety is duplicated on platform + workspace sides — defence in
depth via two parallel checks, not "trust the workspace."
Changes:
* workspace/internal_file_read.py — Starlette handler. Validates path
(must be absolute, under allowed roots, no traversal, canonicalises
cleanly). lstat (not stat) so a symlink at the path doesn't redirect
the read. Streams via FileResponse (no buffering). Mirrors Go's
contentDispositionAttachment for Content-Disposition header.
* workspace/main.py — registers GET /internal/file/read alongside the
POST /internal/chat/uploads/ingest from PR-B.
* scripts/build_runtime_package.py — adds internal_file_read to
TOP_LEVEL_MODULES so the publish-runtime cascade rewrites its
imports correctly. Also includes the PR-B additions
(internal_chat_uploads, platform_inbound_auth) since this branch
was rooted before PR-B's drift-gate fix; merge-clean alphabetic
additions.
* workspace-server/internal/handlers/chat_files.go — Download
rewritten as streaming HTTP GET forward. Resolves workspace URL +
platform_inbound_secret (same shape as Upload), builds GET request
with path query param, propagates response headers (Content-Type /
Content-Length / Content-Disposition) + body. Drops archive/tar
+ mime imports (no longer needed). Drops Docker-exec branch entirely
— Download is now uniform across self-hosted Docker and SaaS EC2.
* workspace-server/internal/handlers/chat_files_test.go — replaces
TestChatDownload_DockerUnavailable (stale post-rewrite) with 4
new tests:
- TestChatDownload_WorkspaceNotInDB → 404 on missing row
- TestChatDownload_NoInboundSecret → 503 on NULL column
(with RFC #2312 detail in body)
- TestChatDownload_ForwardsToWorkspace_HappyPath → forward shape
(auth header, GET method, /internal/file/read path) + headers
propagated + body byte-for-byte
- TestChatDownload_404FromWorkspacePropagated → 404 from
workspace propagates (NOT remapped to 500)
Existing TestChatDownload_InvalidPath path-safety tests preserved.
* workspace/tests/test_internal_file_read.py — 21 tests covering
_validate_path matrix (absolute, allowed roots, traversal, double-
slash, exact-match-on-root), 401 on missing/wrong/no-secret-file
bearer, 400 on missing path/outside-root/traversal, 404 on missing
file, happy-path streaming with correct Content-Type +
Content-Disposition, special-char escaping in Content-Disposition,
symlink-redirect-rejection (lstat-not-stat protection).
Test results:
* go test ./internal/handlers/ ./internal/wsauth/ — green
* pytest workspace/tests/ — 1292 passed (was 1272 before PR-D)
Refs #2312 (parent RFC), #2308 (chat upload+download 503 incident).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the SaaS-side gap that PR-A acknowledged but didn't fix: SaaS
workspaces have no persistent /configs volume, so the platform_inbound_secret
that PR-A's provisioner wrote at workspace creation never reaches the
runtime. Without this, even after the entire RFC #2312 stack lands,
SaaS chat upload would 401 (workspace fails-closed when /configs/.platform_inbound_secret
is missing).
Solution: return the secret in the /registry/register response body
on every register call. The runtime extracts it and persists to
/configs/.platform_inbound_secret at mode 0600. Idempotent — Docker-
mode workspaces also receive it and overwrite the value the provisioner
already wrote (same value until rotation).
Why on every register, not just first-register:
* SaaS containers can be restarted (deploys, drains, EBS detach/
re-attach) — /configs is rebuilt empty on each fresh start.
* The auth_token is "issue once" because re-issuing rotates and
invalidates the previous one. The inbound secret has no rotation
flow yet (#2318) so re-sending the same value is harmless.
* Eliminates the bootstrap window where a restarted SaaS workspace
has no inbound secret on disk and would 401 every platform call.
Changes:
* workspace-server/internal/handlers/registry.go — Register handler
reads workspaces.platform_inbound_secret via wsauth.ReadPlatformInboundSecret
and includes it in the response body. Legacy workspaces (NULL
column) get a successful registration with the field omitted.
* workspace-server/internal/handlers/registry_test.go — two new tests:
- TestRegister_ReturnsPlatformInboundSecret_RFC2312_PRF: secret
present in DB → secret in response, alongside auth_token.
- TestRegister_NoInboundSecret_OmitsField: NULL column → field
omitted, registration still 200.
* workspace/platform_inbound_auth.py — adds save_inbound_secret(secret).
Atomic write via tmp + os.replace, mode 0600 from os.open(O_CREAT,
0o600) so a concurrent reader never sees 0644-default. Resets the
in-process cache after write so the next get_inbound_secret() returns
the freshly-written value (rotation-safe when it lands).
* workspace/main.py — register-response handler extracts
platform_inbound_secret alongside auth_token and persists via
save_inbound_secret. Mirrors the existing save_token pattern.
* workspace/tests/test_platform_inbound_auth.py — 6 new tests for
save_inbound_secret: writes file, mode 0600, overwrite-existing,
cache invalidation after save, empty-input no-op, parent-dir creation
for fresh installs.
Test results:
* go test ./internal/handlers/ ./internal/wsauth/ — all green
* pytest workspace/tests/ — 1272 passed (was 1266 before this PR)
Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on PR-A (#2313). The platform-side rewrite that actually calls
this endpoint lands in PR-C; this PR adds the workspace-side consumer
+ hardening so PR-C is a small Go-only diff.
What this adds:
* platform_inbound_auth.py — auth gate mirroring transcript_auth.py.
Reads /configs/.platform_inbound_secret (delivered by the PR-A
provisioner). Fail-closed when the file is missing/empty/unreadable.
Constant-time compare via hmac.compare_digest.
* internal_chat_uploads.py — POST /internal/chat/uploads/ingest.
Multipart parse → sanitize each filename → write to
/workspace/.molecule/chat-uploads/<random>-<name> with
O_CREAT|O_EXCL|O_NOFOLLOW. Same response shape (uri/name/mimeType/
size + workspace: URI scheme) as the legacy Go handler — canvas /
agent code that resolves "workspace:..." paths keeps working.
* Wired into workspace/main.py via starlette_app.add_route alongside
the existing /transcript route.
* python-multipart>=0.0.18 added to requirements.txt (Starlette's
Request.form() needs it; ≥ 0.0.18 closes CVE-2024-53981).
Test coverage (36 tests, all green; full workspace suite 1266 passed):
* test_platform_inbound_auth.py — 14 tests:
happy path, fail-closed on missing file, empty file, whitespace-
only file, missing/case-wrong/empty Bearer prefix, in-process
cache, default CONFIGS_DIR fallback, end-to-end file → authorized.
* test_internal_chat_uploads.py — 22 tests:
sanitize_filename matrix (incl. ../traversal, CJK chars, length
truncation), 401 on missing/wrong/no-secret-file bearer, single +
batch upload happy paths, unique random prefix on duplicate names,
mimetype guess fallback, 400 on missing files field, 413 on per-
file + total-body oversize, symlink-at-target refusal (with
sentinel-content unchanged assertion).
Why this is safe to ship before PR-C:
* No platform-side caller yet → no behavior change visible to users.
* Auth fails closed; nothing on the network can hit a write path
until the platform forwards with the matching bearer.
* Workspace's existing routes (/health, /transcript, /handle/*) are
unchanged.
Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#2289.
Some workspace template images ship `/usr/local/bin/{git,gh}` wrappers
that bake `GH_TOKEN` into argv handling (preferred — auto-PR creation
authenticates without explicit token plumbing); other templates have
plain `/usr/bin/git` installed via apt with no wrapper. The hardcoded
`_GIT = "/usr/local/bin/git"` crashed every auto-push attempt on the
latter image class:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/git'
File "/app/molecule_runtime/executor_helpers.py", line 524, in _auto_push_and_pr_sync
subprocess.run(['/usr/local/bin/git', 'rev-parse', '--is-inside-work-tree'], ...)
`shutil.which("git")` walks PATH in order — finds the `/usr/local/bin/`
wrapper first when it exists, falls back to `/usr/bin/git` otherwise.
GH_TOKEN injection still wins on wrapper-equipped images; auto-push
no longer crashes on bare-apt images.
Verified locally: `shutil.which("git")` resolves to `/usr/bin/git` on
the bug-reporter's image; `shutil.which("gh")` resolves to the
homebrew path on dev. Both paths exist + are executable on respective
hosts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced via cross-template review of the a2a-sdk v0→v1 migration:
every adapter executor (claude-code, gemini-cli, crewai, openclaw,
autogen) builds A2A response Messages independently using
`new_text_message(text)` from the SDK, which omits `task_id` and
`context_id`. The runtime's own canonical pattern in
`workspace/a2a_executor.py:466-475` correctly threads both:
Message(
message_id=uuid.uuid4().hex,
role=Role.ROLE_AGENT,
parts=_parts,
task_id=task_id, # ← canonical
context_id=context_id, # ← canonical
)
Adapters skipping these correlation fields means the platform's a2a
proxy can't reliably tie the response back to the originating task.
This is a divergence from canonical, not necessarily a strict bug
(task_id may be optional with a default) — but it's enough of a
correlation/observability gap that the canonical pattern bothers to
thread it.
Add `new_response_message(context, text, files=None)` to
executor_helpers.py — single home for response Message construction.
Templates can migrate from `new_text_message(text)` to this helper
in stacked PRs once the runtime publishes to PyPI.
The helper:
- Reads `context.task_id`/`context.context_id` from the inbound
RequestContext, falling back to fresh UUIDs (RequestContextBuilder
always sets them in production; fallback is for unit tests).
- Sets `role=Role.ROLE_AGENT` (the v1 enum value).
- Builds text Parts via `Part(text=...)` and file Parts via
`Part(url="workspace:<path>", filename=..., media_type=...)`.
- Returns a v1 protobuf Message ready for
`event_queue.enqueue_event(...)`.
Why "files=None" with the workspace: URI scheme as the file Part
shape: matches the canonical pattern in a2a_executor.py exactly so
the platform's chat-attachment download path (executor_helpers.py
`resolve_attachment_uri`) interprets responses uniformly across all
adapters.
Tests (5, all pass with --no-cov against the live runtime image):
- test_new_response_message_text_only
- test_new_response_message_with_files
- test_new_response_message_files_only_no_text
- test_new_response_message_falls_back_when_context_ids_unset
- test_new_response_message_handles_missing_attrs
The conftest's a2a stubs needed an extension for Message + Role +
Part with kwargs preservation. Strictly additive — no existing tests
affected. (The 19 pre-existing failures in test_executor_helpers.py
are unrelated debt from the commit_memory/recall_memory rewrite,
visible on staging baseline before this change.)
Per-template migration is the follow-up: claude-code, gemini-cli,
crewai, openclaw, autogen all call `new_text_message(text)` today;
each gets a per-repo PR replacing it with
`new_response_message(context, text)`. This PR ships the helper
first so the templates have something to import.
Refs: PR #2266/#2267 (restart-race), claude-code #15 (FilePart fix),
gemini-cli #10/crewai #8/openclaw #9/autogen #8 (rename PRs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:
1. workspace/platform_tools/README.md — explains the ToolSpec
single-source-of-truth pattern (#2240), the CLI-block alignment
gap that hand-maintained generation can't close (#2258), the
snapshot golden files + LF-pinning (#2260), and the add/rename/
remove playbook. The next reader who lands in
workspace/platform_tools/ now has the design rationale + the
safe-edit procedure colocated with the code.
2. scripts/README.md — disambiguates the three measure-coordinator-
task-bounds.sh files that now exist across two repos:
- scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo)
- scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
- scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane)
Cross-references reference_harness_pair_pattern (auto-memory) for
the cross-repo design rationale. Documents the common safety
pattern (cleanup trap, DRY_RUN, non-target guard,
cleanup_*_failed events) and the heartbeat-trace caveat.
Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the #2240 code review:
1. Snapshot tests for the rendered tool-instruction blocks. The
structural tests added in #2240 guarantee tool NAMES are present;
these new tests pin the SHAPE — bullet ordering, heading style,
footer placement — so a future contributor who reorders fields in
`_render_section` or rewrites a `when_to_use` paragraph sees the
diff in CI rather than shipping a silently-different system prompt.
Golden files live under workspace/tests/snapshots/.
2. CLI-block alignment test + corrected source-of-truth comment.
`_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for
ollama and other non-MCP runtimes — the registry can't auto-generate
it because the CLI subprocess interface uses different command
shapes (`peers` vs `list_peers`, etc.). A new
`_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool →
CLI-keyword correspondence (or explicit `None` for tools not
exposed via subprocess). Two tests enforce coverage:
- every a2a tool in the registry is keyed in the mapping
- every non-None subcommand keyword literally appears in
`_A2A_INSTRUCTIONS_CLI`
Caught one real gap: `send_message_to_user` is in the registry but
has no CLI subcommand. Mapped to `None` with an explanatory comment.
The "no other source of truth" claim in registry.py's docstring
was wrong post-#2240 (the CLI block survived) — corrected to
describe the two surfaces explicitly and point at the alignment
tests as the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds structured `rfc2251_phase=...` log lines at the deterministic phase
boundaries inside route_task_to_team and check_task_status, so an
operator running scripts/measure-coordinator-task-bounds.sh against
staging can correlate the harness's external timing trace with what
phase the coordinator was in at any given second.
The harness already exists in staging and measures end-to-end response
time + heartbeat trace. What it CAN'T do without this PR is answer
"the coordinator response took 7 minutes — was it stuck delegating, or
stuck polling children, or stuck synthesizing after all children
returned?" The phase logs answer that question.
Phases instrumented (deterministic Python boundaries, no agent prompt
involvement):
route_start → enter route_task_to_team
children_fetched → after get_children() returns
routing_decided → after build_team_routing_payload
delegate_invoked → just before delegate_task_async.ainvoke
delegate_returned → after delegate_task_async returns
check_status → every check_task_status poll (per-poll)
route_returning_decision_only → fall-through path
Each line includes elapsed_ms from route_start so per-phase durations
are extractable via:
grep rfc2251_phase= <container.log> \
| awk '{...}' to compute deltas between consecutive phases
The synthesis phase (after all children return, before agent emits
final A2A response) is NOT instrumented here because it's
agent-driven (no deterministic Python boundary). The harness operator
infers synthesis_secs = total_response_secs − max(check_status_ts).
This is reproduction-harness scaffolding; it adds zero behavior. Strip
the rfc2251_phase log lines when V1.0 ships and the phase data lands
in the structured heartbeat payload instead.
Refs:
- RFC: molecule-core#2251
- Harness: scripts/measure-coordinator-task-bounds.sh (shipped earlier)
- V1.0 gate: this is deliverable #2 of the four pre-V1.0 gates
The PR-built wheel + import smoke gate refused the platform_tools
package because it's a new subdirectory under workspace/ that wasn't
in scripts/build_runtime_package.py:SUBPACKAGES. The drift gate (which
exists for exactly this reason) caught it cleanly:
error: SUBPACKAGES drifted from workspace/ subdirectories:
in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or
be excluded): ['platform_tools']
Adding platform_tools to SUBPACKAGES wires the package into the
runtime wheel + applies the canonical
from platform_tools.<x> -> from molecule_runtime.platform_tools.<x>
import-rewrite step that every other subpackage uses.
Verified locally: scripts/build_runtime_package.py succeeds, the
rewritten a2a_mcp_server.py reads
from molecule_runtime.platform_tools.registry import TOOLS
which matches the package layout in the wheel.
Establishes workspace/platform_tools/registry.py as THE place tool
naming and docs live. Every consumer reads from it; nothing duplicates
the source. Closes the architectural gap behind the doc/tool drift
discussion 2026-04-28 — adding hundreds of future runtime SDK adapters
should not require touching tool names anywhere except the registry.
What the registry owns
ToolSpec dataclass with: name, short (one-line description), when_to_use
(multi-paragraph agent-facing usage guidance), input_schema (JSON Schema),
impl (the actual coroutine in a2a_tools.py), section ('a2a' | 'memory').
TOOLS list with 8 entries — delegate_task, delegate_task_async,
check_task_status, list_peers, get_workspace_info, send_message_to_user,
commit_memory, recall_memory.
What now reads from the registry
- workspace/a2a_mcp_server.py
The hardcoded TOOLS list (167 lines of hand-maintained dicts) is
gone. Replaced with a 6-line list comprehension over the registry.
MCP description = spec.short. inputSchema = spec.input_schema.
- workspace/executor_helpers.py
get_a2a_instructions(mcp=True) and get_hma_instructions() now
GENERATE the agent-facing system-prompt text from the registry.
Heading + per-tool bullet (spec.short) + per-tool when_to_use +
a section-specific footer. No more hand-maintained instruction
blocks that drift from reality.
- workspace/builtin_tools/delegation.py
Renamed delegate_to_workspace -> delegate_task_async to match
registry. check_delegation_status -> check_task_status. Added
sync delegate_task @tool wrapping a2a_tools.tool_delegate_task
(was missing for LangChain runtimes — CP review Issue 3).
- workspace/builtin_tools/memory.py
Renamed search_memory -> recall_memory to match registry.
- workspace/adapter_base.py, workspace/main.py
Bundle all 7 core tools (was 6) into all_tools / base_tools.
- workspace/coordinator.py, shared_runtime.py, policies/routing.py
Updated system-prompt-text references to use the registry names.
Structural alignment tests
workspace/tests/test_platform_tools.py — 9 tests pin every
registry-to-adapter mapping:
- registry names are unique
- a2a + memory partition is complete (no orphans)
- by_name lookup works
- MCP server registers exactly the registry's tool set
- MCP description equals registry.short for every tool
- MCP inputSchema equals registry.input_schema for every tool
- get_a2a_instructions text contains every a2a tool name
- get_hma_instructions text contains every memory tool name
- pre-rename names (delegate_to_workspace, search_memory,
check_delegation_status) cannot leak back
Adding a future tool means adding one ToolSpec; the test failure
list tells the author exactly which adapter to update.
Adapter pattern for future SDK support
When (e.g.) AutoGen or Pydantic AI gets adapters, the only work
needed for tool surfacing is "wrap registry.TOOLS in your SDK's
tool format." Names, descriptions, schemas, impl come from the
registry — adapter author writes zero strings.
Why this needed to ship now
PR #2237 (already in staging) injected MCP-world docs as the
default system-prompt content. Without the registry, those docs
said "delegate_task" while LangChain runtimes only had
"delegate_to_workspace" — workers see docs for tools that don't
exist (CP review Issue 1+3). PR #2239 was a tactical rename;
this PR is the structural fix that prevents the same class of
drift from recurring as new adapters ship.
PR #2239 was closed in favor of this — same renames, plus the
registry, plus structural tests. Single coherent change.
Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in
test_platform_tools.py; 4 alignment tests in test_prompt.py from
#2237 still pass; original test_executor_helpers tests adapted to
the registry-driven world.
Refs: CP review Issues 1, 2, 3, 5; project memory
project_runtime_native_pluggable.md (platform owns A2A);
project memory feedback_doc_tool_alignment.md (this is the structural
fix for the tactical lesson).
Workers were registering platform tools (delegate_task, delegate_task_async,
list_peers, check_task_status, send_message_to_user, commit_memory,
recall_memory) but the build_system_prompt assembly never included
documentation for any of them. The instruction-text functions
get_a2a_instructions() and get_hma_instructions() exist in
executor_helpers.py and have unit tests, but were not called from any
production code path — workers received system-prompt.md content only
and saw the tools as bare names with no usage guidance.
Symptom: agents called commit_memory and delegate_task without knowing
they were platform tools. They worked when the agent guessed the API
correctly and silently failed when the agent didn't.
Fix: build_system_prompt() now appends both instruction sets between
the Skills section and the Peers section. The placement is intentional —
A2A docs explain how to call delegate_task; the peer list is the data
that delegate_task operates over, so the docs precede the peer table.
New parameter `a2a_mcp: bool = True` lets adapters opt into the CLI
subprocess variant of the A2A instructions for runtimes without MCP
support (ollama, custom CLI runtimes). Default True covers the
MCP-capable majority (claude-code, hermes, langchain, crewai). Adapter
callers don't need to change unless they specifically need CLI mode.
Tests: 4 new regression tests in test_prompt.py pin
- A2A MCP variant injection (default)
- A2A CLI variant injection (a2a_mcp=False, with MCP-only fields absent)
- HMA instruction injection
- A2A docs precede peer list ordering
Full suite green: 1223 passed, 2 xfailed.
Consolidates 11 of the 17 open Dependabot PRs (#2215, #2217, #2219-#2225,
#2227, #2229) into one PR. Every entry is a patch / minor / floor bump
where the impact surface is small and CI carries the proof.
Same pattern as the 2026-04-15 batch.
Go (workspace-server/go.mod + go.sum, regenerated via `go mod tidy`):
- golang.org/x/crypto 0.49.0 → 0.50.0 (#2225)
- github.com/golang-jwt/jwt/v5 5.2.2 → 5.3.1 (#2222)
- github.com/gin-contrib/cors 1.7.2 → 1.7.7 (#2220)
- github.com/docker/go-connections 0.6.0 → 0.7.0 (#2223)
- github.com/redis/go-redis/v9 9.7.3 → 9.19.0 (#2217)
Python floor bumps (workspace/requirements.txt; current pip-resolved
versions don't change unless they happen to be below the new floor):
- httpx >=0.27 → >=0.28.1 (#2221)
- uvicorn >=0.30 → >=0.46 (#2229)
- temporalio >=1.7 → >=1.26 (#2227)
- websockets >=12 → >=16 (#2224)
- opentelemetry-sdk >=1.24 → >=1.41.1 (#2219)
GitHub Actions (SHA-pinned per existing convention):
- dorny/paths-filter@d1c1ffe (v3) → @fbd0ab8 (v4.0.1) (#2215)
REMOVED from this batch (lockfile platform mismatch):
- #2231 @types/node ^22 → ^25.6 (npm install on macOS strips
Linux-only @emnapi/* entries from package-lock.json that CI's
`npm ci` then refuses; needs a Linux-side install to land cleanly)
- #2230 jsdom ^25 → ^29.1 (same)
NOT included in this batch (deferred to per-PR human review):
- #2228 github/codeql-action v3 → v4 (CodeQL CLI alignment risk)
- #2218 actions/setup-node v4 → v6 (default Node version drift)
- #2216 actions/upload-artifact v4 → v7 (3 major versions)
- #2214 actions/setup-python v5 → v6 (action major)
NOT merged (CI failing on dependabot's own PR):
- #2233 next 15 → 16
- #2232 tailwindcss 3 → 4
- #2226 typescript 5 → 6
Verified:
- workspace-server: `go mod tidy && go build ./... && go test ./...` — green
- workspace requirements.txt: floor bumps only
The previous assertion `'Silent Agent' not in result` was pinning
the buggy behavior — peers without an agent_card were silently
dropped from the prompt. With the fallback to DB name+role those
peers are correctly visible. Flip the assertion so the test pins
the new (correct) rendering and would catch a regression to the
silent-drop behavior.
Bug: a Design Director coordinator with 6 freshly-created worker peers
rendered an empty `## Your Peers` section in its system prompt — the
hosting registry endpoint correctly returned all 6 peers, but
`summarize_peer_cards()` silently dropped every entry whose
`agent_card` column was null (the default until A2A discovery has
run end-to-end against the worker). The coordinator then refused to
delegate any task because "no peers exist".
Fix: fall back to the registry row's `name` and `role` columns when
`agent_card` is missing, malformed, or wrong-typed, instead of
skipping the peer. The registry endpoint
(`workspace-server/internal/handlers/discovery.go:queryPeerMaps`) has
always returned both fields — they were just being thrown away on
the consumer side. `build_peer_section()` now renders `Role: …` when
the agent_card-derived skill list is empty so the coordinator's
prompt still has something concrete to delegate against.
Also hoists `import json` out of the per-peer loop body to module
level (was previously imported once per iteration).
Tests: new `test_shared_runtime_peer_summary.py` pins all four
fallback cases (null / malformed string / wrong type / null + no
DB name) plus the agent-card-present happy path and the mixed-list
case the coordinator actually consumes. First peer-summary test
coverage `shared_runtime.py` has had — no prior tests existed.
Refs: 2026-04-27 Design Director discovery report from infra team.
The initial-prompt readiness probe in workspace/main.py hardcoded the
pre-1.x well-known path. After the a2a-sdk 1.x bump the SDK started
mounting the agent card at the new canonical path (the value of
`a2a.utils.constants.AGENT_CARD_WELL_KNOWN_PATH`), so the probe
returned 404 every attempt and silently fell through to "server not
ready after 30s, skipping". Net effect: every workspace silently
dropped its `initial_prompt` from config.yaml — the agent never sent
the kickoff self-message, and users hit a fresh chat with no context.
Reported by an external user as "/.well-known/agent.json 404 — the
a2a-sdk agent card route was not being mounted at the expected path".
The route IS mounted; the probe was looking at the wrong place.
Fix imports `AGENT_CARD_WELL_KNOWN_PATH` from `a2a.utils.constants`
and uses it directly in the probe URL — the SDK constant is now the
single source of truth, so any future rename travels through
automatically.
Adds two static regression tests pinning the invariant:
1. No hardcoded `/.well-known/agent.json` literal anywhere in
main.py.
2. The probe URL fstring interpolates AGENT_CARD_WELL_KNOWN_PATH
(catches a "fix" that imports the constant for show but reverts
to a literal in the actual GET).
Verified manually inside ghcr.io/molecule-ai/workspace-template-langgraph
that AGENT_CARD_WELL_KNOWN_PATH == '/.well-known/agent-card.json' and
that `create_agent_card_routes(card)` mounts at exactly that path —
constant + mount are aligned in the runtime image, so the probe will
now find the server.
Full workspace test suite: 1209 passed, 2 xfailed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three different intermittent failures observed during a single
manual-test session — RemoteProtocolError, ReadTimeout, ConnectError —
each surfaced as a "Failed to deliver to <peer>" error chip in the
canvas Agent Comms panel even though the next attempt would have
succeeded (verified by direct probes from the same source workspace
to the same peer). The error message even told the user "Usually a
transient network blip — retry once," but it left the retry to a
human reading the error message.
Auto-retry inside send_a2a_message itself: up to 5 attempts (1
initial + 4 retries) with exponential backoff (1s, 2s, 4s, 8s,
16s-capped), each backoff jittered ±25% to break sync across
siblings. Cumulative wall-clock capped at 600s by
_DELEGATE_TOTAL_BUDGET_S so a string of 5×300s ReadTimeouts can't
make the caller wait 25 minutes — once the deadline elapses, retries
stop even if attempts remain.
Retry only on transport-layer transients:
- ConnectError / ConnectTimeout (peer's listening socket not ready)
- RemoteProtocolError (peer closed TCP without writing — observed
when a peer's prior in-flight Claude SDK session aborted)
- ReadError / WriteError (network blip on Docker bridge)
- ReadTimeout (peer wrote no response in 300s)
Application-level errors are NOT retried — they're deterministic and
retrying just wastes wall-clock:
- HTTP 4xx (peer rejected the request format)
- JSON parse failures (peer returned garbage)
- JSON-RPC error in response body (peer's runtime errored cleanly)
- Programmer-bug exceptions (ValueError, etc.)
8 new tests pin the contract:
- retry succeeds after 2 RemoteProtocolErrors
- retry succeeds after 1 ConnectError
- all 5 attempts fail → returns formatted last-error
- capped at exactly _DELEGATE_MAX_ATTEMPTS (regression cover for
"did someone bump the constant accidentally?")
- JSON-RPC error response NOT retried (1 attempt only)
- non-httpx exception NOT retried (programmer bugs stay loud)
- total budget caps the loop even if attempts remain
- backoff schedule grows exponentially with ±25% jitter
Refactor: extracted _format_a2a_error() so the success and exhausted
paths share one error-formatting routine. _delegate_backoff_seconds()
is a pure function so the schedule is unit-testable without monkey-
patching asyncio.sleep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual-test failure surfaced what was hidden behind the MCP-path bug:
once delegate_task could actually fire, every cross-workspace call
came back as JSON-RPC -32600 "Invalid Request" with the underlying
pydantic ValidationError:
params.message.role
Input should be 'agent' or 'user' [type=enum,
input_value='ROLE_USER', input_type=str]
PR #2184's a2a-sdk 1.x migration sweep over-corrected: it changed
every `"role": "user"` literal in JSON-RPC payload construction to
`"role": "ROLE_USER"` to match the protobuf enum names of the 1.x
native types (a2a.types.Role.ROLE_USER / ROLE_AGENT). That was
correct for in-process Message construction (which the SDK
serialises before wire transmission) but WRONG for the 8 sites that
hand-build JSON-RPC payloads. The workspace's own a2a-sdk runs
inbound requests through the v0.3 compat adapter
(/usr/local/lib/python3.11/site-packages/a2a/compat/v0_3/) because
main.py sets enable_v0_3_compat=True for backwards compatibility,
and that adapter validates against the v0.3 Pydantic Role enum
(`agent` | `user` lowercase). The protobuf-style names blow it up.
Reverted the 8 wire-payload sites to lowercase:
- workspace/a2a_client.py:74
- workspace/a2a_cli.py:74, 111
- workspace/heartbeat.py:378
- workspace/main.py:464, 563
- workspace/builtin_tools/a2a_tools.py:60
- workspace/builtin_tools/delegation.py:272
Native-type usage at workspace/a2a_executor.py:471 (`Role.ROLE_AGENT`)
stays — that's an in-process Message construction; the SDK handles
wire serialisation correctly.
Updated the misleading comment at main.py:255-257 (which said
"outbound payloads are now 1.x-shaped (ROLE_USER)") to spell out
the actual rule: outbound JSON-RPC wire payloads MUST use v0.3
shape, native types are only for in-process construction.
New regression test test_jsonrpc_wire_role_format.py greps the 6
wire-payload-emitting files for any "ROLE_USER" / "ROLE_AGENT"
string literal and fails loud — cheapest possible drift detector.
Why E2E missed it: the priority-runtimes harness sends a single
message canvas → workspace, but the canvas already used lowercase
"user" (it never went through the migration sweep). The bug only
surfaces on workspace → workspace delegation, which the harness
doesn't exercise. Same gap as #131 (extend smoke to call main()
against a stub).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-existing test_set_status_exception_prints_to_stderr asserted on the
legacy "molecule-monorepo-status: failed to update" prefix string. The
prior commit renamed it to "molecule_ai_status: failed to update" so
the printed label matches the canonical module-form invocation
(`python3 -m molecule_runtime.molecule_ai_status`) instead of a shell
alias that only ever existed in the dev-only base image. Updating the
expected substring in lockstep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive sweep follow-up to the MCP server path fix. Audited every
/app/ reference in the runtime source against the live claude-code
template image and confirmed the actual /app/ contents post-#87 are
ONLY: __init__.py, adapter.py, claude_sdk_executor.py, requirements.txt
— every other workspace module ships in the wheel under
site-packages/molecule_runtime/. Two more leaks found:
1. executor_helpers.py:_A2A_INSTRUCTIONS_CLI — inter-agent system prompt
for non-MCP runtimes (Ollama, custom) had 5 lines telling the model
`python3 /app/a2a_cli.py X`. Models copy these examples verbatim, so
every CLI-runtime delegation would fail at the shell layer (no such
file). Replaced with `python3 -m molecule_runtime.a2a_cli` form,
which works regardless of where the wheel is installed.
2. molecule_ai_status.py docstring — usage examples invoked
`python3 /app/molecule_ai_status.py` and claimed a
`molecule-monorepo-status` shell alias. Both broken in current
templates: the file's at site-packages, and `which
molecule-monorepo-status` errors (the legacy symlink only existed
in the dev-only workspace/Dockerfile base image, not in the
standalone template Dockerfiles that ship to production).
Updated docstring + the __main__ usage banner + the stderr error
prefix to use the same `python3 -m molecule_runtime.X` form.
Plugins audited and clean: WORKSPACE_PLUGINS_DIR=/configs/plugins,
SHARED_PLUGINS_DIR=$PLUGINS_DIR fallback /plugins. No /app/
assumptions.
Regression test: `test_a2a_cli_instructions_use_module_invocation_not_legacy_app_path`
asserts the legacy /app/a2a_cli.py path can't drift back into the CLI
system prompt and that the canonical module form is present.
The legacy workspace/Dockerfile + workspace/entrypoint.sh + workspace/scripts/
still contain /app/-shaped paths but are dev-only base-image scaffolding
(per workspace/build-all.sh's own header comment) — not shipped to the
standalone template images. Out of scope here; can be cleaned up in a
separate dead-code pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEFAULT_MCP_SERVER_PATH was hardcoded to /app/a2a_mcp_server.py, which
was correct under the pre-#87 monolithic-template Docker layout where
the workspace/ tree was COPY'd into /app/. After the universal-runtime
refactor (#87, #117), workspace modules ship inside the
molecule-ai-workspace-runtime wheel under
site-packages/molecule_runtime/, while /app/ now holds only
template-specific files (adapter.py + the runtime-native executor for
that template).
Net effect: in every workspace built since the wheel cutover, Claude
Code SDK's mcp_servers={"a2a": {"command": python, "args":
["/app/a2a_mcp_server.py"]}} pointed at a missing file. The subprocess
launch failed silently, the SDK registered zero MCP tools, and the
agent's list_peers / delegate_task / a2a_send_message / a2a_send_signal
all disappeared. Symptom observed today: Design Director said
"I tried to reach the perf auditor via the inter-agent MCP tools
(list_peers, delegate_task) but those tools didn't resolve in this
environment" and fell back to running the audit itself with WebFetch.
Why this slipped through E2E: the priority-runtimes harness sends a
single message and verifies a reply — it does not exercise inter-agent
delegation, so the missing MCP tools are invisible at that layer.
Fix: resolve the path relative to executor_helpers.py via __file__,
which tracks wherever the wheel is installed (site-packages today,
anywhere else tomorrow). The A2A_MCP_SERVER_PATH env override is
preserved for tests / non-default layouts.
Regression test: assert os.path.exists(DEFAULT_MCP_SERVER_PATH) so
any future move of a2a_mcp_server.py out of the package directory
fails at unit-test time instead of silently disabling delegation in
production.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audited every a2a-sdk surface in workspace/ against the installed
1.0.2 wheel. Found and fixed:
main.py (the live workspace startup path):
• create_jsonrpc_routes(rpc_url='/', enable_v0_3_compat=True) —
rpc_url required in 1.x; v0.3 compat enables inbound legacy
clients (`"role": "user"` lowercase) without forcing them to
upgrade. Pairs with the outbound rename below.
a2a_executor.py:
• TextPart/FilePart/FileWithUri removed in 1.x. Part is now a
flat proto message: Part(text=…) / Part(url=…, filename=…,
media_type=…). Updated the file-attachment branch (only
reachable when an agent emits files; the harness's PONG path
didn't exercise this, but it's a latent crash).
• Message field names: messageId/taskId/contextId →
message_id/task_id/context_id (proto3 snake_case).
• Role enum: Role.agent → Role.ROLE_AGENT (proto enum).
Outbound JSON-RPC payloads (8 files):
• "role": "user" → "role": "ROLE_USER" — proto3 JSON serialization
is strict about enum values. Sites: a2a_client, a2a_cli, main
(initial+idle prompts), heartbeat, builtin_tools/a2a_tools,
builtin_tools/delegation. Wire JSON keys stay camelCase
(proto3 default), only the role enum value changed.
google-adk/adapter.py:
• new_agent_text_message → new_text_message (4 sites). This
adapter's directory has a hyphen, so it can't be imported as a
Python module — effectively dead code, but the wheel ships the
file and a future fix should keep it correct against 1.x.
Why one PR instead of seven: every previous a2a-sdk migration find
landed as its own publish → cascade → harness → next-bug cycle.
Today's audit ran every a2a-sdk symbol/type/method in workspace/
against the installed 1.0.2 wheel in a single sweep + tested the
critical paths (Message construction, Part construction, Role enum
parsing) against the actual SDK. Should be the last migration PR.
Verified locally:
python3 scripts/build_runtime_package.py --version 0.1.99 \
--out /tmp/build-final
pip install /tmp/build-final
python -c "import molecule_runtime.main; \
from molecule_runtime.a2a_executor import LangGraphA2AExecutor"
→ ✓ all imports clean against a2a-sdk 1.0.2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7th a2a-sdk migration find from the v0 → v1 transition.
create_jsonrpc_routes() now requires rpc_url as a positional arg
(was implicit at root in 0.x). Pass '/' to match
a2a.utils.constants.DEFAULT_RPC_URL — that's also what
workspace-server's a2a_proxy.go forwards to (POSTs to workspace URL
without appending a path).
Symptom before fix: every workspace startup crashed with
TypeError: create_jsonrpc_routes() missing 1 required positional
argument: 'rpc_url'
Caught by harness 9 phase 4 (claude-code + langgraph both on
0.1.24). The user's "use langgraph for fast iteration" call cut
the diagnose cycle from 15min to ~30s — without that, this would
have taken another hermes round-trip to surface.
Updated reference_a2a_sdk_v0_to_v1_migration.md memory with this
entry alongside the previous 6 finds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a-sdk 1.x added agent_card as a required argument to
DefaultRequestHandler.__init__. main.py constructed it with only
agent_executor + task_store, so every workspace startup that reached
the handler init step crashed with:
TypeError: DefaultRequestHandlerV2.__init__() missing 1 required
positional argument: 'agent_card'
This is the 6th a2a-sdk migration find from the v0 → v1 transition
(see reference_a2a_sdk_v0_to_v1_migration memory). Pattern is the
same: SDK exposes a new required arg, our call site needs to pass
the existing object we already construct upstream.
Why the import-only smoke gates didn't catch this: it's a call-time
constructor error inside `async def main()`, not a module load
error. The runtime-pin-compat smoke imports main_sync but doesn't
invoke main() against a real config. Worth filing a follow-up to
extend the smoke to a "construct + dispose" cycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CRITICAL: every workspace boot since the a2a-sdk 1.0 migration (#1974)
has been crashing at AgentCard construction with:
ValueError: Protocol message AgentCard has no "supported_protocols" field
The protobuf field is `supported_interfaces` (plural, interfaces — see
a2a-sdk types/a2a_pb2.pyi:189). The 0.3→1.0 migration left the kwarg
as `supported_protocols`, which doesn't exist in the 1.0 schema, so
the constructor raises before any subsequent line of main runs.
Why this hid for so long:
- publish-runtime.yml's smoke step only IMPORTED molecule_runtime.main;
importing the module is fine, only CONSTRUCTING the AgentCard fails
- The user-visible symptom is "Workspace failed: " with empty
last_sample_error, indistinguishable from generic boot timeouts
- The state_transition_history=True bug (fixed in #2179) was a
sibling of this — same migration, same class, just caught first
Fix is symmetric with #2179:
1. workspace/main.py: rename the kwarg + comment explaining why
2. .github/workflows/publish-runtime.yml: extend the smoke block to
instantiate AgentCard with the exact production call shape, so
the next field-rename of this class fails at publish time
instead of breaking every workspace startup
Verification:
- Constructed AgentCard against fresh a2a-sdk 1.0.2 in a clean
venv with the corrected kwarg → succeeds
- Constructed it with the original `supported_protocols` kwarg →
fails immediately with the exact error production sees
- Smoke test pinned to mirror main.py's exact call shape; main.py
+ smoke must stay in lockstep going forward
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a comment block citing a2a-sdk's own
a2a/compat/v0_3/conversions.py, which says verbatim:
state_transition_history=None, # No longer supported in v1.0
So a future reader who notices the missing kwarg won't try to add it
back. The capability is now universal: every v1.x Task carries a
history list and tasks/get supports historyLength via the
apply_history_length helper. No flag because nothing's optional.
Confirmed by reading the SDK source directly:
- a2a/types.py AgentCapabilities exposes only: streaming,
push_notifications, extensions, extended_agent_card.
- a2a/compat/v0_3/conversions.py explicitly maps None when
down-converting v1 → v0.3 (deliberate removal, not rename).
- a2a/server/request_handlers/default_request_handler_v2.py uses
apply_history_length(task, params) — agent doesn't opt in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a-sdk 1.x's AgentCapabilities only exposes 4 fields:
streaming, push_notifications, extensions, extended_agent_card.
The state_transition_history field was removed in the v1 protobuf
schema. main.py still passed it as a kwarg, so every workspace
that reached the AgentCard construction step (line 188) crashed:
ValueError: Protocol message AgentCapabilities has no
"state_transition_history" field
Symptom: every claude-code + hermes workspace stuck in `provisioning`
forever — caught when the user provisioned a Design Director crew
manually via the canvas while harness 5 was running.
Why every prior smoke gate missed it:
- runtime-pin-compat.yml smokes `from molecule_runtime.main import
main_sync` — only imports the module. AgentCapabilities() runs
inside `async def main()`, not at module load.
- Template image boot smoke does `import every /app/*.py` — same
story. main.py imports fine; the field error only fires at call.
The fix is one line — drop the kwarg. Fields we actually need
(streaming + push_notifications) are still passed.
Follow-up worth filing: smoke step that instantiates Adapter() +
calls a no-op setup() against a stub config. That would have
caught this before publish.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The conftest mock only exposed `new_agent_text_message`, the pre-v1
name. After fixing a2a_executor.py to use the v1 name
`new_text_message`, the mock didn't satisfy the import → CI red.
Mock both names (aliased to the same lambda) so any in-flight test
that still references the old name keeps working until the next
sweep removes those references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message`
(role=Role.agent is now the default). Same fix landed in the hermes
template earlier today; this is the runtime-side equivalent.
NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by
the langgraph + deepagents templates. Both templates currently import
it via bare `from a2a_executor import LangGraphA2AExecutor` — which is
a separate bug in those templates, filed/fixed separately.
Symptom in a2a_executor.py form: any langgraph or deepagents workspace
that calls create_executor crashes with `ImportError: cannot import
name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for
claude-code or hermes (their templates use their own executors and
don't load a2a_executor).
Five call sites updated, one import line, one comment. Test suite
already passes against the new symbol — `python -c "from
molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves
cleanly after this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wheel's pyproject.toml has declared
`molecule-runtime = "molecule_runtime.main:main_sync"` since the
publish pipeline was created on 2026-04-26, but the function
itself was never present in workspace/main.py — it lived in the
pre-monorepo molecule-ai-workspace-runtime repo and was lost
during the consolidation that made workspace/ the source of truth.
The 0.1.15 wheel still had main_sync from a leftover snapshot,
so the regression went unnoticed until 0.1.16 (the first wheel
built from the new source-of-truth) shipped. Symptom: every
workspace container restart loops with
ImportError: cannot import name 'main_sync' from 'molecule_runtime.main'
— the molecule-runtime CLI script's first line tries to import
the missing symbol. Workspaces stay in `provisioning` until the
10-min sweep marks them failed.
Caught by .github/workflows/runtime-pin-compat.yml, which already
imports the symbol by name as its smoke test. (That check kept
failing red on every recent merge_group run; this PR fixes the
underlying symbol-not-found instead of the smoke step.)
Also strengthens publish-runtime.yml's wheel smoke from
`import molecule_runtime.main` (loads the module — passes even
when entry-point target is missing) to `from molecule_runtime.main
import main_sync` (the actual contract the CLI script needs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runtime-compat change in this branch added a `current_runtime`
kwarg to load_skills(); the watcher passes it through. Test mocks
that pre-date the kwarg signature broke with TypeError, which the
watcher's reload-error try/except swallowed — the symptom was empty
callback lists, not a clear failure.
Switching the fakes to accept **kwargs keeps them forward-compat for
future load_skills additions without another test churn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SKILL.md frontmatter can now declare `runtime: [claude-code]` or
`runtime: [hermes, claude-code]` to opt out of incompatible adapters
instead of failing at first invocation. Default `["*"]` means universal —
existing skill libraries need zero migration.
Borrowed from hermes' declarative skill-compat pattern surfaced in the
hermes architecture survey. The remaining two patterns (event-log
layer, observability config block) stay open under #119.
Wiring:
- SkillMetadata.runtime: list[str] = ["*"]
- _normalize_runtime_field accepts list, string-sugar, missing -> ["*"];
malformed warns and falls back to universal so a typo never silently
drops a skill.
- load_skills(..., current_runtime=...) filters out skills whose runtime
list lacks "*" or current_runtime, with an INFO log line.
- BaseAdapter.start passes type(self).name() so the live adapter drives
the filter; SkillsWatcher takes the same kwarg so hot-reload honors it.
8 new tests cover default universal, no-field universal, explicit
match/mismatch, string sugar, wildcard short-circuit, current_runtime=None
(preserves old behavior), and malformed-warns-not-drops.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DRAFT — do NOT merge until gemini-cli template image rebuilds with
its local cli_executor.py copy (template PR #9 just merged at
07:59 UTC; image build kicks off now).
Final adapter-specific deletion from molecule-runtime, completing #87
for the priority adapters (claude-code via PR #2156, plus gemini-cli
via this PR + template #9).
Deletes:
- workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the
RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file
moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged).
- workspace/tests/test_agent_base_urls.py — only consumer of
CLIAgentExecutor in the test suite. Tests for the executor
behavior live in the template repo now.
Updates:
- workspace/tests/test_executor_helpers.py — docstring refresh:
executor_helpers.py is the runtime-agnostic shared helpers; the
executor classes themselves live in template repos post-#87.
Codex / ollama presets disappear naturally with the file. They never
had template repos, so no production path could invoke them anyway —
this is dead-code removal as a side effect of the move.
Verified-safe-to-delete:
- heartbeat.py: doesn't import cli_executor
- claude_sdk_executor.py: deleted by PR #2156 (in flight)
- preflight.py: only references runtime names by string; no import
- main.py: doesn't import cli_executor (uses adapter discovery via
ADAPTER_MODULE; the template's adapter constructs the executor)
- Only test_agent_base_urls.py + test_executor_helpers.py docstring
referenced cli_executor
Verification:
- 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py
cases — exact match)
- No live import of cli_executor anywhere in molecule-core after deletion
(grep verified)
Sequencing:
1. ✅ Template PR #9 (gemini-cli local copy) — MERGED
2. ⏳ Template image rebuild — running
3. THIS PR — wait until image is published, then mark ready-for-review
Closes#87 for the priority adapters: workspace/ is now adapter-
agnostic except for adapter discovery (ADAPTER_MODULE) + the
runtime_wedge primitive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root-cause fix for #118 (chat attachments rendering as plain text links
instead of download chips). User flagged with screenshot 2026-04-26
showing the Design Director agent pasting https://files.catbox.moe/…
in the message body — chat rendered the URL as plain markdown text,
unclickable in the canvas's bubble layout, and unreachable in any SaaS
deployment where the user's browser can't egress to catbox.
The structured `attachments` field already exists, the canvas's
AttachmentChip already renders well, the WebSocket broadcast already
carries attachments verbatim — the missing piece was the LLM choosing
the body over the structured field. Tighten the tool description so it
trains the right behavior.
Three targeted strengthenings:
1. Top-level tool description: enumerated use case (4) now reads
"via the `attachments` field (NEVER paste file URLs in `message`)".
The all-caps NEVER + the explicit field name move the LLM toward
the structured path on first read.
2. `message` param: adds an explicit DO NOT rule with rationale.
Includes the SaaS-reachability reason so operators can grep for
"SaaS" and find this design constraint instead of re-discovering it
after a tenant complaint. Calls out catbox.moe + file:// by name as
concrete examples of forbidden hosts (those are the two we've seen
in production).
3. `attachments` param: leads with REQUIRED, lists the bad
alternatives explicitly (pasting URLs, base64-encoding, telling
user to look at a path). LLMs handle "use X, NOT Y" framings
better than "use X" alone — observed during prompt-engineering
iteration on hermes' tool descriptions.
Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py)
so a future doc edit that softens or drops them fails CI. Brittle by
design — these are prompt-engineering invariants, not implementation
details.
This is the root-cause fix. A defensive canvas-side backstop (auto-
detect download-shaped URLs in body and convert to chips) is a
follow-up that could land separately if the steering proves
insufficient in practice.
Verification:
- 1190/1190 workspace pytest pass
- 4 new test_a2a_mcp_server.py cases all green
Closes the steering half of #118. The structured-attachments-only
contract was already enforced server-side (PR #2130 added per-attachment
validation); this PR closes the prompt-side gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the universal-runtime refactor (task #87). Now that the
claude-code template repo ships its own claude_sdk_executor.py
(template PR #13 merged + image rebuilt at 07:36 UTC) the
molecule-runtime no longer needs to ship the file.
Deletes:
- workspace/claude_sdk_executor.py (704 LOC)
- workspace/tests/test_claude_sdk_executor.py (~1.6K LOC)
Updates:
- workspace/runtime_wedge.py — drops the "Compatibility shim" docstring
section. The shim was time-bounded ("removed once #87 Phase 2 lands");
this is that PR.
- workspace/tests/test_runtime_wedge.py — drops the
TestClaudeSdkExecutorReExportShim test class (the shim doesn't
exist anymore so the identity assertions would fail at import).
- workspace/tests/conftest.py — drops the claude_agent_sdk stub.
Its only consumer was test_claude_sdk_executor.py which is gone;
no other test imports the SDK.
- workspace/cli_executor.py — comment refresh: claude-code template
repo (not workspace/) is now the home for ClaudeSDKExecutor.
Verified-safe-to-delete:
- heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer
imports from claude_sdk_executor)
- cli_executor.py: only comments referenced claude_sdk_executor;
its line-117 ValueError defends against accidental routing
- tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's
shim class consumed the deleted module; both removed in this PR
Verification:
- 1182/1182 workspace pytest pass (was 1251; -69 = exactly the
deleted test cases — zero unexpected regressions)
- No live import of claude_sdk_executor anywhere in molecule-core
after deletion (grep verified)
Closes#87 for the claude-code adapter. Hermes is already template-only.
The remaining adapter-specific code in workspace/ is cli_executor.py
(codex/ollama/gemini-cli) tracked by task #122. preflight.py's
SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in
flight).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes task #123 — last piece of #87 cleanup.
Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized — direct violation of the universal-runtime principle
(#87) where adapters declare themselves and the runtime stays generic.
Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Distinguished failure
modes so operator messages are concrete:
- ADAPTER_MODULE unset → "no adapter installed; set the env var"
- ADAPTER_MODULE set but module won't import → import error type +
message
- module imports but no Adapter class → "convention violation, add
`Adapter = YourClass`"
- Adapter.name() raises → caught with operator message
- Adapter.name() returns non-string → contract violation message
- Adapter.name() doesn't match config.runtime → drift WARNING (not
fatal; the adapter wins in production, config.yaml is just
documentation)
The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just a documentation drift — the adapter that's
actually installed runs regardless. Operator gets a warning naming
both the configured and installed names so they can fix whichever
is stale.
Tests:
- Replaces the obsolete "static list pass/fail" tests with 6 new
cases covering each distinguished failure mode, plus a positive
test for the adapter-matches-config happy path
- Adds an autouse `_default_langgraph_adapter` fixture that
pre-installs a fake adapter via sys.modules monkey-patching, so
existing tests building default WorkspaceConfig (runtime="langgraph")
inherit a valid adapter without each test setting ADAPTER_MODULE
- Failure-mode tests opt out of the default fixture via
@pytest.mark.no_default_adapter (registered in pytest.ini)
- Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
is a passable test value (otherwise `is not None` would skip the
None branch — exact bug the sentinel avoids)
Verification:
- 22/22 preflight tests pass (was 16; +6 new failure-path tests)
- 1256/1256 workspace pytest pass (was 1251; +5 net)
- No production code path other than preflight changed
Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim. Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes from /code-review-and-quality on PR #2154:
1. Optional (architecture): wrap state in a private _WedgeState class
instead of bare module-level globals. Public API (mark_wedged /
clear_wedge / is_wedged / wedge_reason / reset_for_test) is
unchanged — adapters never see the class. The class is forward-cover
for any future per-scope variant (multiple executors per process, a
keyed registry, etc.) without churning the call sites. Today there's
exactly one instance (_DEFAULT) so behavior is identical.
2. Optional (readability): clarify the import path in the integration
recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
(PyPI package); in molecule-core itself it's `from runtime_wedge`
(top-level module). Removes the trap where a contributor reading the
docstring while editing in-repo copies the template-style import and
gets ImportError.
3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
re-export comment now points to runtime_wedge's "Compatibility shim"
section as the source of truth instead of restating the same content.
Avoids docs-in-two-places drift risk.
Verification:
- 1251/1251 workspace pytest pass (no behavior change — class wrap
is pure plumbing; module-level helpers delegate to the singleton)
- All shim re-export identity tests still pass (the shim's
`is_wedged is runtime_wedge.is_wedged` assertion holds because we
re-export the SAME function object that delegates to _DEFAULT)
No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.
Two additions:
- workspace/runtime_wedge.py — new "How to use from a NEW adapter"
section in the module docstring with the minimum viable
integration recipe, what-you-get-for-free list, and explicit
DON'TS (don't store local wedge state, don't mark for transient
errors, don't write your own clear logic). Plus a "when wedge is
the WRONG primitive" note to keep adopters from over-using it.
- workspace/adapter_base.py — adds runtime_wedge to the
"Cross-cutting capabilities your adapter can opt into" list in
BaseAdapter's docstring (alongside capabilities() and
idle_timeout_override()). Discoverability path: adapter author
reads BaseAdapter docstring → sees runtime_wedge mention → reads
runtime_wedge module docstring → has the recipe.
Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.
Zero code change. Tests untouched (1251/1251 still pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>