rabbitblood 6485c34c61 chore: move spike/ → docs/spikes/ — keep explorations out of repo root
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 16:09:12 -07:00

Spike #745 — Anthropic Managed Agents as a Molecule Executor

Parent issue: #742 — "Third executor option: Anthropic Managed Agents"
Spike issue: #745

What We Evaluated

Anthropic's Managed Agents beta (managed-agents-2026-04-01) lets you create persistent agent objects, spin up per-task sessions, and stream execution events via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule is: can this replace (or complement) the self-hosted Docker workspace executor?


Demo

demo.py exercises the full lifecycle:

ANTHROPIC_API_KEY=sk-ant-... python demo.py

What it measures:

| Phase | What we time |
| --- | --- |
| environment create | Provisioning a cloud execution environment |
| agent create | Storing the agent config (model, system prompt, tools) |
| cold start | sessions.create() → session ready |
| turn 1 RTT | User message → SSE drain → session.status_idle |
| turn 2 RTT | Same, plus implicit state recall check |
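
A minimal sketch of how per-phase wall-clock timings like these could be collected. `PhaseTimer` is a hypothetical helper, not taken from demo.py, and the `time.sleep` calls are stand-ins for the real API calls.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Collects wall-clock durations per named phase (hypothetical helper)."""

    def __init__(self):
        self.durations = {}  # phase name -> seconds, in insertion order

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the phase raised, so failures are timed too.
            self.durations[name] = time.perf_counter() - start

    def report(self):
        return [
            f"{name:<20} {secs * 1000:8.1f} ms"
            for name, secs in self.durations.items()
        ]


timer = PhaseTimer()
with timer.phase("environment create"):
    time.sleep(0.01)  # stand-in for provisioning the environment
with timer.phase("cold start"):
    time.sleep(0.01)  # stand-in for sessions.create() -> session ready

for line in timer.report():
    print(line)
```

Using `time.perf_counter()` rather than `time.time()` keeps the measurement monotonic, which matters when phases are short.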

State continuity is verified by injecting a unique token in turn 1 and asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
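
The check described above can be sketched as follows. This is an illustration under assumptions: the function names and prompt wording are hypothetical and may differ from what demo.py actually does, and the reply strings stand in for text streamed from a real session.

```python
import secrets


def make_token():
    # Unique per run, so a canned or cached reply cannot pass by accident.
    return f"spike-{secrets.token_hex(8)}"


def turn1_prompt(token):
    # Hypothetical wording; turn 2 would then ask the agent to repeat the token.
    return f"Remember this token for later: {token}. Reply with OK."


def state_recalled(token, turn2_reply):
    # Pass only if the agent quotes the exact token back.
    return token in turn2_reply


def exit_code(token, turn2_reply):
    # 0 = pass, 1 = fail, matching the convention described above.
    return 0 if state_recalled(token, turn2_reply) else 1


token = make_token()
good_reply = f"The token you gave me earlier was {token}."  # stand-in reply
bad_reply = "I do not recall any token."                    # stand-in reply
print(exit_code(token, good_reply), exit_code(token, bad_reply))
```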


Integration Assessment

1. Provisioner changes

Molecule's provisioner today calls docker.NewClient(), pulls an image, creates a container with resource limits, and waits for /registry/register from inside the container. A Managed Agents executor would replace that entire path:

current:  docker pull → container run → heartbeat register
proposed: agents.create() → sessions.create() → SSE stream

A new runtime: "managed-agent" value in workspaces.runtime would branch the provisioner. The workspace row would store agent_id (persistent) and session_id (ephemeral per-run) instead of a Docker container ID.
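
A rough sketch of that branch, written in Python for illustration only (the real provisioner is not shown in this document). The class names, the `"docker"` runtime value, and the generated IDs are assumptions; the ID strings stand in for the results of agents.create() and sessions.create().

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class WorkspaceRow:
    id: str
    runtime: str                         # "docker" (assumed value) or "managed-agent"
    container_id: Optional[str] = None   # Docker path only
    agent_id: Optional[str] = None       # persistent across runs
    session_id: Optional[str] = None     # ephemeral, replaced on every run


class DockerProvisioner:
    def provision(self, ws):
        ws.container_id = f"container-{ws.id}"  # stand-in for pull + container run
        return ws


class ManagedAgentProvisioner:
    def __init__(self):
        self._runs = count(1)

    def provision(self, ws):
        if ws.agent_id is None:
            ws.agent_id = f"agent-{ws.id}"  # stand-in for agents.create()
        # A fresh session per run; the agent object is reused.
        ws.session_id = f"session-{ws.id}-{next(self._runs)}"
        return ws


PROVISIONERS = {
    "docker": DockerProvisioner(),
    "managed-agent": ManagedAgentProvisioner(),
}


def provision(ws):
    try:
        provisioner = PROVISIONERS[ws.runtime]
    except KeyError:
        raise ValueError(f"unknown runtime: {ws.runtime!r}") from None
    return provisioner.provision(ws)


ws = provision(WorkspaceRow(id="w1", runtime="managed-agent"))
first_session = ws.session_id
provision(ws)  # second run: same agent_id, new session_id
```

Keeping the dispatch in a lookup table means the Docker path is untouched when the new runtime is added.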

Migration effort: medium.
A new ManagedAgentProvisioner can be added alongside the existing Docker provisioner without touching the common path. The primary cost is the integration layer described below.


2. A2A routing — the blocking architectural conflict

This is the hard blocker. Molecule's A2A proxy (POST /workspaces/:id/a2a) resolves ws.agent_url and forwards an HTTP POST to the running container. Every workspace has a persistent, addressable HTTP endpoint.

Managed Agents sessions communicate exclusively through the Anthropic SSE API — there is no per-session URL that the platform can proxy to. The session is a streaming consumer, not a server.

Bridging the gap requires one of:

Option A — Long-poll bridge (complex, fragile)
Keep a goroutine open per session holding the SSE stream. When an A2A message arrives, inject it via sessions.events.send(), wait for the next agent.message event, and map the response back to the A2A caller.
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out with no clear error path.
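
To make the shape of Option A and its failure mode concrete, here is a small asyncio sketch (Python standing in for the goroutine). Everything in it is hypothetical: `FakeSession` is an in-memory stand-in for the SSE stream and for sessions.events.send(), and `SessionBridge` and `BridgeClosed` are illustrative names. It shows one way to fail fast when the reader dies instead of leaving callers to time out.

```python
import asyncio


class BridgeClosed(Exception):
    """Raised to A2A callers when the stream reader is gone."""


class FakeSession:
    """In-memory stand-in; a real session would wrap the SSE API."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def send(self, text):
        # Pretend the agent replies by echoing the message.
        await self._queue.put({"type": "agent.message", "text": f"echo: {text}"})

    async def close(self):
        await self._queue.put(None)  # sentinel: stream ended

    async def events(self):
        while True:
            event = await self._queue.get()
            if event is None:
                return
            yield event


class SessionBridge:
    def __init__(self, session):
        self._session = session
        self._pending = None         # Future for the in-flight A2A request
        self._lock = asyncio.Lock()  # one A2A request at a time per session
        self._reader = None

    def start(self):
        self._reader = asyncio.create_task(self._read_events())

    async def wait_closed(self):
        await self._reader

    async def _read_events(self):
        try:
            async for event in self._session.events():
                waiter = self._pending
                if event["type"] == "agent.message" and waiter and not waiter.done():
                    waiter.set_result(event["text"])
        finally:
            # Stream ended or crashed: fail the waiter instead of letting it hang.
            if self._pending and not self._pending.done():
                self._pending.set_exception(BridgeClosed("SSE stream ended"))

    async def handle_a2a(self, text, timeout=30.0):
        async with self._lock:
            if self._reader is None or self._reader.done():
                raise BridgeClosed("no live stream for session")
            self._pending = asyncio.get_running_loop().create_future()
            await self._session.send(text)
            return await asyncio.wait_for(self._pending, timeout)


async def demo():
    session = FakeSession()
    bridge = SessionBridge(session)
    bridge.start()
    reply = await bridge.handle_a2a("ping")
    await session.close()       # simulate the stream dying
    await bridge.wait_closed()
    try:
        await bridge.handle_a2a("ping again")
        failed_fast = False
    except BridgeClosed:
        failed_fast = True
    return reply, failed_fast


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Even in this toy form the bridge needs a lock, a pending-request slot, and explicit teardown handling per session, which is the complexity the risk note refers to.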

Option B — Managed Agents as leaf-only workers (scope reduction)
Only use Managed Agents for workspaces that receive tasks (no outbound A2A). The platform queues work, opens a session, streams the result, and closes the session. No live bridge needed.
Risk: many real workspaces delegate to peers — leaf-only scope limits applicability to batch/one-shot agents.

Option C — Hybrid: MCP bridge
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as an MCP server; the agent's MCP tool calls translate back to A2A messages.
Risk: this inverts the call direction (agent calls platform instead of platform-to-agent) and breaks the current workspace-to-workspace trust model. Security review required before shipping.


3. Cost model

Managed Agents sessions are billed on top of standard token pricing, so the platform incurs session compute costs of its own. For comparison, the Docker path uses a customer-supplied model key with zero platform markup.

The cold-start latency (environment + session creation) measured in the demo adds overhead before the first token. For interactive canvas workflows where workspaces are expected to be long-lived ("always on"), this model is a poor fit. For batch workspaces that run occasionally, it may save infrastructure cost.


4. API gaps (as of 2026-04-17)

| Molecule requirement | Managed Agents support |
| --- | --- |
| Persistent HTTP endpoint for A2A | No: SSE only |
| Heartbeat / liveness signal | Partial: session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | No: environment config offers only networking |
| Custom Docker image | No: Anthropic-managed base image only |
| workspace_dir bind-mount | No: files uploaded via client.beta.files API |
| Bearer token auth per workspace | No: auth is Anthropic API key, not per-workspace token |
| Plugin system (arbitrary pip installs) | No: built-in agent_toolset_20260401 or custom tool callbacks |
| Runtime detection (config.yaml introspection) | Not applicable: config lives in agent object |

Ship/No-Ship Recommendation

Decision: No-ship for the primary executor. Spike further as a batch worker.

Rationale:

  1. A2A proxy is the load-bearing constraint. Molecule's value proposition is multi-workspace orchestration. A workspace executor that can't be reached by other workspaces over A2A is not a Molecule workspace — it's a standalone call to the Anthropic API with extra steps.

  2. No persistent endpoint = no topology. The canvas shows workspaces as nodes that communicate. A Managed Agents session has no addressable URL; the canvas can't represent it as a live peer.

  3. Cold start is non-trivial. Preliminary measurements from the demo show environment + session creation adding visible latency before the first token. For the "always-on" UX the canvas targets, this is noticeable.

  4. Scope would be a dead end. Shipping Managed Agents as a leaf-only, no-A2A executor today means two provisioner paths diverge. The Managed Agents path can never grow to full parity without Anthropic exposing a persistent addressable URL. We'd be maintaining a permanently limited path.

What to do instead

  • Phase H (planned): Consider Managed Agents as the execution target for scheduled tasks only (workspace_schedules cron rows). A cron fire could spin up a session, run the prompt, stream the result, and self-report via /activity. No live A2A needed. Effort: ~2 weeks.

  • Watch the API. If Anthropic ships a stable URL per session (like a webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above) also becomes more viable once Molecule's MCP server is feature-complete.
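
The scheduled-task flow in the first bullet could look roughly like the sketch below. It is illustrative only: `FakeSession`, `run_scheduled_task`, and the shape of the schedule dict are assumptions, the session is an in-memory stand-in rather than a real Managed Agents session, and `report_activity` stands in for the self-report to /activity.

```python
from dataclasses import dataclass


@dataclass
class FakeSession:
    """Stand-in for a Managed Agents session; yields canned events."""
    prompt: str = ""
    closed: bool = False

    def send(self, prompt):
        self.prompt = prompt

    def events(self):
        yield {"type": "agent.message", "text": f"done: {self.prompt}"}
        yield {"type": "session.status_idle"}

    def close(self):
        self.closed = True


def run_scheduled_task(schedule, open_session, report_activity):
    """One cron fire: open a session, run the prompt, report, close."""
    session = open_session(schedule["workspace_id"])
    try:
        session.send(schedule["prompt"])
        chunks = []
        for event in session.events():
            if event["type"] == "agent.message":
                chunks.append(event["text"])
            elif event["type"] == "session.status_idle":
                break  # the turn is finished
        result = "\n".join(chunks)
        report_activity(schedule["workspace_id"], result)
        return result
    finally:
        # Always release the session, even if the run or the report failed.
        session.close()


reports = []
session = FakeSession()
result = run_scheduled_task(
    {"workspace_id": "w1", "prompt": "nightly summary"},
    open_session=lambda workspace_id: session,
    report_activity=lambda workspace_id, text: reports.append((workspace_id, text)),
)
```

Because the session lives only for the duration of one call, nothing here needs an addressable endpoint or a long-lived stream holder.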


Rough Effort Estimate (if we did ship)

| Component | Effort |
| --- | --- |
| ManagedAgentProvisioner (create/start/stop session) | 3-5 days |
| A2A bridge goroutine (Option A) | 5-8 days |
| Heartbeat adapter (translate SSE status to /registry/heartbeat) | 2-3 days |
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
| Tests, migration, docs | 3-4 days |
| Total | ~3 weeks |

Even at 3 weeks, the result is a permanently limited path with no A2A and no resource controls. Not recommended.
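
As a sanity check on the total, the per-component figures can be summed directly. This sketch reads them as day ranges (3-5, 5-8, 2-3, 1, 3-4), which is an interpretation of the table rather than something stated elsewhere in this document.

```python
# (low, high) estimates in days, read from the effort table as ranges.
estimates = {
    "ManagedAgentProvisioner": (3, 5),
    "A2A bridge goroutine (Option A)": (5, 8),
    "Heartbeat adapter": (2, 3),
    "Canvas: hide A2A tab": (1, 1),
    "Tests, migration, docs": (3, 4),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())

print(f"{low}-{high} days")  # prints "14-21 days"
```

Fourteen to twenty-one days is about three weeks at the low end if a week is taken as five working days, so the "~3 weeks" total sits toward the optimistic side of the range.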


Files

| File | Purpose |
| --- | --- |
| demo.py | Runnable spike script: auth, provision, session, two turns, timing |
| README.md | This assessment |