# Spike #745: Anthropic Managed Agents as a Molecule Executor

**Parent issue:** #742, "Third executor option: Anthropic Managed Agents"

**Spike issue:** #745

## What We Evaluated

Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create persistent agent objects, spin up per-task sessions, and stream execution events via SSE, all hosted on Anthropic's infrastructure. The key question for Molecule is: *can this replace (or complement) the self-hosted Docker workspace executor?*

---

## Demo

`demo.py` exercises the full lifecycle:

```
ANTHROPIC_API_KEY=sk-ant-... python demo.py
```

What it measures:

| Phase | What we time |
|---|---|
| `environment create` | Provisioning a cloud execution environment |
| `agent create` | Storing the agent config (model, system prompt, tools) |
| `cold start` | `sessions.create()` → session ready |
| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
| `turn 2 RTT` | Same, plus implicit state recall check |

State continuity is verified by injecting a unique token in turn 1 and asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.

---

## Integration Assessment

### 1. Provisioner changes

Molecule's provisioner today calls `docker.NewClient()`, pulls an image, creates a container with resource limits, and waits for `/registry/register` from inside the container. A Managed Agents executor would replace that entire path:

```
current:   docker pull → container run → heartbeat register
proposed:  agents.create() → sessions.create() → SSE stream
```

A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch the provisioner. The workspace row would store `agent_id` (persistent) and `session_id` (ephemeral per-run) instead of a Docker container ID.

**Migration effort:** medium. A new `ManagedAgentProvisioner` can be added alongside the existing Docker provisioner without touching the common path. The primary cost is the integration layer described below.

---

### 2. A2A routing: the blocking architectural conflict

This is the hard blocker.

Molecule's A2A proxy (`POST /workspaces/:id/a2a`) resolves `ws.agent_url` and forwards an HTTP POST to the running container. Every workspace has a persistent, addressable HTTP endpoint.

Managed Agents sessions communicate exclusively through the Anthropic SSE API; there is no per-session URL that the platform can proxy to. The session is a streaming consumer, not a server.

Bridging the gap requires one of:

**Option A: Long-poll bridge (complex, fragile)**

Keep a goroutine open per session holding the SSE stream. When an A2A message arrives, inject it via `sessions.events.send()` and wait for the next `agent.message` event. Map response back to A2A caller.

Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out with no clear error path.

**Option B: Managed Agents as leaf-only workers (scope reduction)**

Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A). The platform queues work, opens a session, streams the result, and closes the session. No live bridge needed.

Risk: many real workspaces delegate to peers, so leaf-only scope limits applicability to batch/one-shot agents.

**Option C: Hybrid MCP bridge**

Anthropic agents can call MCP servers. The platform exposes its A2A proxy as an MCP server; the agent's MCP tool calls translate back to A2A messages.

Risk: this inverts the call direction (agent calls platform instead of platform-to-agent) and breaks the current workspace-to-workspace trust model. Security review required before shipping.

---

### 3. Cost model

Managed Agents sessions are charged on top of standard token pricing; the platform receives its own compute costs. For comparison, the Docker path uses a customer-supplied model key with zero platform markup. The cold-start latency (environment + session creation) measured in the demo adds overhead before the first token.
For interactive canvas workflows where workspaces are expected to be long-lived ("always on"), this model is a poor fit. For batch workspaces that run occasionally, it may save infrastructure cost.

---

### 4. API gaps (as of 2026-04-17)

| Molecule requirement | Managed Agents support |
|---|---|
| Persistent HTTP endpoint for A2A | **No**: SSE only |
| Heartbeat / liveness signal | **Partial**: session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | **No**: environment config offers only `networking` |
| Custom Docker image | **No**: Anthropic-managed base image only |
| `workspace_dir` bind-mount | **No**: files uploaded via `client.beta.files` API |
| Bearer token auth per workspace | **No**: auth is Anthropic API key, not per-workspace token |
| Plugin system (arbitrary pip installs) | **No**: built-in `agent_toolset_20260401` or custom tool callbacks |
| Runtime detection (`config.yaml` introspection) | **Not applicable**: config lives in agent object |

---

## Ship/No-Ship Recommendation

### Decision: **No-ship for the primary executor. Spike further as a batch worker.**

**Rationale:**

1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition is multi-workspace orchestration. A workspace executor that can't be reached by other workspaces over A2A is not a Molecule workspace; it's a standalone call to the Anthropic API with extra steps.
2. **No persistent endpoint = no topology.** The canvas shows workspaces as nodes that communicate. A Managed Agents session has no addressable URL; the canvas can't represent it as a live peer.
3. **Cold start is non-trivial.** Preliminary measurements from the demo show environment + session creation adding visible latency before the first token. For the "always-on" UX the canvas targets, this is noticeable.
4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only, no-A2A executor today means two provisioner paths diverge.
   The Managed Agents path can never grow to full parity without Anthropic exposing a persistent addressable URL. We'd be maintaining a permanently limited path.

### What to do instead

- **Phase H (planned):** Consider Managed Agents as the execution target for *scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could spin up a session, run the prompt, stream the result, and self-report via `/activity`. No live A2A needed. Effort: ~2 weeks.
- **Watch the API.** If Anthropic ships a stable URL per session (like a webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above) also becomes more viable once Molecule's MCP server is feature-complete.

---

## Rough Effort Estimate (if we did ship)

| Component | Effort |
|---|---|
| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days |
| A2A bridge goroutine (Option A) | 5–8 days |
| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days |
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
| Tests, migration, docs | 3–4 days |
| **Total** | **~3 weeks** |

Even at 3 weeks, the result is a permanently limited path with no A2A and no resource controls. Not recommended.

---

## Files

| File | Purpose |
|---|---|
| `demo.py` | Runnable spike script: auth, provision, session, two turns, timing |
| `README.md` | This assessment |
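
### Sketch: state-continuity check and phase timing

For readers who want the shape of the check without opening `demo.py`, the following is a minimal sketch of the token-recall test and per-phase timing described under Demo. It is not the contents of `demo.py`. The names `timed`, `check_state_continuity`, `send_turn`, and `FakeSession` are hypothetical, and the in-memory `FakeSession` stands in for a real Managed Agents session so that the sketch runs without an API key or network access.

```python
import secrets
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


@contextmanager
def timed(phase: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of one phase, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


def check_state_continuity(
    send_turn: Callable[[str], str], timings: Dict[str, float]
) -> bool:
    """Plant a unique token in turn 1 and check that turn 2 quotes it back.

    `send_turn` is a hypothetical callable (an assumption of this sketch):
    it sends one user message to an open session, blocks until the session
    is idle again, and returns the agent's reply text.
    """
    token = f"tok-{secrets.token_hex(8)}"
    with timed("turn 1 RTT", timings):
        send_turn(f"Remember this token for later: {token}")
    with timed("turn 2 RTT", timings):
        reply = send_turn("What token did I ask you to remember?")
    return token in reply


class FakeSession:
    """In-memory stand-in for a stateful session; makes no network calls."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def send(self, message: str) -> str:
        self.history.append(message)
        # Echo the whole history so content from earlier turns stays visible.
        return " | ".join(self.history)
```

To try it, pass `FakeSession().send` as `send_turn` along with an empty dict for `timings`; a real run would swap in a function that drives an actual session and map the boolean result to exit code 0 or 1.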