Adds `spike/issue-742-managed-agents-executor/` with:

- `demo.py`: standalone Python script that authenticates to the Managed Agents beta API, provisions an environment + agent, starts a session, runs two conversational turns (with cross-turn state recall verification), and prints cold-start and per-turn latency measurements.
- `README.md`: full integration assessment covering provisioner changes needed, A2A routing conflict (primary blocker: sessions have no addressable URL), cost model, API gaps table, and a no-ship recommendation with a 3-week effort estimate if we proceeded anyway.

Recommendation: no-ship for primary executor. Revisit as a batch/cron worker in Phase H once Molecule's MCP server is feature-complete.

Closes #745. References #742.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spike #745 — Anthropic Managed Agents as a Molecule Executor
Parent issue: #742 — "Third executor option: Anthropic Managed Agents"
Spike issue: #745
What We Evaluated
Anthropic's Managed Agents beta (managed-agents-2026-04-01) lets you create
persistent agent objects, spin up per-task sessions, and stream execution events
via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule
is: can this replace (or complement) the self-hosted Docker workspace executor?
Demo
demo.py exercises the full lifecycle:
ANTHROPIC_API_KEY=sk-ant-... python demo.py
What it measures:
| Phase | What we time |
|---|---|
| environment create | Provisioning a cloud execution environment |
| agent create | Storing the agent config (model, system prompt, tools) |
| cold start | sessions.create() → session ready |
| turn 1 RTT | User message → SSE drain → session.status_idle |
| turn 2 RTT | Same, plus implicit state recall check |
State continuity is verified by injecting a unique token in turn 1 and asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
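The recall check and per-turn timing can be sketched roughly as follows. This is not the contents of demo.py: `send_turn`, `run_recall_check`, and `FakeSession` are hypothetical names, and `send_turn` stands in for "send a user message and drain the SSE stream until the session goes idle". The fake session only exists so the sketch runs without an API key.

```python
import time
import uuid
from typing import Callable, List, Tuple


def timed(fn: Callable[[], str]) -> Tuple[str, float]:
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_recall_check(send_turn: Callable[[str], str]) -> int:
    """Return 0 if the turn-1 token is quoted back in turn 2, else 1.

    send_turn is a hypothetical callable: it sends one user message and
    returns the agent's full reply text once the session is idle again.
    """
    token = f"recall-{uuid.uuid4().hex[:12]}"  # unique per run
    _, rtt1 = timed(lambda: send_turn(f"Remember this token: {token}"))
    reply, rtt2 = timed(lambda: send_turn("What token did I give you?"))
    print(f"turn 1 RTT: {rtt1:.3f}s, turn 2 RTT: {rtt2:.3f}s")
    return 0 if token in reply else 1


class FakeSession:
    """In-memory stand-in for a session that keeps (or drops) history."""

    def __init__(self, remembers: bool = True) -> None:
        self.remembers = remembers
        self.history: List[str] = []

    def send(self, text: str) -> str:
        self.history.append(text)
        if text.startswith("What token") and self.remembers:
            # Quote back whatever followed "token: " in the first message.
            return "You gave me " + self.history[0].split(": ", 1)[1]
        return "Noted."
```

Calling `run_recall_check(FakeSession().send)` exercises the passing path; a session built with `remembers=False` exercises the failing one. The return value maps directly onto the exit code described above.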
Integration Assessment
1. Provisioner changes
Molecule's provisioner today calls docker.NewClient(), pulls an image,
creates a container with resource limits, and waits for /registry/register
from inside the container. A Managed Agents executor would replace that
entire path:
current: docker pull → container run → heartbeat register
proposed: agents.create() → sessions.create() → SSE stream
A new runtime: "managed-agent" value in workspaces.runtime would branch
the provisioner. The workspace row would store agent_id (persistent) and
session_id (ephemeral per-run) instead of a Docker container ID.
Migration effort: medium.
A new ManagedAgentProvisioner can be added alongside the existing Docker
provisioner without touching the common path. The primary cost is the
integration layer described below.
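A minimal sketch of the runtime branch, written in Python for consistency with demo.py rather than in the provisioner's own language. Everything here is illustrative: the `Workspace` fields, the `"docker"` runtime name, and the placeholder IDs are assumptions, and the provisioner bodies only mark where the real calls would go.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Workspace:
    id: str
    runtime: str                       # "docker" (assumed name) or "managed-agent"
    container_id: Optional[str] = None
    agent_id: Optional[str] = None     # persistent across runs
    session_id: Optional[str] = None   # ephemeral, replaced on every run


class Provisioner(Protocol):
    def provision(self, ws: Workspace) -> Workspace: ...


class DockerProvisioner:
    def provision(self, ws: Workspace) -> Workspace:
        # Placeholder for: docker pull → container run → heartbeat register
        ws.container_id = f"container-for-{ws.id}"
        return ws


class ManagedAgentProvisioner:
    def provision(self, ws: Workspace) -> Workspace:
        # Placeholder for: agents.create() once, then sessions.create() per run
        if ws.agent_id is None:
            ws.agent_id = f"agent-for-{ws.id}"
        ws.session_id = f"session-for-{ws.id}"
        return ws


PROVISIONERS: Dict[str, Provisioner] = {
    "docker": DockerProvisioner(),
    "managed-agent": ManagedAgentProvisioner(),
}


def provision(ws: Workspace) -> Workspace:
    """Dispatch on workspaces.runtime without touching the Docker path."""
    provisioner = PROVISIONERS.get(ws.runtime)
    if provisioner is None:
        raise ValueError(f"unknown runtime: {ws.runtime!r}")
    return provisioner.provision(ws)
```

Keeping the two provisioners behind one lookup table is what lets the new path be added alongside the existing one: the Docker branch is never edited, only registered next to its sibling.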
2. A2A routing — the blocking architectural conflict
This is the hard blocker. Molecule's A2A proxy (POST /workspaces/:id/a2a)
resolves ws.agent_url and forwards an HTTP POST to the running container.
Every workspace has a persistent, addressable HTTP endpoint.
Managed Agents sessions communicate exclusively through the Anthropic SSE API — there is no per-session URL that the platform can proxy to. The session is a streaming consumer, not a server.
Bridging the gap requires one of:
Option A — Long-poll bridge (complex, fragile)
Keep a goroutine open per session holding the SSE stream. When an A2A message
arrives, inject it via sessions.events.send() and wait for the next
agent.message event. Map response back to A2A caller.
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out
with no clear error path.
Option B — Managed Agents as leaf-only workers (scope reduction)
Only use Managed Agents for workspaces that receive tasks (no outbound A2A).
The platform queues work, opens a session, streams the result, and closes the
session. No live bridge needed.
Risk: many real workspaces delegate to peers — leaf-only scope limits
applicability to batch/one-shot agents.
Option C — Hybrid: MCP bridge
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as
an MCP server; the agent's MCP tool calls translate back to A2A messages.
Risk: this inverts the call direction (agent calls platform instead of
platform-to-agent) and breaks the current workspace-to-workspace trust model.
Security review required before shipping.
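To make Option A's failure mode concrete, here is a rough sketch of the bridge, using an asyncio task in place of the goroutine. It assumes a great deal: the `asyncio.Queue` stands in for the session's SSE stream, `send` stands in for sessions.events.send(), and the `"stream.closed"` event type and the `BridgeDown` error are invented for illustration. The one thing it tries to show is how a dead stream reader can surface as an explicit error instead of a silent timeout.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class BridgeDown(Exception):
    """Raised to A2A callers when the stream-holding task has died."""


class SessionBridge:
    """Must be constructed inside a running event loop."""

    def __init__(self, events: "asyncio.Queue[Dict[str, str]]",
                 send: Callable[[str], Awaitable[None]]) -> None:
        self._events = events          # stand-in for the SSE stream
        self._send = send              # stand-in for sessions.events.send()
        self._pending: Optional[asyncio.Future] = None
        self._lock = asyncio.Lock()    # one in-flight A2A message per session
        self._reader = asyncio.create_task(self._read_loop())

    async def _read_loop(self) -> None:
        try:
            while True:
                event = await self._events.get()
                if event["type"] == "stream.closed":  # hypothetical event name
                    raise ConnectionError("SSE stream closed")
                waiting = self._pending is not None and not self._pending.done()
                if event["type"] == "agent.message" and waiting:
                    self._pending.set_result(event["text"])
        except Exception as exc:
            # Fail the in-flight caller explicitly instead of letting it time out.
            if self._pending is not None and not self._pending.done():
                self._pending.set_exception(BridgeDown(str(exc)))

    async def request(self, text: str, timeout: float = 30.0) -> str:
        """Inject one A2A message and wait for the next agent.message."""
        async with self._lock:
            if self._reader.done():
                raise BridgeDown("stream reader is not running")
            self._pending = asyncio.get_running_loop().create_future()
            await self._send(text)
            try:
                return await asyncio.wait_for(self._pending, timeout)
            finally:
                self._pending = None
```

Even in this toy form the fragility is visible: a reply that arrives after a timeout is dropped, and nothing here correlates a reply with the request that caused it, so a late agent.message could be attributed to the next caller.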
3. Cost model
Managed Agents sessions are billed on top of standard token pricing, so the platform would incur compute costs of its own for every session. By comparison, the Docker path uses a customer-supplied model key with zero platform markup.
The cold-start latency (environment + session creation) measured in the demo adds overhead before the first token. For interactive canvas workflows where workspaces are expected to be long-lived ("always on"), this model is a poor fit. For batch workspaces that run occasionally, it may save infrastructure cost.
4. API gaps (as of 2026-04-17)
| Molecule requirement | Managed Agents support |
|---|---|
| Persistent HTTP endpoint for A2A | No: SSE only |
| Heartbeat / liveness signal | Partial: session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | No: environment config offers only networking |
| Custom Docker image | No: Anthropic-managed base image only |
| workspace_dir bind-mount | No: files uploaded via client.beta.files API |
| Bearer token auth per workspace | No: auth is Anthropic API key, not per-workspace token |
| Plugin system (arbitrary pip installs) | No: built-in agent_toolset_20260401 or custom tool callbacks |
| Runtime detection (config.yaml introspection) | Not applicable: config lives in agent object |
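The heartbeat gap is the one row marked "Partial", so it is worth sketching what an adapter would have to do. The following is an assumption-heavy illustration: only session.status_idle is named in this assessment, so the other event names, the payload fields, and the `max_silence` threshold are all hypothetical. Because the session never pushes to the platform, the adapter also needs its own staleness rule.

```python
import time
from typing import Dict, Optional

# Only "session.status_idle" appears in this assessment; the other two
# event names are hypothetical placeholders.
STATUS_TO_HEALTH: Dict[str, str] = {
    "session.status_idle": "healthy",
    "session.status_running": "healthy",
    "session.status_terminated": "offline",
}


def heartbeat_from_event(workspace_id: str, event_type: str,
                         now: Optional[float] = None
                         ) -> Optional[Dict[str, object]]:
    """Translate one SSE status event into a heartbeat payload.

    Returns None for events that carry no liveness information.
    The payload shape is an assumption, not the /registry/heartbeat schema.
    """
    health = STATUS_TO_HEALTH.get(event_type)
    if health is None:
        return None
    observed = time.time() if now is None else now
    return {"workspace_id": workspace_id, "status": health,
            "observed_at": observed}


def is_stale(last_seen: float, now: float, max_silence: float = 60.0) -> bool:
    """With no proactive push, silence itself must count as a failure."""
    return (now - last_seen) > max_silence
```

The adapter would post each non-None payload to /registry/heartbeat and mark the workspace offline once `is_stale` returns true, which is the part the Docker path gets for free from the container.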
Ship/No-Ship Recommendation
Decision: No-ship for the primary executor. Spike further as a batch worker.
Rationale:
- A2A proxy is the load-bearing constraint. Molecule's value proposition is multi-workspace orchestration. A workspace executor that can't be reached by other workspaces over A2A is not a Molecule workspace; it's a standalone call to the Anthropic API with extra steps.
- No persistent endpoint = no topology. The canvas shows workspaces as nodes that communicate. A Managed Agents session has no addressable URL; the canvas can't represent it as a live peer.
- Cold start is non-trivial. Preliminary measurements from the demo show environment + session creation adding visible latency before the first token. For the "always-on" UX the canvas targets, this is noticeable.
- Scope would be a dead end. Shipping Managed Agents as a leaf-only, no-A2A executor today means two provisioner paths diverge. The Managed Agents path can never grow to full parity without Anthropic exposing a persistent addressable URL. We'd be maintaining a permanently limited path.
What to do instead
- Phase H (planned): Consider Managed Agents as the execution target for scheduled tasks only (workspace_schedules cron rows). A cron fire could spin up a session, run the prompt, stream the result, and self-report via /activity. No live A2A needed. Effort: ~2 weeks.
- Watch the API. If Anthropic ships a stable URL per session (like a webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above) also becomes more viable once Molecule's MCP server is feature-complete.
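The Phase H cron flow (open a session, run the prompt, stream the result, self-report) could look roughly like this. All four callables are hypothetical seams rather than real client or platform functions: `open_session`, `stream_reply`, `close_session`, and `report_activity` would be backed by the Managed Agents client and the /activity endpoint respectively.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ScheduleRow:
    """Assumed shape of one workspace_schedules row."""
    workspace_id: str
    prompt: str


def run_scheduled_task(
    row: ScheduleRow,
    open_session: Callable[[str], str],                   # workspace_id -> session_id
    stream_reply: Callable[[str, str], Iterable[str]],    # (session_id, prompt) -> chunks
    close_session: Callable[[str], None],
    report_activity: Callable[[str, str], None],          # (workspace_id, result)
) -> str:
    """One cron fire: open, run, drain, close, report. No live A2A involved."""
    session_id = open_session(row.workspace_id)
    try:
        chunks: List[str] = list(stream_reply(session_id, row.prompt))
    finally:
        # Always release the session, even if streaming fails midway.
        close_session(session_id)
    result = "".join(chunks)
    report_activity(row.workspace_id, result)
    return result
```

Because the session is opened and closed inside a single call, nothing outside this function ever needs to address it, which is why this use case sidesteps the A2A blocker.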
Rough Effort Estimate (if we did ship)
| Component | Effort |
|---|---|
| ManagedAgentProvisioner (create/start/stop session) | 3–5 days |
| A2A bridge goroutine (Option A) | 5–8 days |
| Heartbeat adapter (translate SSE status to /registry/heartbeat) | 2–3 days |
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
| Tests, migration, docs | 3–4 days |
| Total | ~3 weeks |
Even at 3 weeks, the result is a permanently limited path with no A2A and no resource controls. Not recommended.
Files
| File | Purpose |
|---|---|
| demo.py | Runnable spike script: auth, provision, session, two turns, timing |
| README.md | This assessment |