molecule-ai[bot] 08f8be820a

spike(#745 ): evaluate Anthropic Managed Agents as third executor option

Adds `spike/issue-742-managed-agents-executor/` with:
- `demo.py`: standalone Python script that authenticates to the Managed Agents
  beta API, provisions an environment + agent, starts a session, runs two
  conversational turns (with cross-turn state recall verification), and prints
  cold-start and per-turn latency measurements.
- `README.md`: full integration assessment covering provisioner changes needed,
  A2A routing conflict (primary blocker — sessions have no addressable URL),
  cost model, API gaps table, and a no-ship recommendation with a 3-week effort
  estimate if we proceeded anyway.

Recommendation: no-ship for primary executor. Revisit as a batch/cron worker
in Phase H once Molecule's MCP server is feature-complete.

Closes #745. References #742.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-17 15:43:21 +00:00

README.md

Spike #745 — Anthropic Managed Agents as a Molecule Executor

Parent issue: #742 — "Third executor option: Anthropic Managed Agents"
Spike issue: #745

What We Evaluated

Anthropic's Managed Agents beta (managed-agents-2026-04-01) lets you create persistent agent objects, spin up per-task sessions, and stream execution events via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule is: can this replace (or complement) the self-hosted Docker workspace executor?

Demo

demo.py exercises the full lifecycle:

ANTHROPIC_API_KEY=sk-ant-... python demo.py

What it measures:

| Phase | What we time |
| --- | --- |
| `environment create` | Provisioning a cloud execution environment |
| `agent create` | Storing the agent config (model, system prompt, tools) |
| `cold start` | `sessions.create()` → session ready |
| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
| `turn 2 RTT` | Same as turn 1, plus an implicit state recall check |

State continuity is verified by injecting a unique token in turn 1 and asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
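The timing and recall check described above can be sketched without touching the real API. This is a minimal illustration, not the contents of `demo.py`: `FakeSession`, `timed`, and `run_recall_check` are hypothetical names, and the fake session only stands in for a Managed Agents session so the logic can run offline.

```python
import time
import uuid
from contextlib import contextmanager

timings = {}  # phase name -> elapsed seconds


@contextmanager
def timed(phase):
    """Record wall-clock seconds for one lifecycle phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


class FakeSession:
    """Hypothetical stand-in for a session; it only remembers earlier messages."""

    def __init__(self):
        self.history = []

    def send(self, message):
        self.history.append(message)
        if "repeat the token" in message.lower():
            # Look for a token in any earlier message of this conversation.
            for earlier in self.history[:-1]:
                for word in earlier.split():
                    if word.startswith("TOKEN-"):
                        return f"The token was {word}"
            return "I do not recall a token."
        return "Noted."


def run_recall_check(session):
    """Inject a unique token in turn 1 and check that turn 2 quotes it back."""
    token = f"TOKEN-{uuid.uuid4().hex[:8]}"
    with timed("turn 1 RTT"):
        session.send(f"Remember this value: {token}")
    with timed("turn 2 RTT"):
        reply = session.send("Please repeat the token I gave you.")
    return token in reply


ok = run_recall_check(FakeSession())
exit_code = 0 if ok else 1  # mirrors the pass/fail convention described above
```

Using a fresh random token per run means a canned or cached reply cannot pass the check by accident.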

Integration Assessment

1. Provisioner changes

Molecule's provisioner today calls docker.NewClient(), pulls an image, creates a container with resource limits, and waits for /registry/register from inside the container. A Managed Agents executor would replace that entire path:

current:  docker pull → container run → heartbeat register
proposed: agents.create() → sessions.create() → SSE stream

A new "managed-agent" value for workspaces.runtime would branch the provisioner. The workspace row would store agent_id (persistent) and session_id (ephemeral, one per run) instead of a Docker container ID.

Migration effort: medium.
A new ManagedAgentProvisioner can be added alongside the existing Docker provisioner without touching the common path. The primary cost is the integration layer described below.
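A rough sketch of that branch is shown here in Python for brevity, even though the real provisioner is not written in Python. Everything in it is an assumption made for illustration: the `Workspace` fields, the `PROVISIONERS` registry, and the placeholder IDs. No Docker or Anthropic calls are made; each provisioner just returns the steps it would perform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workspace:
    id: str
    runtime: str                        # "docker" or "managed-agent"
    container_id: Optional[str] = None  # Docker path only
    agent_id: Optional[str] = None      # persistent across runs
    session_id: Optional[str] = None    # replaced on every run


class DockerProvisioner:
    def provision(self, ws):
        ws.container_id = f"container-{ws.id}"  # placeholder for pull + run
        return ["docker pull", "container run", "heartbeat register"]


class ManagedAgentProvisioner:
    def __init__(self):
        self._runs = 0

    def provision(self, ws):
        steps = []
        if ws.agent_id is None:
            # The agent object is created once and reused afterwards.
            ws.agent_id = f"agent-{ws.id}"
            steps.append("agents.create()")
        self._runs += 1
        ws.session_id = f"session-{ws.id}-{self._runs}"  # new session per run
        steps += ["sessions.create()", "SSE stream"]
        return steps


PROVISIONERS = {
    "docker": DockerProvisioner(),
    "managed-agent": ManagedAgentProvisioner(),
}


def provision(ws):
    """Dispatch on workspaces.runtime without touching the Docker path."""
    try:
        provisioner = PROVISIONERS[ws.runtime]
    except KeyError:
        raise ValueError(f"unknown runtime: {ws.runtime!r}") from None
    return provisioner.provision(ws)
```

Keeping the two provisioners behind one dispatch function is what lets the new path sit alongside the existing one.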

2. A2A routing — the blocking architectural conflict

This is the hard blocker. Molecule's A2A proxy (POST /workspaces/:id/a2a) resolves ws.agent_url and forwards an HTTP POST to the running container. Every workspace has a persistent, addressable HTTP endpoint.

Managed Agents sessions communicate exclusively through the Anthropic SSE API — there is no per-session URL that the platform can proxy to. The session is a streaming consumer, not a server.

Bridging the gap requires one of:

Option A — Long-poll bridge (complex, fragile)
Keep a goroutine open per session holding the SSE stream. When an A2A message arrives, inject it via sessions.events.send() and wait for the next agent.message event, then map that response back to the A2A caller.
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out with no clear error path.

Option B — Managed Agents as leaf-only workers (scope reduction)
Only use Managed Agents for workspaces that receive tasks (no outbound A2A). The platform queues work, opens a session, streams the result, and closes the session. No live bridge needed.
Risk: many real workspaces delegate to peers — leaf-only scope limits applicability to batch/one-shot agents.

Option C — Hybrid: MCP bridge
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as an MCP server; the agent's MCP tool calls translate back to A2A messages.
Risk: this inverts the call direction (agent calls platform instead of platform-to-agent) and breaks the current workspace-to-workspace trust model. Security review required before shipping.
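To make the Option A failure mode concrete, here is a small sketch of the bridge using a Python thread in place of a goroutine. `FakeSession`, `SessionBridge`, and `BridgeError` are hypothetical names, and the fake session only imitates an event stream; the real SDK calls are deliberately left out. The point of the sketch is that a dead reader or a missing reply becomes an explicit error rather than a silent hang.

```python
import queue
import threading


class BridgeError(Exception):
    """Raised so A2A callers get an explicit failure instead of a silent hang."""


class FakeSession:
    """Hypothetical stand-in: send() makes an 'agent.message' event appear."""

    def __init__(self):
        self.events = queue.Queue()

    def send(self, text):
        self.events.put({"type": "agent.message", "text": f"echo: {text}"})

    def close(self):
        self.events.put(None)  # sentinel: the stream has ended


class SessionBridge:
    def __init__(self, session):
        self._session = session
        self._replies = queue.Queue()
        self._lock = threading.Lock()  # one in-flight A2A request at a time
        self._alive = threading.Event()
        self._alive.set()
        self._reader = threading.Thread(target=self._drain, daemon=True)
        self._reader.start()

    def _drain(self):
        """Reader loop: forward agent messages, mark the bridge dead at end of stream."""
        while True:
            event = self._session.events.get()
            if event is None:
                self._alive.clear()
                return
            if event["type"] == "agent.message":
                self._replies.put(event["text"])

    def handle_a2a(self, text, timeout=1.0):
        """Inject one A2A message and wait for the next agent message."""
        if not self._alive.is_set():
            raise BridgeError("session stream is closed")
        with self._lock:
            self._session.send(text)
            try:
                return self._replies.get(timeout=timeout)
            except queue.Empty:
                raise BridgeError("no agent.message before timeout") from None
```

Serializing requests with a lock keeps reply correlation trivial, at the cost of throughput; a production bridge would need request IDs to allow concurrent callers.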

3. Cost model

Managed Agents sessions are charged on top of standard token pricing, so the platform incurs a compute cost of its own for every session. For comparison, the Docker path uses a customer-supplied model key with zero platform markup.

The cold-start latency (environment + session creation) measured in the demo adds overhead before the first token. For interactive canvas workflows where workspaces are expected to be long-lived ("always on"), this model is a poor fit. For batch workspaces that run occasionally, it may save infrastructure cost.
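One way to reason about the latency trade-off is to look at cold start as a share of each run's wall-clock time. The helper below is a hypothetical sketch, and the numbers in the usage note are illustrative inputs, not measurements from the demo.

```python
def cold_start_share(cold_start_s, work_s):
    """Fraction of a run's wall-clock time spent before the first token."""
    if cold_start_s < 0 or work_s <= 0:
        raise ValueError("cold_start_s must be >= 0 and work_s must be > 0")
    return cold_start_s / (cold_start_s + work_s)
```

With an assumed 5 s cold start, a 5 s interactive turn spends half its time waiting (`cold_start_share(5, 5)` is 0.5), while a 495 s batch job spends about 1% (`cold_start_share(5, 495)`). That asymmetry is why the overhead weighs on interactive use far more than on occasional batch runs.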

4. API gaps (as of 2026-04-17)

| Molecule requirement | Managed Agents support |
| --- | --- |
| Persistent HTTP endpoint for A2A | No (SSE only) |
| Heartbeat / liveness signal | Partial: session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | No (environment config offers only `networking`) |
| Custom Docker image | No (Anthropic-managed base image only) |
| `workspace_dir` bind-mount | No (files uploaded via `client.beta.files` API) |
| Bearer token auth per workspace | No (auth is the Anthropic API key, not a per-workspace token) |
| Plugin system (arbitrary pip installs) | No (built-in `agent_toolset_20260401` or custom tool callbacks) |
| Runtime detection (`config.yaml` introspection) | Not applicable (config lives in the agent object) |
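The heartbeat row is the one gap that a thin adapter could plausibly cover: the platform would observe session status and emit heartbeats on the session's behalf. The sketch below assumes this shape; `HeartbeatAdapter` and its parameters are hypothetical, `session.status_running` is an invented placeholder status, and `post_heartbeat` stands in for whatever call reaches the platform's heartbeat endpoint.

```python
import time

# "session.status_idle" is the status the demo waits on; the second name is a
# hypothetical placeholder for "still working".
LIVE_STATUSES = {"session.status_idle", "session.status_running"}


class HeartbeatAdapter:
    def __init__(self, post_heartbeat, stale_after_s=30.0, clock=time.monotonic):
        self._post = post_heartbeat          # callable(workspace_id)
        self._stale_after_s = stale_after_s
        self._clock = clock
        self._last_seen = {}                 # workspace_id -> last status time

    def on_event(self, workspace_id, event_type):
        """Translate a session status event into a platform heartbeat."""
        if event_type not in LIVE_STATUSES:
            return False
        self._last_seen[workspace_id] = self._clock()
        self._post(workspace_id)
        return True

    def stale_workspaces(self):
        """Workspaces whose session has been silent longer than the threshold."""
        now = self._clock()
        return sorted(
            ws for ws, seen in self._last_seen.items()
            if now - seen > self._stale_after_s
        )
```

The staleness check matters because the session never pushes proactively: silence on the stream is the only signal that it has gone away.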

Ship/No-Ship Recommendation

Decision: No-ship for the primary executor. Spike further as a batch worker.

Rationale:

- A2A proxy is the load-bearing constraint. Molecule's value proposition is multi-workspace orchestration. A workspace executor that can't be reached by other workspaces over A2A is not a Molecule workspace; it's a standalone call to the Anthropic API with extra steps.
- No persistent endpoint = no topology. The canvas shows workspaces as nodes that communicate. A Managed Agents session has no addressable URL; the canvas can't represent it as a live peer.
- Cold start is non-trivial. Preliminary measurements from the demo show environment + session creation adding visible latency before the first token. For the "always-on" UX the canvas targets, this is noticeable.
- Scope would be a dead end. Shipping Managed Agents as a leaf-only, no-A2A executor today means two provisioner paths diverge. The Managed Agents path can never grow to full parity without Anthropic exposing a persistent addressable URL. We'd be maintaining a permanently limited path.

What to do instead

- Phase H (planned): Consider Managed Agents as the execution target for scheduled tasks only (workspace_schedules cron rows). A cron fire could spin up a session, run the prompt, stream the result, and self-report via /activity. No live A2A needed. Effort: ~2 weeks.
- Watch the API. If Anthropic ships a stable URL per session (like a webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above) also becomes more viable once Molecule's MCP server is feature-complete.
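The scheduled-task flow in the Phase H item can be sketched as a single function. This is a sketch under assumptions, not a design: `run_scheduled_task`, the `schedule_row` keys, and `FakeSession` are hypothetical, `open_session` stands in for session creation, and `report_activity` stands in for the /activity self-report.

```python
class FakeSession:
    """Hypothetical stand-in for a one-shot session that streams text chunks."""

    def __init__(self):
        self.closed = False

    def run(self, prompt):
        yield f"result for: {prompt}"
        yield " (done)"

    def close(self):
        self.closed = True


def run_scheduled_task(schedule_row, open_session, report_activity):
    """One cron fire: open a session, run the prompt, report, always close."""
    workspace_id = schedule_row["workspace_id"]
    session = open_session(workspace_id)
    try:
        # Drain the stream fully before reporting, since there is no live peer.
        result = "".join(session.run(schedule_row["prompt"]))
        report_activity(workspace_id, {"status": "ok", "result": result})
        return result
    except Exception as exc:
        report_activity(workspace_id, {"status": "error", "error": str(exc)})
        raise
    finally:
        session.close()  # sessions are ephemeral; never leave one open
```

Because the session is opened and closed inside one call, nothing needs to address it from outside, which is why this use avoids the A2A routing problem entirely.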

Rough Effort Estimate (if we did ship)

| Component | Effort |
| --- | --- |
| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days |
| A2A bridge goroutine (Option A) | 5–8 days |
| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days |
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
| Tests, migration, docs | 3–4 days |
| Total | ~3 weeks |

Even at 3 weeks, the result is a permanently limited path with no native A2A endpoint and no resource controls. Not recommended.

Files

| File | Purpose |
| --- | --- |
| `demo.py` | Runnable spike script: auth, provision, session, two turns, timing |
| `README.md` | This assessment |
