forked from molecule-ai/molecule-core
186 lines
7.3 KiB
Markdown
186 lines
7.3 KiB
Markdown
# Spike #745 — Anthropic Managed Agents as a Molecule Executor
|
||
|
||
**Parent issue:** #742 — "Third executor option: Anthropic Managed Agents"
|
||
**Spike issue:** #745
|
||
|
||
## What We Evaluated
|
||
|
||
Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create
|
||
persistent agent objects, spin up per-task sessions, and stream execution events
|
||
via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule
|
||
is: *can this replace (or complement) the self-hosted Docker workspace executor?*
|
||
|
||
---
|
||
|
||
## Demo
|
||
|
||
`demo.py` exercises the full lifecycle:
|
||
|
||
```
|
||
ANTHROPIC_API_KEY=sk-ant-... python demo.py
|
||
```
|
||
|
||
What it measures:
|
||
|
||
| Phase | What we time |
|
||
|---|---|
|
||
| `environment create` | Provisioning a cloud execution environment |
|
||
| `agent create` | Storing the agent config (model, system prompt, tools) |
|
||
| `cold start` | `sessions.create()` → session ready |
|
||
| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
|
||
| `turn 2 RTT` | Same, plus implicit state recall check |
|
||
|
||
State continuity is verified by injecting a unique token in turn 1 and
|
||
asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
|
||
|
||
---
|
||
|
||
## Integration Assessment
|
||
|
||
### 1. Provisioner changes
|
||
|
||
Molecule's provisioner today calls `docker.NewClient()`, pulls an image,
|
||
creates a container with resource limits, and waits for `/registry/register`
|
||
from inside the container. A Managed Agents executor would replace that
|
||
entire path:
|
||
|
||
```
|
||
current: docker pull → container run → heartbeat register
|
||
proposed: agents.create() → sessions.create() → SSE stream
|
||
```
|
||
|
||
A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch
|
||
the provisioner. The workspace row would store `agent_id` (persistent) and
|
||
`session_id` (ephemeral per-run) instead of a Docker container ID.
|
||
|
||
**Migration effort:** medium.
|
||
A new `ManagedAgentProvisioner` can be added alongside the existing Docker
|
||
provisioner without touching the common path. The primary cost is the
|
||
integration layer described below.
|
||
|
||
---
|
||
|
||
### 2. A2A routing — the blocking architectural conflict
|
||
|
||
This is the hard blocker. Molecule's A2A proxy (`POST /workspaces/:id/a2a`)
|
||
resolves `ws.agent_url` and forwards an HTTP POST to the running container.
|
||
Every workspace has a persistent, addressable HTTP endpoint.
|
||
|
||
Managed Agents sessions communicate exclusively through the Anthropic SSE API —
|
||
there is no per-session URL that the platform can proxy to. The session is a
|
||
streaming consumer, not a server.
|
||
|
||
Bridging the gap requires one of:
|
||
|
||
**Option A — Long-poll bridge (complex, fragile)**
|
||
Keep a goroutine open per session holding the SSE stream. When an A2A message
|
||
arrives, inject it via `sessions.events.send()` and wait for the next
|
||
`agent.message` event. Map response back to A2A caller.
|
||
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out
|
||
with no clear error path.
|
||
|
||
**Option B — Managed Agents as leaf-only workers (scope reduction)**
|
||
Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A).
|
||
The platform queues work, opens a session, streams the result, and closes the
|
||
session. No live bridge needed.
|
||
Risk: many real workspaces delegate to peers — leaf-only scope limits
|
||
applicability to batch/one-shot agents.
|
||
|
||
**Option C — Hybrid: MCP bridge**
|
||
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as
|
||
an MCP server; the agent's MCP tool calls translate back to A2A messages.
|
||
Risk: this inverts the call direction (agent calls platform instead of
|
||
platform-to-agent) and breaks the current workspace-to-workspace trust model.
|
||
Security review required before shipping.
|
||
|
||
---
|
||
|
||
### 3. Cost model
|
||
|
||
Managed Agents sessions are charged on top of standard token pricing — the
|
||
platform receives its own compute costs. For comparison, the Docker path uses
|
||
a customer-supplied model key with zero platform markup.
|
||
|
||
The cold-start latency (environment + session creation) measured in the demo
|
||
adds overhead before the first token. For interactive canvas workflows where
|
||
workspaces are expected to be long-lived ("always on"), this model is a poor
|
||
fit. For batch workspaces that run occasionally, it may save infrastructure
|
||
cost.
|
||
|
||
---
|
||
|
||
### 4. API gaps (as of 2026-04-17)
|
||
|
||
| Molecule requirement | Managed Agents support |
|
||
|---|---|
|
||
| Persistent HTTP endpoint for A2A | **No** — SSE only |
|
||
| Heartbeat / liveness signal | **Partial** — session status via poll or SSE, but no proactive push to the platform |
|
||
| Resource limits (memory, CPU) | **No** — environment config offers only `networking` |
|
||
| Custom Docker image | **No** — Anthropic-managed base image only |
|
||
| `workspace_dir` bind-mount | **No** — files uploaded via `client.beta.files` API |
|
||
| Bearer token auth per workspace | **No** — auth is Anthropic API key, not per-workspace token |
|
||
| Plugin system (arbitrary pip installs) | **No** — built-in `agent_toolset_20260401` or custom tool callbacks |
|
||
| Runtime detection (`config.yaml` introspection) | **Not applicable** — config lives in agent object |
|
||
|
||
---
|
||
|
||
## Ship/No-Ship Recommendation
|
||
|
||
### Decision: **No-ship for the primary executor. Spike further as a batch worker.**
|
||
|
||
**Rationale:**
|
||
|
||
1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition
|
||
is multi-workspace orchestration. A workspace executor that can't be reached
|
||
by other workspaces over A2A is not a Molecule workspace — it's a standalone
|
||
call to the Anthropic API with extra steps.
|
||
|
||
2. **No persistent endpoint = no topology.** The canvas shows workspaces as
|
||
nodes that communicate. A Managed Agents session has no addressable URL; the
|
||
canvas can't represent it as a live peer.
|
||
|
||
3. **Cold start is non-trivial.** Preliminary measurements from the demo show
|
||
environment + session creation adding visible latency before the first token.
|
||
For the "always-on" UX the canvas targets, this is noticeable.
|
||
|
||
4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only,
|
||
no-A2A executor today means two provisioner paths diverge. The Managed Agents
|
||
path can never grow to full parity without Anthropic exposing a persistent
|
||
addressable URL. We'd be maintaining a permanently limited path.
|
||
|
||
### What to do instead
|
||
|
||
- **Phase H (planned):** Consider Managed Agents as the execution target for
|
||
*scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could
|
||
spin up a session, run the prompt, stream the result, and self-report via
|
||
`/activity`. No live A2A needed. Effort: ~2 weeks.
|
||
|
||
- **Watch the API.** If Anthropic ships a stable URL per session (like a
|
||
webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above)
|
||
also becomes more viable once Molecule's MCP server is feature-complete.
|
||
|
||
---
|
||
|
||
## Rough Effort Estimate (if we did ship)
|
||
|
||
| Component | Effort |
|
||
|---|---|
|
||
| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days |
|
||
| A2A bridge goroutine (Option A) | 5–8 days |
|
||
| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days |
|
||
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
|
||
| Tests, migration, docs | 3–4 days |
|
||
| **Total** | **~3 weeks** |
|
||
|
||
Even at 3 weeks, the result is a permanently limited path with no A2A and no
|
||
resource controls. Not recommended.
|
||
|
||
---
|
||
|
||
## Files
|
||
|
||
| File | Purpose |
|
||
|---|---|
|
||
| `demo.py` | Runnable spike script — auth, provision, session, two turns, timing |
|
||
| `README.md` | This assessment |
|