spike(#745): evaluate Anthropic Managed Agents as third executor option

Adds `spike/issue-742-managed-agents-executor/` with:
- `demo.py`: standalone Python script that authenticates to the Managed Agents
  beta API, provisions an environment + agent, starts a session, runs two
  conversational turns (with cross-turn state recall verification), and prints
  cold-start and per-turn latency measurements.
- `README.md`: full integration assessment covering provisioner changes needed,
  A2A routing conflict (primary blocker — sessions have no addressable URL),
  cost model, API gaps table, and a no-ship recommendation with a 3-week effort
  estimate if we proceeded anyway.

Recommendation: no-ship for primary executor. Revisit as a batch/cron worker
in Phase H once Molecule's MCP server is feature-complete.

Closes #745. References #742.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
molecule-ai[bot] 2026-04-17 15:43:21 +00:00 committed by GitHub
parent 00ef832e33
commit 08f8be820a
2 changed files with 396 additions and 0 deletions

# Spike #745 — Anthropic Managed Agents as a Molecule Executor
**Parent issue:** #742 — "Third executor option: Anthropic Managed Agents"
**Spike issue:** #745
## What We Evaluated
Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create
persistent agent objects, spin up per-task sessions, and stream execution events
via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule
is: *can this replace (or complement) the self-hosted Docker workspace executor?*

---
## Demo
`demo.py` exercises the full lifecycle:
```
ANTHROPIC_API_KEY=sk-ant-... python demo.py
```
What it measures:

| Phase | What we time |
|---|---|
| `environment create` | Provisioning a cloud execution environment |
| `agent create` | Storing the agent config (model, system prompt, tools) |
| `cold start` | `sessions.create()` → session ready |
| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
| `turn 2 RTT` | Same, plus implicit state recall check |
State continuity is verified by injecting a unique token in turn 1 and
asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.

---
## Integration Assessment
### 1. Provisioner changes
Molecule's provisioner today calls `docker.NewClient()`, pulls an image,
creates a container with resource limits, and waits for `/registry/register`
from inside the container. A Managed Agents executor would replace that
entire path:
```
current: docker pull → container run → heartbeat register
proposed: agents.create() → sessions.create() → SSE stream
```
A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch
the provisioner. The workspace row would store `agent_id` (persistent) and
`session_id` (ephemeral per-run) instead of a Docker container ID.
**Migration effort:** medium.
A new `ManagedAgentProvisioner` can be added alongside the existing Docker
provisioner without touching the common path. The primary cost is the
integration layer described below.

---
### 2. A2A routing — the blocking architectural conflict
This is the hard blocker. Molecule's A2A proxy (`POST /workspaces/:id/a2a`)
resolves `ws.agent_url` and forwards an HTTP POST to the running container.
Every workspace has a persistent, addressable HTTP endpoint.
Managed Agents sessions communicate exclusively through the Anthropic SSE API —
there is no per-session URL that the platform can proxy to. The session is a
streaming consumer, not a server.
Bridging the gap requires one of:
**Option A — Long-poll bridge (complex, fragile)**
Keep a goroutine open per session holding the SSE stream. When an A2A message
arrives, inject it via `sessions.events.send()` and wait for the next
`agent.message` event, then map the response back to the A2A caller.
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out
with no clear error path.
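
A minimal Python sketch of this bridge shape, with `sessions.events.send()` and the SSE stream replaced by injectable stand-ins (there is no real per-session server to call):

```python
import queue
import threading


class SessionBridge:
    """Holds one session's event stream and turns A2A posts into turns."""

    def __init__(self, send_event, event_stream, timeout_s: float = 60.0):
        self._send = send_event   # stand-in for sessions.events.send()
        self._inbox = queue.Queue()
        self._timeout = timeout_s
        # One holder per session, mirroring the goroutine-per-session design.
        threading.Thread(target=self._drain, args=(event_stream,), daemon=True).start()

    def _drain(self, stream):
        # The fragile part: if this loop dies, the session becomes unreachable.
        for event in stream:
            self._inbox.put(event)

    def handle_a2a(self, message: str) -> str:
        self._send({"type": "user.message", "text": message})
        while True:
            try:
                event = self._inbox.get(timeout=self._timeout)
            except queue.Empty:
                raise TimeoutError("bridge stalled: no agent.message before timeout")
            if event["type"] == "agent.message":
                return event["text"]  # map the reply back to the A2A caller
```

Even in this toy form the failure mode is visible: liveness of the whole bridge hangs on one background drain loop per session.
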
**Option B — Managed Agents as leaf-only workers (scope reduction)**
Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A).
The platform queues work, opens a session, streams the result, and closes the
session. No live bridge needed.
Risk: many real workspaces delegate to peers — leaf-only scope limits
applicability to batch/one-shot agents.
**Option C — Hybrid: MCP bridge**
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as
an MCP server; the agent's MCP tool calls translate back to A2A messages.
Risk: this inverts the call direction (agent calls platform instead of
platform-to-agent) and breaks the current workspace-to-workspace trust model.
Security review required before shipping.
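
A hypothetical sketch of the handler, assuming a single `send_a2a` MCP tool; the tool name and the `post_json` helper are invented for illustration:

```python
def make_send_a2a_tool(post_json, platform_base_url: str):
    """Build the hypothetical `send_a2a` MCP tool handler."""

    def send_a2a(workspace_id: str, message: str) -> str:
        # Inverted call direction: the agent calls the platform, which
        # relays the message over the existing A2A proxy route.
        return post_json(
            f"{platform_base_url}/workspaces/{workspace_id}/a2a",
            {"message": message},
        )

    return send_a2a
```

Note that the tool handler runs with the platform's credentials, which is exactly why the trust-model review is needed.
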

---
### 3. Cost model
Managed Agents sessions are billed on top of standard token pricing, so the
platform absorbs the session compute charges on its own bill. The Docker path,
by comparison, runs on a customer-supplied model key with zero platform markup.
The cold-start latency (environment + session creation) measured in the demo
adds overhead before the first token. For interactive canvas workflows where
workspaces are expected to be long-lived ("always on"), this model is a poor
fit. For batch workspaces that run occasionally, it may save infrastructure
cost.
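
A toy comparison makes the shape of that tradeoff concrete. Every number below is an invented placeholder, not a measured or quoted price:

```python
# All constants are invented placeholders for illustration only.
DOCKER_HOURLY_USD = 0.05       # assumed cost of an always-on container
SESSION_SURCHARGE_USD = 0.02   # assumed per-session platform compute charge


def monthly_cost(runs_per_day: int) -> tuple[float, float]:
    """Return (always_on_docker, managed_sessions) monthly cost, tokens excluded."""
    docker = DOCKER_HOURLY_USD * 24 * 30
    managed = SESSION_SURCHARGE_USD * runs_per_day * 30
    return docker, managed
```

Under these made-up numbers, a workspace that fires a few times a day favors per-session billing, while an always-on canvas workspace favors the flat container cost; the crossover point depends entirely on the real prices.
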

---
### 4. API gaps (as of 2026-04-17)
| Molecule requirement | Managed Agents support |
|---|---|
| Persistent HTTP endpoint for A2A | **No** — SSE only |
| Heartbeat / liveness signal | **Partial** — session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | **No** — environment config offers only `networking` |
| Custom Docker image | **No** — Anthropic-managed base image only |
| `workspace_dir` bind-mount | **No** — files uploaded via `client.beta.files` API |
| Bearer token auth per workspace | **No** — auth is Anthropic API key, not per-workspace token |
| Plugin system (arbitrary pip installs) | **No** — built-in `agent_toolset_20260401` or custom tool callbacks |
| Runtime detection (`config.yaml` introspection) | **Not applicable** — config lives in agent object |
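
The heartbeat row above implies a platform-side polling adapter. A rough sketch, with `get_status` and `post_heartbeat` as stand-ins for a session-status poll and the `/registry/heartbeat` call:

```python
import time

ALIVE = {"idle", "running"}  # assumed session states that count as healthy


def heartbeat_loop(get_status, post_heartbeat, interval_s=15.0, max_ticks=None):
    """Poll session status and forward it as platform heartbeats.

    `max_ticks` exists only so the sketch can be exercised without looping forever.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        status = get_status()  # e.g. a session-status poll
        post_heartbeat(alive=status in ALIVE, status=status)
        if status == "terminated":
            break  # stop heartbeating a dead session
        ticks += 1
        time.sleep(interval_s)
```

This is the "Partial" in the table made explicit: the platform must generate the liveness signal itself, because the session will never push one.
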
---
## Ship/No-Ship Recommendation
### Decision: **No-ship for the primary executor. Spike further as a batch worker.**
**Rationale:**
1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition
is multi-workspace orchestration. A workspace executor that can't be reached
by other workspaces over A2A is not a Molecule workspace — it's a standalone
call to the Anthropic API with extra steps.
2. **No persistent endpoint = no topology.** The canvas shows workspaces as
nodes that communicate. A Managed Agents session has no addressable URL; the
canvas can't represent it as a live peer.
3. **Cold start is non-trivial.** Preliminary measurements from the demo show
environment + session creation adding visible latency before the first token.
For the "always-on" UX the canvas targets, this is noticeable.
4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only,
no-A2A executor today means two provisioner paths diverge. The Managed Agents
path can never grow to full parity without Anthropic exposing a persistent
addressable URL. We'd be maintaining a permanently limited path.
### What to do instead
- **Phase H (planned):** Consider Managed Agents as the execution target for
*scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could
spin up a session, run the prompt, stream the result, and self-report via
`/activity`. No live A2A needed. Effort: ~2 weeks.
- **Watch the API.** If Anthropic ships a stable URL per session (like a
webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above)
also becomes more viable once Molecule's MCP server is feature-complete.
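
The Phase H batch shape is small enough to sketch end to end. All four callables below are placeholders for the `demo.py` SDK calls and the platform's `/activity` endpoint:

```python
def run_scheduled_task(agent_id, prompt, create_session, run_turn,
                       close_session, report_activity):
    """One cron fire: open a session, run one turn, self-report, close."""
    session_id = create_session(agent_id)
    try:
        reply, rtt_s = run_turn(session_id, prompt)  # like collect_turn in demo.py
        report_activity({"session_id": session_id, "reply": reply, "rtt_s": rtt_s})
        return reply
    finally:
        close_session(session_id)  # sessions are ephemeral per fire
```

No live A2A bridge appears anywhere in this path, which is what makes the batch scope viable where the primary-executor scope is not.
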
---
## Rough Effort Estimate (if we did ship)
| Component | Effort |
|---|---|
| `ManagedAgentProvisioner` (create/start/stop session) | 3-5 days |
| A2A bridge goroutine (Option A) | 5-8 days |
| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2-3 days |
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
| Tests, migration, docs | 3-4 days |
| **Total** | **~3 weeks** |
Even at 3 weeks, the result is a permanently limited path with no A2A and no
resource controls. Not recommended.

---
## Files
| File | Purpose |
|---|---|
| `demo.py` | Runnable spike script — auth, provision, session, two turns, timing |
| `README.md` | This assessment |

#!/usr/bin/env python3
"""
Spike #745 — Anthropic Managed Agents as a Molecule workspace executor.
This script validates the managed-agents-2026-04-01 beta API against the
criteria in issue #742:
- Authentication & agent provisioning
- Session start (cold-start latency)
- Round-trip prompt/response (per-turn latency)
- State persistence across turns (session continuity)
- Clean shutdown
Usage:
    ANTHROPIC_API_KEY=sk-ant-... python demo.py

Optional env vars:
    MA_SKIP_CLEANUP=1   keep the agent/session alive after the run
    MA_VERBOSE=1        print every SSE event type (not just agent messages)
"""
import os
import sys
import time
import json
try:
    import anthropic
except ImportError:
    sys.exit("anthropic SDK not installed — run: pip install anthropic")
# ── helpers ──────────────────────────────────────────────────────────────────
VERBOSE = os.getenv("MA_VERBOSE") == "1"
SKIP_CLEANUP = os.getenv("MA_SKIP_CLEANUP") == "1"
def ts() -> float:
    return time.monotonic()

def elapsed(start: float) -> float:
    return round(time.monotonic() - start, 3)
def collect_turn(client: anthropic.Anthropic, session_id: str, message: str) -> tuple[str, float]:
    """
    Stream-first turn: open the SSE stream, send the user message inside the
    context manager, then drain events until session.status_idle or
    session.status_terminated.

    Returns (agent_reply_text, round_trip_seconds).
    Raises RuntimeError if the session terminates unexpectedly mid-turn.
    """
    reply_parts: list[str] = []
    turn_start = ts()
    with client.beta.sessions.stream(session_id=session_id) as stream:
        # Send inside the stream so we never miss early events
        client.beta.sessions.events.send(
            session_id=session_id,
            events=[
                {
                    "type": "user.message",
                    "content": [{"type": "text", "text": message}],
                }
            ],
        )
        for event in stream:
            if VERBOSE:
                print(f"  [evt] {event.type}", flush=True)
            if event.type == "agent.message":
                for block in event.content:
                    if block.type == "text":
                        reply_parts.append(block.text)
            elif event.type == "session.status_idle":
                break  # normal turn completion
            elif event.type == "session.status_terminated":
                # session ended — surface whatever text arrived
                if reply_parts:
                    break
                raise RuntimeError("Session terminated unexpectedly during turn")
    return "".join(reply_parts), elapsed(turn_start)
# ── main ─────────────────────────────────────────────────────────────────────
def main() -> None:
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        sys.exit("ANTHROPIC_API_KEY not set")
    client = anthropic.Anthropic(api_key=api_key)

    # ── 1. Create environment ─────────────────────────────────────────────
    print("=== Managed Agents Spike #745 ===\n")
    print("Step 1: Creating cloud environment…")
    t0 = ts()
    environment = client.beta.environments.create(
        name="molecule-spike-742",
        config={
            "type": "cloud",
            "networking": {"type": "unrestricted"},
        },
    )
    env_time = elapsed(t0)
    print(f"  environment_id : {environment.id}")
    print(f"  env create time: {env_time}s\n")

    # ── 2. Create agent ───────────────────────────────────────────────────
    print("Step 2: Creating agent…")
    t0 = ts()
    agent = client.beta.agents.create(
        name="molecule-spike-agent",
        model="claude-opus-4-7",
        system=(
            "You are a stateful test agent for the Molecule AI spike. "
            "When asked to remember something, confirm you will. "
            "On subsequent turns, recall it accurately."
        ),
        tools=[
            {"type": "agent_toolset_20260401", "default_config": {"enabled": True}}
        ],
    )
    agent_time = elapsed(t0)
    print(f"  agent_id : {agent.id}")
    print(f"  version : {agent.version}")
    print(f"  agent create time: {agent_time}s\n")

    # ── 3. Create session (cold start) ────────────────────────────────────
    print("Step 3: Creating session (cold start)…")
    cold_start = ts()
    session = client.beta.sessions.create(
        agent={"type": "agent", "id": agent.id, "version": agent.version},
        environment_id=environment.id,
        title="molecule-spike-742-session",
    )
    cold_time = elapsed(cold_start)
    print(f"  session_id : {session.id}")
    print(f"  status     : {session.status}")
    print(f"  cold-start : {cold_time}s\n")

    # ── 4. Turn 1 — establish a fact the agent should remember ────────────
    turn1_prompt = (
        "Please remember this token for the rest of our conversation: "
        "MOLECULE_SPIKE_7a3f. "
        "What is today's task? Reply in one sentence."
    )
    print(f"Turn 1 prompt:\n  {turn1_prompt!r}\n")
    turn1_reply, turn1_time = collect_turn(client, session.id, turn1_prompt)
    print(f"Turn 1 reply ({turn1_time}s):\n  {turn1_reply!r}\n")

    # ── 5. Turn 2 — verify state persistence ──────────────────────────────
    turn2_prompt = "What was the token I asked you to remember?"
    print(f"Turn 2 prompt:\n  {turn2_prompt!r}\n")
    turn2_reply, turn2_time = collect_turn(client, session.id, turn2_prompt)
    print(f"Turn 2 reply ({turn2_time}s):\n  {turn2_reply!r}\n")

    # ── 6. State continuity check ─────────────────────────────────────────
    token_recalled = "MOLECULE_SPIKE_7a3f" in turn2_reply
    print("=== Results ===")
    print(f"  environment create : {env_time}s")
    print(f"  agent create : {agent_time}s")
    print(f"  cold-start (session create → ready) : {cold_time}s")
    print(f"  turn 1 round-trip : {turn1_time}s")
    print(f"  turn 2 round-trip : {turn2_time}s")
    print(f"  state continuity : {'PASS — token recalled' if token_recalled else 'FAIL — token not found in turn 2'}")

    # Emit JSON summary for easy parsing in CI / PR bots
    summary = {
        "environment_id": environment.id,
        "agent_id": agent.id,
        "session_id": session.id,
        "timings": {
            "environment_create_s": env_time,
            "agent_create_s": agent_time,
            "cold_start_s": cold_time,
            "turn1_rtt_s": turn1_time,
            "turn2_rtt_s": turn2_time,
        },
        "state_continuity_pass": token_recalled,
    }
    print("\nJSON summary:")
    print(json.dumps(summary, indent=2))

    # ── 7. Cleanup ────────────────────────────────────────────────────────
    if not SKIP_CLEANUP:
        print("\nCleaning up…")
        try:
            client.beta.sessions.delete(session_id=session.id)
            print(f"  session {session.id} deleted")
        except Exception as exc:
            print(f"  session delete warning: {exc}")
        # Agents are persistent/shared — don't delete unless explicitly asked.
        # Set MA_SKIP_CLEANUP=1 and clean up manually with:
        #   client.beta.agents.delete(agent.id)
        print(f"  agent {agent.id} kept (persistent object; delete manually if needed)")
    else:
        print("\nMA_SKIP_CLEANUP=1 — session and agent left alive.")
        print(f"  Session: {session.id}")
        print(f"  Agent:   {agent.id}")

    sys.exit(0 if token_recalled else 1)
if __name__ == "__main__":
    main()