forked from molecule-ai/molecule-core
spike(#745): evaluate Anthropic Managed Agents as third executor option
Adds `spike/issue-742-managed-agents-executor/` with: - `demo.py`: standalone Python script that authenticates to the Managed Agents beta API, provisions an environment + agent, starts a session, runs two conversational turns (with cross-turn state recall verification), and prints cold-start and per-turn latency measurements. - `README.md`: full integration assessment covering provisioner changes needed, A2A routing conflict (primary blocker — sessions have no addressable URL), cost model, API gaps table, and a no-ship recommendation with a 3-week effort estimate if we proceeded anyway. Recommendation: no-ship for primary executor. Revisit as a batch/cron worker in Phase H once Molecule's MCP server is feature-complete. Closes #745. References #742. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
f3f5ce32fe
commit
4b7c49b91c
185
spike/issue-742-managed-agents-executor/README.md
Normal file
185
spike/issue-742-managed-agents-executor/README.md
Normal file
@ -0,0 +1,185 @@
|
||||
# Spike #745 — Anthropic Managed Agents as a Molecule Executor
|
||||
|
||||
**Parent issue:** #742 — "Third executor option: Anthropic Managed Agents"
|
||||
**Spike issue:** #745
|
||||
|
||||
## What We Evaluated
|
||||
|
||||
Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create
|
||||
persistent agent objects, spin up per-task sessions, and stream execution events
|
||||
via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule
|
||||
is: *can this replace (or complement) the self-hosted Docker workspace executor?*
|
||||
|
||||
---
|
||||
|
||||
## Demo
|
||||
|
||||
`demo.py` exercises the full lifecycle:
|
||||
|
||||
```
|
||||
ANTHROPIC_API_KEY=sk-ant-... python demo.py
|
||||
```
|
||||
|
||||
What it measures:
|
||||
|
||||
| Phase | What we time |
|
||||
|---|---|
|
||||
| `environment create` | Provisioning a cloud execution environment |
|
||||
| `agent create` | Storing the agent config (model, system prompt, tools) |
|
||||
| `cold start` | `sessions.create()` → session ready |
|
||||
| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
|
||||
| `turn 2 RTT` | Same, plus implicit state recall check |
|
||||
|
||||
State continuity is verified by injecting a unique token in turn 1 and
|
||||
asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
|
||||
|
||||
---
|
||||
|
||||
## Integration Assessment
|
||||
|
||||
### 1. Provisioner changes
|
||||
|
||||
Molecule's provisioner today calls `docker.NewClient()`, pulls an image,
|
||||
creates a container with resource limits, and waits for `/registry/register`
|
||||
from inside the container. A Managed Agents executor would replace that
|
||||
entire path:
|
||||
|
||||
```
|
||||
current: docker pull → container run → heartbeat register
|
||||
proposed: agents.create() → sessions.create() → SSE stream
|
||||
```
|
||||
|
||||
A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch
|
||||
the provisioner. The workspace row would store `agent_id` (persistent) and
|
||||
`session_id` (ephemeral per-run) instead of a Docker container ID.
|
||||
|
||||
**Migration effort:** medium.
|
||||
A new `ManagedAgentProvisioner` can be added alongside the existing Docker
|
||||
provisioner without touching the common path. The primary cost is the
|
||||
integration layer described below.
|
||||
|
||||
---
|
||||
|
||||
### 2. A2A routing — the blocking architectural conflict
|
||||
|
||||
This is the hard blocker. Molecule's A2A proxy (`POST /workspaces/:id/a2a`)
|
||||
resolves `ws.agent_url` and forwards an HTTP POST to the running container.
|
||||
Every workspace has a persistent, addressable HTTP endpoint.
|
||||
|
||||
Managed Agents sessions communicate exclusively through the Anthropic SSE API —
|
||||
there is no per-session URL that the platform can proxy to. The session is a
|
||||
streaming consumer, not a server.
|
||||
|
||||
Bridging the gap requires one of:
|
||||
|
||||
**Option A — Long-poll bridge (complex, fragile)**
|
||||
Keep a goroutine open per session holding the SSE stream. When an A2A message
|
||||
arrives, inject it via `sessions.events.send()` and wait for the next
|
||||
`agent.message` event. Map response back to A2A caller.
|
||||
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out
|
||||
with no clear error path.
|
||||
|
||||
**Option B — Managed Agents as leaf-only workers (scope reduction)**
|
||||
Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A).
|
||||
The platform queues work, opens a session, streams the result, and closes the
|
||||
session. No live bridge needed.
|
||||
Risk: many real workspaces delegate to peers — leaf-only scope limits
|
||||
applicability to batch/one-shot agents.
|
||||
|
||||
**Option C — Hybrid: MCP bridge**
|
||||
Anthropic agents can call MCP servers. The platform exposes its A2A proxy as
|
||||
an MCP server; the agent's MCP tool calls translate back to A2A messages.
|
||||
Risk: this inverts the call direction (agent calls platform instead of
|
||||
platform-to-agent) and breaks the current workspace-to-workspace trust model.
|
||||
Security review required before shipping.
|
||||
|
||||
---
|
||||
|
||||
### 3. Cost model
|
||||
|
||||
Managed Agents sessions are charged on top of standard token pricing — the
|
||||
platform receives its own compute costs. For comparison, the Docker path uses
|
||||
a customer-supplied model key with zero platform markup.
|
||||
|
||||
The cold-start latency (environment + session creation) measured in the demo
|
||||
adds overhead before the first token. For interactive canvas workflows where
|
||||
workspaces are expected to be long-lived ("always on"), this model is a poor
|
||||
fit. For batch workspaces that run occasionally, it may save infrastructure
|
||||
cost.
|
||||
|
||||
---
|
||||
|
||||
### 4. API gaps (as of 2026-04-17)
|
||||
|
||||
| Molecule requirement | Managed Agents support |
|
||||
|---|---|
|
||||
| Persistent HTTP endpoint for A2A | **No** — SSE only |
|
||||
| Heartbeat / liveness signal | **Partial** — session status via poll or SSE, but no proactive push to the platform |
|
||||
| Resource limits (memory, CPU) | **No** — environment config offers only `networking` |
|
||||
| Custom Docker image | **No** — Anthropic-managed base image only |
|
||||
| `workspace_dir` bind-mount | **No** — files uploaded via `client.beta.files` API |
|
||||
| Bearer token auth per workspace | **No** — auth is Anthropic API key, not per-workspace token |
|
||||
| Plugin system (arbitrary pip installs) | **No** — built-in `agent_toolset_20260401` or custom tool callbacks |
|
||||
| Runtime detection (`config.yaml` introspection) | **Not applicable** — config lives in agent object |
|
||||
|
||||
---
|
||||
|
||||
## Ship/No-Ship Recommendation
|
||||
|
||||
### Decision: **No-ship for the primary executor. Spike further as a batch worker.**
|
||||
|
||||
**Rationale:**
|
||||
|
||||
1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition
|
||||
is multi-workspace orchestration. A workspace executor that can't be reached
|
||||
by other workspaces over A2A is not a Molecule workspace — it's a standalone
|
||||
call to the Anthropic API with extra steps.
|
||||
|
||||
2. **No persistent endpoint = no topology.** The canvas shows workspaces as
|
||||
nodes that communicate. A Managed Agents session has no addressable URL; the
|
||||
canvas can't represent it as a live peer.
|
||||
|
||||
3. **Cold start is non-trivial.** Preliminary measurements from the demo show
|
||||
environment + session creation adding visible latency before the first token.
|
||||
For the "always-on" UX the canvas targets, this is noticeable.
|
||||
|
||||
4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only,
|
||||
no-A2A executor today means two provisioner paths diverge. The Managed Agents
|
||||
path can never grow to full parity without Anthropic exposing a persistent
|
||||
addressable URL. We'd be maintaining a permanently limited path.
|
||||
|
||||
### What to do instead
|
||||
|
||||
- **Phase H (planned):** Consider Managed Agents as the execution target for
|
||||
*scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could
|
||||
spin up a session, run the prompt, stream the result, and self-report via
|
||||
`/activity`. No live A2A needed. Effort: ~2 weeks.
|
||||
|
||||
- **Watch the API.** If Anthropic ships a stable URL per session (like a
|
||||
webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above)
|
||||
also becomes more viable once Molecule's MCP server is feature-complete.
|
||||
|
||||
---
|
||||
|
||||
## Rough Effort Estimate (if we did ship)
|
||||
|
||||
| Component | Effort |
|
||||
|---|---|
|
||||
| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days |
|
||||
| A2A bridge goroutine (Option A) | 5–8 days |
|
||||
| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days |
|
||||
| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
|
||||
| Tests, migration, docs | 3–4 days |
|
||||
| **Total** | **~3 weeks** |
|
||||
|
||||
Even at 3 weeks, the result is a permanently limited path with no A2A and no
|
||||
resource controls. Not recommended.
|
||||
|
||||
---
|
||||
|
||||
## Files
|
||||
|
||||
| File | Purpose |
|
||||
|---|---|
|
||||
| `demo.py` | Runnable spike script — auth, provision, session, two turns, timing |
|
||||
| `README.md` | This assessment |
|
||||
211
spike/issue-742-managed-agents-executor/demo.py
Normal file
211
spike/issue-742-managed-agents-executor/demo.py
Normal file
@ -0,0 +1,211 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Spike #745 — Anthropic Managed Agents as a Molecule workspace executor.
|
||||
|
||||
This script validates the managed-agents-2026-04-01 beta API against the
|
||||
criteria in issue #742:
|
||||
- Authentication & agent provisioning
|
||||
- Session start (cold-start latency)
|
||||
- Round-trip prompt/response (per-turn latency)
|
||||
- State persistence across turns (session continuity)
|
||||
- Clean shutdown
|
||||
|
||||
Usage:
|
||||
ANTHROPIC_API_KEY=sk-ant-... python demo.py
|
||||
|
||||
Optional env vars:
|
||||
MA_SKIP_CLEANUP=1 keep the agent/session alive after the run
|
||||
MA_VERBOSE=1 print every SSE event type (not just agent messages)
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import json
|
||||
|
||||
try:
|
||||
import anthropic
|
||||
except ImportError:
|
||||
sys.exit("anthropic SDK not installed — run: pip install anthropic")
|
||||
|
||||
# ── helpers ──────────────────────────────────────────────────────────────────
|
||||
|
||||
VERBOSE = os.getenv("MA_VERBOSE") == "1"
|
||||
SKIP_CLEANUP = os.getenv("MA_SKIP_CLEANUP") == "1"
|
||||
|
||||
|
||||
def ts() -> float:
|
||||
return time.monotonic()
|
||||
|
||||
|
||||
def elapsed(start: float) -> float:
|
||||
return round(time.monotonic() - start, 3)
|
||||
|
||||
|
||||
def collect_turn(client: anthropic.Anthropic, session_id: str, message: str) -> tuple[str, float]:
|
||||
"""
|
||||
Stream-first turn: open the SSE stream, send the user message inside the
|
||||
context manager, then drain events until session.status_idle or
|
||||
session.status_terminated.
|
||||
|
||||
Returns (agent_reply_text, round_trip_seconds).
|
||||
Raises RuntimeError if the session terminates unexpectedly mid-turn.
|
||||
"""
|
||||
reply_parts: list[str] = []
|
||||
turn_start = ts()
|
||||
|
||||
with client.beta.sessions.stream(session_id=session_id) as stream:
|
||||
# Send inside the stream so we never miss early events
|
||||
client.beta.sessions.events.send(
|
||||
session_id=session_id,
|
||||
events=[
|
||||
{
|
||||
"type": "user.message",
|
||||
"content": [{"type": "text", "text": message}],
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
for event in stream:
|
||||
if VERBOSE:
|
||||
print(f" [evt] {event.type}", flush=True)
|
||||
|
||||
if event.type == "agent.message":
|
||||
for block in event.content:
|
||||
if block.type == "text":
|
||||
reply_parts.append(block.text)
|
||||
|
||||
elif event.type == "session.status_idle":
|
||||
break # normal turn completion
|
||||
|
||||
elif event.type == "session.status_terminated":
|
||||
# session ended — surface whatever text arrived
|
||||
if reply_parts:
|
||||
break
|
||||
raise RuntimeError("Session terminated unexpectedly during turn")
|
||||
|
||||
return "".join(reply_parts), elapsed(turn_start)
|
||||
|
||||
|
||||
# ── main ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def main() -> None:
|
||||
api_key = os.environ.get("ANTHROPIC_API_KEY")
|
||||
if not api_key:
|
||||
sys.exit("ANTHROPIC_API_KEY not set")
|
||||
|
||||
client = anthropic.Anthropic(api_key=api_key)
|
||||
|
||||
# ── 1. Create environment ─────────────────────────────────────────────────
|
||||
print("=== Managed Agents Spike #745 ===\n")
|
||||
print("Step 1: Creating cloud environment…")
|
||||
t0 = ts()
|
||||
environment = client.beta.environments.create(
|
||||
name="molecule-spike-742",
|
||||
config={
|
||||
"type": "cloud",
|
||||
"networking": {"type": "unrestricted"},
|
||||
},
|
||||
)
|
||||
env_time = elapsed(t0)
|
||||
print(f" environment_id : {environment.id}")
|
||||
print(f" env create time: {env_time}s\n")
|
||||
|
||||
# ── 2. Create agent ───────────────────────────────────────────────────────
|
||||
print("Step 2: Creating agent…")
|
||||
t0 = ts()
|
||||
agent = client.beta.agents.create(
|
||||
name="molecule-spike-agent",
|
||||
model="claude-opus-4-7",
|
||||
system=(
|
||||
"You are a stateful test agent for the Molecule AI spike. "
|
||||
"When asked to remember something, confirm you will. "
|
||||
"On subsequent turns, recall it accurately."
|
||||
),
|
||||
tools=[
|
||||
{"type": "agent_toolset_20260401", "default_config": {"enabled": True}}
|
||||
],
|
||||
)
|
||||
agent_time = elapsed(t0)
|
||||
print(f" agent_id : {agent.id}")
|
||||
print(f" version : {agent.version}")
|
||||
print(f" agent create time: {agent_time}s\n")
|
||||
|
||||
# ── 3. Create session (cold start) ────────────────────────────────────────
|
||||
print("Step 3: Creating session (cold start)…")
|
||||
cold_start = ts()
|
||||
session = client.beta.sessions.create(
|
||||
agent={"type": "agent", "id": agent.id, "version": agent.version},
|
||||
environment_id=environment.id,
|
||||
title="molecule-spike-742-session",
|
||||
)
|
||||
cold_time = elapsed(cold_start)
|
||||
print(f" session_id : {session.id}")
|
||||
print(f" status : {session.status}")
|
||||
print(f" cold-start : {cold_time}s\n")
|
||||
|
||||
# ── 4. Turn 1 — establish a fact the agent should remember ────────────────
|
||||
turn1_prompt = (
|
||||
"Please remember this token for the rest of our conversation: "
|
||||
"MOLECULE_SPIKE_7a3f. "
|
||||
"What is today's task? Reply in one sentence."
|
||||
)
|
||||
print(f"Turn 1 prompt:\n {turn1_prompt!r}\n")
|
||||
turn1_reply, turn1_time = collect_turn(client, session.id, turn1_prompt)
|
||||
print(f"Turn 1 reply ({turn1_time}s):\n {turn1_reply!r}\n")
|
||||
|
||||
# ── 5. Turn 2 — verify state persistence ─────────────────────────────────
|
||||
turn2_prompt = "What was the token I asked you to remember?"
|
||||
print(f"Turn 2 prompt:\n {turn2_prompt!r}\n")
|
||||
turn2_reply, turn2_time = collect_turn(client, session.id, turn2_prompt)
|
||||
print(f"Turn 2 reply ({turn2_time}s):\n {turn2_reply!r}\n")
|
||||
|
||||
# ── 6. State continuity check ─────────────────────────────────────────────
|
||||
token_recalled = "MOLECULE_SPIKE_7a3f" in turn2_reply
|
||||
print("=== Results ===")
|
||||
print(f" environment create : {env_time}s")
|
||||
print(f" agent create : {agent_time}s")
|
||||
print(f" cold-start (session create → ready) : {cold_time}s")
|
||||
print(f" turn 1 round-trip : {turn1_time}s")
|
||||
print(f" turn 2 round-trip : {turn2_time}s")
|
||||
print(f" state continuity : {'PASS — token recalled' if token_recalled else 'FAIL — token not found in turn 2'}")
|
||||
|
||||
# Emit JSON summary for easy parsing in CI / PR bots
|
||||
summary = {
|
||||
"environment_id": environment.id,
|
||||
"agent_id": agent.id,
|
||||
"session_id": session.id,
|
||||
"timings": {
|
||||
"environment_create_s": env_time,
|
||||
"agent_create_s": agent_time,
|
||||
"cold_start_s": cold_time,
|
||||
"turn1_rtt_s": turn1_time,
|
||||
"turn2_rtt_s": turn2_time,
|
||||
},
|
||||
"state_continuity_pass": token_recalled,
|
||||
}
|
||||
print("\nJSON summary:")
|
||||
print(json.dumps(summary, indent=2))
|
||||
|
||||
# ── 7. Cleanup ────────────────────────────────────────────────────────────
|
||||
if not SKIP_CLEANUP:
|
||||
print("\nCleaning up…")
|
||||
try:
|
||||
client.beta.sessions.delete(session_id=session.id)
|
||||
print(f" session {session.id} deleted")
|
||||
except Exception as exc:
|
||||
print(f" session delete warning: {exc}")
|
||||
# Agents are persistent/shared — don't delete unless explicitly asked.
|
||||
# Set MA_SKIP_CLEANUP=1 and clean up manually with:
|
||||
# client.beta.agents.delete(agent.id)
|
||||
print(f" agent {agent.id} kept (persistent object; delete manually if needed)")
|
||||
else:
|
||||
print(f"\nSKIP_CLEANUP=1 — session and agent left alive.")
|
||||
print(f" Session: {session.id}")
|
||||
print(f" Agent: {agent.id}")
|
||||
|
||||
sys.exit(0 if token_recalled else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Loading…
Reference in New Issue
Block a user