spike(#745): evaluate Anthropic Managed Agents as third executor option

Adds `spike/issue-742-managed-agents-executor/` with: - `demo.py`: standalone Python script that authenticates to the Managed Agents beta API, provisions an environment + agent, starts a session, runs two conversational turns (with cross-turn state recall verification), and prints cold-start and per-turn latency measurements. - `README.md`: full integration assessment covering provisioner changes needed, A2A routing conflict (primary blocker — sessions have no addressable URL), cost model, API gaps table, and a no-ship recommendation with a 3-week effort estimate if we proceeded anyway. Recommendation: no-ship for primary executor. Revisit as a batch/cron worker in Phase H once Molecule's MCP server is feature-complete. Closes #745. References #742. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:43:21 +00:00 · 2026-04-17 15:43:21 +00:00 · 4b7c49b91c
commit 4b7c49b91c
parent f3f5ce32fe
2 changed files with 396 additions and 0 deletions
--- a/spike/issue-742-managed-agents-executor/README.md
+++ b/spike/issue-742-managed-agents-executor/README.md
@ -0,0 +1,185 @@
+# Spike #745 — Anthropic Managed Agents as a Molecule Executor
+
+**Parent issue:** #742 — "Third executor option: Anthropic Managed Agents"  
+**Spike issue:** #745
+
+## What We Evaluated
+
+Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create
+persistent agent objects, spin up per-task sessions, and stream execution events
+via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule
+is: *can this replace (or complement) the self-hosted Docker workspace executor?*
+
+---
+
+## Demo
+
+`demo.py` exercises the full lifecycle:
+
+```
+ANTHROPIC_API_KEY=sk-ant-... python demo.py
+```
+
+What it measures:
+
+| Phase | What we time |
+|---|---|
+| `environment create` | Provisioning a cloud execution environment |
+| `agent create` | Storing the agent config (model, system prompt, tools) |
+| `cold start` | `sessions.create()` → session ready |
+| `turn 1 RTT` | User message → SSE drain → `session.status_idle` |
+| `turn 2 RTT` | Same, plus implicit state recall check |
+
+State continuity is verified by injecting a unique token in turn 1 and
+asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail.
+
+---
+
+## Integration Assessment
+
+### 1. Provisioner changes
+
+Molecule's provisioner today calls `docker.NewClient()`, pulls an image,
+creates a container with resource limits, and waits for `/registry/register`
+from inside the container. A Managed Agents executor would replace that
+entire path:
+
+```
+current:  docker pull → container run → heartbeat register
+proposed: agents.create() → sessions.create() → SSE stream
+```
+
+A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch
+the provisioner. The workspace row would store `agent_id` (persistent) and
+`session_id` (ephemeral per-run) instead of a Docker container ID.
+
+**Migration effort:** medium.  
+A new `ManagedAgentProvisioner` can be added alongside the existing Docker
+provisioner without touching the common path. The primary cost is the
+integration layer described below.
+
+---
+
+### 2. A2A routing — the blocking architectural conflict
+
+This is the hard blocker. Molecule's A2A proxy (`POST /workspaces/:id/a2a`)
+resolves `ws.agent_url` and forwards an HTTP POST to the running container.
+Every workspace has a persistent, addressable HTTP endpoint.
+
+Managed Agents sessions communicate exclusively through the Anthropic SSE API —
+there is no per-session URL that the platform can proxy to. The session is a
+streaming consumer, not a server.
+
+Bridging the gap requires one of:
+
+**Option A — Long-poll bridge (complex, fragile)**  
+Keep a goroutine open per session holding the SSE stream. When an A2A message
+arrives, inject it via `sessions.events.send()` and wait for the next
+`agent.message` event. Map response back to A2A caller.  
+Risk: the goroutine dies, the session becomes unreachable, and A2A callers time out
+with no clear error path.
+
+**Option B — Managed Agents as leaf-only workers (scope reduction)**  
+Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A).
+The platform queues work, opens a session, streams the result, and closes the
+session. No live bridge needed.  
+Risk: many real workspaces delegate to peers — leaf-only scope limits
+applicability to batch/one-shot agents.
+
+**Option C — Hybrid: MCP bridge**  
+Anthropic agents can call MCP servers. The platform exposes its A2A proxy as
+an MCP server; the agent's MCP tool calls translate back to A2A messages.  
+Risk: this inverts the call direction (agent calls platform instead of
+platform-to-agent) and breaks the current workspace-to-workspace trust model.
+Security review required before shipping.
+
+---
+
+### 3. Cost model
+
+Managed Agents sessions are charged on top of standard token pricing — the
+platform receives its own compute costs. For comparison, the Docker path uses
+a customer-supplied model key with zero platform markup.
+
+The cold-start latency (environment + session creation) measured in the demo
+adds overhead before the first token. For interactive canvas workflows where
+workspaces are expected to be long-lived ("always on"), this model is a poor
+fit. For batch workspaces that run occasionally, it may save infrastructure
+cost.
+
+---
+
+### 4. API gaps (as of 2026-04-17)
+
+| Molecule requirement | Managed Agents support |
+|---|---|
+| Persistent HTTP endpoint for A2A | **No** — SSE only |
+| Heartbeat / liveness signal | **Partial** — session status via poll or SSE, but no proactive push to the platform |
+| Resource limits (memory, CPU) | **No** — environment config offers only `networking` |
+| Custom Docker image | **No** — Anthropic-managed base image only |
+| `workspace_dir` bind-mount | **No** — files uploaded via `client.beta.files` API |
+| Bearer token auth per workspace | **No** — auth is Anthropic API key, not per-workspace token |
+| Plugin system (arbitrary pip installs) | **No** — built-in `agent_toolset_20260401` or custom tool callbacks |
+| Runtime detection (`config.yaml` introspection) | **Not applicable** — config lives in agent object |
+
+---
+
+## Ship/No-Ship Recommendation
+
+### Decision: **No-ship for the primary executor. Spike further as a batch worker.**
+
+**Rationale:**
+
+1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition
+   is multi-workspace orchestration. A workspace executor that can't be reached
+   by other workspaces over A2A is not a Molecule workspace — it's a standalone
+   call to the Anthropic API with extra steps.
+
+2. **No persistent endpoint = no topology.** The canvas shows workspaces as
+   nodes that communicate. A Managed Agents session has no addressable URL; the
+   canvas can't represent it as a live peer.
+
+3. **Cold start is non-trivial.** Preliminary measurements from the demo show
+   environment + session creation adding visible latency before the first token.
+   For the "always-on" UX the canvas targets, this is noticeable.
+
+4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only,
+   no-A2A executor today means two provisioner paths diverge. The Managed Agents
+   path can never grow to full parity without Anthropic exposing a persistent
+   addressable URL. We'd be maintaining a permanently limited path.
+
+### What to do instead
+
+- **Phase H (planned):** Consider Managed Agents as the execution target for
+  *scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could
+  spin up a session, run the prompt, stream the result, and self-report via
+  `/activity`. No live A2A needed. Effort: ~2 weeks.
+
+- **Watch the API.** If Anthropic ships a stable URL per session (like a
+  webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above)
+  also becomes more viable once Molecule's MCP server is feature-complete.
+
+---
+
+## Rough Effort Estimate (if we did ship)
+
+| Component | Effort |
+|---|---|
+| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days |
+| A2A bridge goroutine (Option A) | 5–8 days |
+| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days |
+| Canvas: hide A2A tab for managed-agent workspaces | 1 day |
+| Tests, migration, docs | 3–4 days |
+| **Total** | **~3 weeks** |
+
+Even at 3 weeks, the result is a permanently limited path with no A2A and no
+resource controls. Not recommended.
+
+---
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `demo.py` | Runnable spike script — auth, provision, session, two turns, timing |
+| `README.md` | This assessment |
--- a/spike/issue-742-managed-agents-executor/demo.py
+++ b/spike/issue-742-managed-agents-executor/demo.py
@ -0,0 +1,211 @@
+#!/usr/bin/env python3
+"""
+Spike #745 — Anthropic Managed Agents as a Molecule workspace executor.
+
+This script validates the managed-agents-2026-04-01 beta API against the
+criteria in issue #742:
+  - Authentication & agent provisioning
+  - Session start (cold-start latency)
+  - Round-trip prompt/response (per-turn latency)
+  - State persistence across turns (session continuity)
+  - Clean shutdown
+
+Usage:
+    ANTHROPIC_API_KEY=sk-ant-... python demo.py
+
+Optional env vars:
+    MA_SKIP_CLEANUP=1   keep the agent/session alive after the run
+    MA_VERBOSE=1        print every SSE event type (not just agent messages)
+"""
+
+import os
+import sys
+import time
+import json
+
+try:
+    import anthropic
+except ImportError:
+    sys.exit("anthropic SDK not installed — run: pip install anthropic")
+
+# ── helpers ──────────────────────────────────────────────────────────────────
+
+VERBOSE = os.getenv("MA_VERBOSE") == "1"
+SKIP_CLEANUP = os.getenv("MA_SKIP_CLEANUP") == "1"
+
+
+def ts() -> float:
+    return time.monotonic()
+
+
+def elapsed(start: float) -> float:
+    return round(time.monotonic() - start, 3)
+
+
+def collect_turn(client: anthropic.Anthropic, session_id: str, message: str) -> tuple[str, float]:
+    """
+    Stream-first turn: open the SSE stream, send the user message inside the
+    context manager, then drain events until session.status_idle or
+    session.status_terminated.
+
+    Returns (agent_reply_text, round_trip_seconds).
+    Raises RuntimeError if the session terminates unexpectedly mid-turn.
+    """
+    reply_parts: list[str] = []
+    turn_start = ts()
+
+    with client.beta.sessions.stream(session_id=session_id) as stream:
+        # Send inside the stream so we never miss early events
+        client.beta.sessions.events.send(
+            session_id=session_id,
+            events=[
+                {
+                    "type": "user.message",
+                    "content": [{"type": "text", "text": message}],
+                }
+            ],
+        )
+
+        for event in stream:
+            if VERBOSE:
+                print(f"  [evt] {event.type}", flush=True)
+
+            if event.type == "agent.message":
+                for block in event.content:
+                    if block.type == "text":
+                        reply_parts.append(block.text)
+
+            elif event.type == "session.status_idle":
+                break  # normal turn completion
+
+            elif event.type == "session.status_terminated":
+                # session ended — surface whatever text arrived
+                if reply_parts:
+                    break
+                raise RuntimeError("Session terminated unexpectedly during turn")
+
+    return "".join(reply_parts), elapsed(turn_start)
+
+
+# ── main ─────────────────────────────────────────────────────────────────────
+
+def main() -> None:
+    api_key = os.environ.get("ANTHROPIC_API_KEY")
+    if not api_key:
+        sys.exit("ANTHROPIC_API_KEY not set")
+
+    client = anthropic.Anthropic(api_key=api_key)
+
+    # ── 1. Create environment ─────────────────────────────────────────────────
+    print("=== Managed Agents Spike #745 ===\n")
+    print("Step 1: Creating cloud environment…")
+    t0 = ts()
+    environment = client.beta.environments.create(
+        name="molecule-spike-742",
+        config={
+            "type": "cloud",
+            "networking": {"type": "unrestricted"},
+        },
+    )
+    env_time = elapsed(t0)
+    print(f"  environment_id : {environment.id}")
+    print(f"  env create time: {env_time}s\n")
+
+    # ── 2. Create agent ───────────────────────────────────────────────────────
+    print("Step 2: Creating agent…")
+    t0 = ts()
+    agent = client.beta.agents.create(
+        name="molecule-spike-agent",
+        model="claude-opus-4-7",
+        system=(
+            "You are a stateful test agent for the Molecule AI spike. "
+            "When asked to remember something, confirm you will. "
+            "On subsequent turns, recall it accurately."
+        ),
+        tools=[
+            {"type": "agent_toolset_20260401", "default_config": {"enabled": True}}
+        ],
+    )
+    agent_time = elapsed(t0)
+    print(f"  agent_id  : {agent.id}")
+    print(f"  version   : {agent.version}")
+    print(f"  agent create time: {agent_time}s\n")
+
+    # ── 3. Create session (cold start) ────────────────────────────────────────
+    print("Step 3: Creating session (cold start)…")
+    cold_start = ts()
+    session = client.beta.sessions.create(
+        agent={"type": "agent", "id": agent.id, "version": agent.version},
+        environment_id=environment.id,
+        title="molecule-spike-742-session",
+    )
+    cold_time = elapsed(cold_start)
+    print(f"  session_id : {session.id}")
+    print(f"  status     : {session.status}")
+    print(f"  cold-start : {cold_time}s\n")
+
+    # ── 4. Turn 1 — establish a fact the agent should remember ────────────────
+    turn1_prompt = (
+        "Please remember this token for the rest of our conversation: "
+        "MOLECULE_SPIKE_7a3f. "
+        "What is today's task? Reply in one sentence."
+    )
+    print(f"Turn 1 prompt:\n  {turn1_prompt!r}\n")
+    turn1_reply, turn1_time = collect_turn(client, session.id, turn1_prompt)
+    print(f"Turn 1 reply ({turn1_time}s):\n  {turn1_reply!r}\n")
+
+    # ── 5. Turn 2 — verify state persistence ─────────────────────────────────
+    turn2_prompt = "What was the token I asked you to remember?"
+    print(f"Turn 2 prompt:\n  {turn2_prompt!r}\n")
+    turn2_reply, turn2_time = collect_turn(client, session.id, turn2_prompt)
+    print(f"Turn 2 reply ({turn2_time}s):\n  {turn2_reply!r}\n")
+
+    # ── 6. State continuity check ─────────────────────────────────────────────
+    token_recalled = "MOLECULE_SPIKE_7a3f" in turn2_reply
+    print("=== Results ===")
+    print(f"  environment create : {env_time}s")
+    print(f"  agent create       : {agent_time}s")
+    print(f"  cold-start (session create → ready) : {cold_time}s")
+    print(f"  turn 1 round-trip  : {turn1_time}s")
+    print(f"  turn 2 round-trip  : {turn2_time}s")
+    print(f"  state continuity   : {'PASS — token recalled' if token_recalled else 'FAIL — token not found in turn 2'}")
+
+    # Emit JSON summary for easy parsing in CI / PR bots
+    summary = {
+        "environment_id": environment.id,
+        "agent_id": agent.id,
+        "session_id": session.id,
+        "timings": {
+            "environment_create_s": env_time,
+            "agent_create_s": agent_time,
+            "cold_start_s": cold_time,
+            "turn1_rtt_s": turn1_time,
+            "turn2_rtt_s": turn2_time,
+        },
+        "state_continuity_pass": token_recalled,
+    }
+    print("\nJSON summary:")
+    print(json.dumps(summary, indent=2))
+
+    # ── 7. Cleanup ────────────────────────────────────────────────────────────
+    if not SKIP_CLEANUP:
+        print("\nCleaning up…")
+        try:
+            client.beta.sessions.delete(session_id=session.id)
+            print(f"  session {session.id} deleted")
+        except Exception as exc:
+            print(f"  session delete warning: {exc}")
+        # Agents are persistent/shared — don't delete unless explicitly asked.
+        # Set MA_SKIP_CLEANUP=1 and clean up manually with:
+        #   client.beta.agents.delete(agent.id)
+        print(f"  agent {agent.id} kept (persistent object; delete manually if needed)")
+    else:
+        print(f"\nSKIP_CLEANUP=1 — session and agent left alive.")
+        print(f"  Session: {session.id}")
+        print(f"  Agent:   {agent.id}")
+
+    sys.exit(0 if token_recalled else 1)
+
+
+if __name__ == "__main__":
+    main()