From 4b7c49b91c88f9f8dfabd40b79b313dec92adba6 Mon Sep 17 00:00:00 2001 From: "molecule-ai[bot]" <276602405+molecule-ai[bot]@users.noreply.github.com> Date: Fri, 17 Apr 2026 15:43:21 +0000 Subject: [PATCH] spike(#745): evaluate Anthropic Managed Agents as third executor option MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds `spike/issue-742-managed-agents-executor/` with: - `demo.py`: standalone Python script that authenticates to the Managed Agents beta API, provisions an environment + agent, starts a session, runs two conversational turns (with cross-turn state recall verification), and prints cold-start and per-turn latency measurements. - `README.md`: full integration assessment covering provisioner changes needed, A2A routing conflict (primary blocker — sessions have no addressable URL), cost model, API gaps table, and a no-ship recommendation with a 3-week effort estimate if we proceeded anyway. Recommendation: no-ship for primary executor. Revisit as a batch/cron worker in Phase H once Molecule's MCP server is feature-complete. Closes #745. References #742. 
Co-Authored-By: Claude Sonnet 4.6 --- .../README.md | 185 +++++++++++++++ .../issue-742-managed-agents-executor/demo.py | 211 ++++++++++++++++++ 2 files changed, 396 insertions(+) create mode 100644 spike/issue-742-managed-agents-executor/README.md create mode 100644 spike/issue-742-managed-agents-executor/demo.py diff --git a/spike/issue-742-managed-agents-executor/README.md b/spike/issue-742-managed-agents-executor/README.md new file mode 100644 index 00000000..2168b93c --- /dev/null +++ b/spike/issue-742-managed-agents-executor/README.md @@ -0,0 +1,185 @@ +# Spike #745 — Anthropic Managed Agents as a Molecule Executor + +**Parent issue:** #742 — "Third executor option: Anthropic Managed Agents" +**Spike issue:** #745 + +## What We Evaluated + +Anthropic's Managed Agents beta (`managed-agents-2026-04-01`) lets you create +persistent agent objects, spin up per-task sessions, and stream execution events +via SSE — all hosted on Anthropic's infrastructure. The key question for Molecule +is: *can this replace (or complement) the self-hosted Docker workspace executor?* + +--- + +## Demo + +`demo.py` exercises the full lifecycle: + +``` +ANTHROPIC_API_KEY=sk-ant-... python demo.py +``` + +What it measures: + +| Phase | What we time | +|---|---| +| `environment create` | Provisioning a cloud execution environment | +| `agent create` | Storing the agent config (model, system prompt, tools) | +| `cold start` | `sessions.create()` → session ready | +| `turn 1 RTT` | User message → SSE drain → `session.status_idle` | +| `turn 2 RTT` | Same, plus implicit state recall check | + +State continuity is verified by injecting a unique token in turn 1 and +asserting the agent quotes it back in turn 2. Exit code 0 = pass, 1 = fail. + +--- + +## Integration Assessment + +### 1. Provisioner changes + +Molecule's provisioner today calls `docker.NewClient()`, pulls an image, +creates a container with resource limits, and waits for `/registry/register` +from inside the container. 
A Managed Agents executor would replace that
entire path:

```
current:  docker pull → container run → heartbeat register
proposed: agents.create() → sessions.create() → SSE stream
```

A new `runtime: "managed-agent"` value in `workspaces.runtime` would branch
the provisioner. The workspace row would store `agent_id` (persistent) and
`session_id` (ephemeral per-run) instead of a Docker container ID.

**Migration effort:** medium.
A new `ManagedAgentProvisioner` can be added alongside the existing Docker
provisioner without touching the common path. The primary cost is the
integration layer described below.

---

### 2. A2A routing — the blocking architectural conflict

This is the hard blocker. Molecule's A2A proxy (`POST /workspaces/:id/a2a`)
resolves `ws.agent_url` and forwards an HTTP POST to the running container.
Every workspace has a persistent, addressable HTTP endpoint.

Managed Agents sessions communicate exclusively through the Anthropic SSE API —
there is no per-session URL that the platform can proxy to. The session is a
streaming consumer, not a server.

Bridging the gap requires one of:

**Option A — Long-poll bridge (complex, fragile)**
Keep a goroutine open per session holding the SSE stream. When an A2A message
arrives, inject it via `sessions.events.send()` and wait for the next
`agent.message` event. Map the response back to the A2A caller.
Risk: the goroutine dies, the session becomes unreachable, and A2A callers time
out with no clear error path.

**Option B — Managed Agents as leaf-only workers (scope reduction)**
Only use Managed Agents for workspaces that *receive* tasks (no outbound A2A).
The platform queues work, opens a session, streams the result, and closes the
session. No live bridge needed.
Risk: many real workspaces delegate to peers — leaf-only scope limits
applicability to batch/one-shot agents.

**Option C — Hybrid: MCP bridge**
Anthropic agents can call MCP servers. 
The platform exposes its A2A proxy as
an MCP server; the agent's MCP tool calls translate back to A2A messages.
Risk: this inverts the call direction (agent calls platform instead of
platform-to-agent) and breaks the current workspace-to-workspace trust model.
Security review required before shipping.

---

### 3. Cost model

Managed Agents sessions are charged on top of standard token pricing — the
platform incurs its own compute costs. For comparison, the Docker path uses
a customer-supplied model key with zero platform markup.

The cold-start latency (environment + session creation) measured in the demo
adds overhead before the first token. For interactive canvas workflows where
workspaces are expected to be long-lived ("always on"), this model is a poor
fit. For batch workspaces that run occasionally, it may save infrastructure
cost.

---

### 4. API gaps (as of 2026-04-17)

| Molecule requirement | Managed Agents support |
|---|---|
| Persistent HTTP endpoint for A2A | **No** — SSE only |
| Heartbeat / liveness signal | **Partial** — session status via poll or SSE, but no proactive push to the platform |
| Resource limits (memory, CPU) | **No** — environment config offers only `networking` |
| Custom Docker image | **No** — Anthropic-managed base image only |
| `workspace_dir` bind-mount | **No** — files uploaded via `client.beta.files` API |
| Bearer token auth per workspace | **No** — auth is Anthropic API key, not per-workspace token |
| Plugin system (arbitrary pip installs) | **No** — built-in `agent_toolset_20260401` or custom tool callbacks |
| Runtime detection (`config.yaml` introspection) | **Not applicable** — config lives in agent object |

---

## Ship/No-Ship Recommendation

### Decision: **No-ship for the primary executor. Spike further as a batch worker.**

**Rationale:**

1. **A2A proxy is the load-bearing constraint.** Molecule's value proposition
   is multi-workspace orchestration. 
A workspace executor that can't be reached + by other workspaces over A2A is not a Molecule workspace — it's a standalone + call to the Anthropic API with extra steps. + +2. **No persistent endpoint = no topology.** The canvas shows workspaces as + nodes that communicate. A Managed Agents session has no addressable URL; the + canvas can't represent it as a live peer. + +3. **Cold start is non-trivial.** Preliminary measurements from the demo show + environment + session creation adding visible latency before the first token. + For the "always-on" UX the canvas targets, this is noticeable. + +4. **Scope would be a dead end.** Shipping Managed Agents as a leaf-only, + no-A2A executor today means two provisioner paths diverge. The Managed Agents + path can never grow to full parity without Anthropic exposing a persistent + addressable URL. We'd be maintaining a permanently limited path. + +### What to do instead + +- **Phase H (planned):** Consider Managed Agents as the execution target for + *scheduled* tasks only (`workspace_schedules` cron rows). A cron fire could + spin up a session, run the prompt, stream the result, and self-report via + `/activity`. No live A2A needed. Effort: ~2 weeks. + +- **Watch the API.** If Anthropic ships a stable URL per session (like a + webhook delivery endpoint), re-evaluate. The MCP bridge angle (Option C above) + also becomes more viable once Molecule's MCP server is feature-complete. + +--- + +## Rough Effort Estimate (if we did ship) + +| Component | Effort | +|---|---| +| `ManagedAgentProvisioner` (create/start/stop session) | 3–5 days | +| A2A bridge goroutine (Option A) | 5–8 days | +| Heartbeat adapter (translate SSE status to `/registry/heartbeat`) | 2–3 days | +| Canvas: hide A2A tab for managed-agent workspaces | 1 day | +| Tests, migration, docs | 3–4 days | +| **Total** | **~3 weeks** | + +Even at 3 weeks, the result is a permanently limited path with no A2A and no +resource controls. Not recommended. 
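
If the Phase H batch-worker direction is picked up, the per-fire lifecycle is simple enough to sketch now. The following is a hedged sketch against a stub client: `run_scheduled_task`, `StubClient`, and every method name below are illustrative, not existing Molecule or Anthropic API surface.

```python
# Hypothetical sketch of the Phase H cron flow: one ephemeral session per
# fire, no live A2A bridge. StubClient stands in for the Managed Agents SDK;
# all names here are illustrative, not real API.
from dataclasses import dataclass, field


@dataclass
class StubSession:
    id: str
    events: list = field(default_factory=list)


class StubClient:
    """Stands in for the SDK surface a real worker would call."""

    def create_session(self, agent_id: str) -> StubSession:
        return StubSession(id=f"sess-{agent_id}")

    def run_turn(self, session: StubSession, prompt: str) -> str:
        session.events.append(prompt)
        return f"result for: {prompt}"

    def delete_session(self, session: StubSession) -> None:
        session.events.append("deleted")


def run_scheduled_task(client: StubClient, agent_id: str, prompt: str) -> dict:
    """One cron fire: open session, run the prompt, close, self-report."""
    session = client.create_session(agent_id)
    try:
        result = client.run_turn(session, prompt)
    finally:
        client.delete_session(session)  # session is ephemeral by design
    # A real worker would POST this to /activity instead of returning it.
    return {"session_id": session.id, "result": result}
```

The point of the sketch is the shape: no goroutine holds a stream open between fires, so none of the Option A fragility applies.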
+ +--- + +## Files + +| File | Purpose | +|---|---| +| `demo.py` | Runnable spike script — auth, provision, session, two turns, timing | +| `README.md` | This assessment | diff --git a/spike/issue-742-managed-agents-executor/demo.py b/spike/issue-742-managed-agents-executor/demo.py new file mode 100644 index 00000000..0399cf6c --- /dev/null +++ b/spike/issue-742-managed-agents-executor/demo.py @@ -0,0 +1,211 @@ +#!/usr/bin/env python3 +""" +Spike #745 — Anthropic Managed Agents as a Molecule workspace executor. + +This script validates the managed-agents-2026-04-01 beta API against the +criteria in issue #742: + - Authentication & agent provisioning + - Session start (cold-start latency) + - Round-trip prompt/response (per-turn latency) + - State persistence across turns (session continuity) + - Clean shutdown + +Usage: + ANTHROPIC_API_KEY=sk-ant-... python demo.py + +Optional env vars: + MA_SKIP_CLEANUP=1 keep the agent/session alive after the run + MA_VERBOSE=1 print every SSE event type (not just agent messages) +""" + +import os +import sys +import time +import json + +try: + import anthropic +except ImportError: + sys.exit("anthropic SDK not installed — run: pip install anthropic") + +# ── helpers ────────────────────────────────────────────────────────────────── + +VERBOSE = os.getenv("MA_VERBOSE") == "1" +SKIP_CLEANUP = os.getenv("MA_SKIP_CLEANUP") == "1" + + +def ts() -> float: + return time.monotonic() + + +def elapsed(start: float) -> float: + return round(time.monotonic() - start, 3) + + +def collect_turn(client: anthropic.Anthropic, session_id: str, message: str) -> tuple[str, float]: + """ + Stream-first turn: open the SSE stream, send the user message inside the + context manager, then drain events until session.status_idle or + session.status_terminated. + + Returns (agent_reply_text, round_trip_seconds). + Raises RuntimeError if the session terminates unexpectedly mid-turn. 
+ """ + reply_parts: list[str] = [] + turn_start = ts() + + with client.beta.sessions.stream(session_id=session_id) as stream: + # Send inside the stream so we never miss early events + client.beta.sessions.events.send( + session_id=session_id, + events=[ + { + "type": "user.message", + "content": [{"type": "text", "text": message}], + } + ], + ) + + for event in stream: + if VERBOSE: + print(f" [evt] {event.type}", flush=True) + + if event.type == "agent.message": + for block in event.content: + if block.type == "text": + reply_parts.append(block.text) + + elif event.type == "session.status_idle": + break # normal turn completion + + elif event.type == "session.status_terminated": + # session ended — surface whatever text arrived + if reply_parts: + break + raise RuntimeError("Session terminated unexpectedly during turn") + + return "".join(reply_parts), elapsed(turn_start) + + +# ── main ───────────────────────────────────────────────────────────────────── + +def main() -> None: + api_key = os.environ.get("ANTHROPIC_API_KEY") + if not api_key: + sys.exit("ANTHROPIC_API_KEY not set") + + client = anthropic.Anthropic(api_key=api_key) + + # ── 1. Create environment ───────────────────────────────────────────────── + print("=== Managed Agents Spike #745 ===\n") + print("Step 1: Creating cloud environment…") + t0 = ts() + environment = client.beta.environments.create( + name="molecule-spike-742", + config={ + "type": "cloud", + "networking": {"type": "unrestricted"}, + }, + ) + env_time = elapsed(t0) + print(f" environment_id : {environment.id}") + print(f" env create time: {env_time}s\n") + + # ── 2. Create agent ─────────────────────────────────────────────────────── + print("Step 2: Creating agent…") + t0 = ts() + agent = client.beta.agents.create( + name="molecule-spike-agent", + model="claude-opus-4-7", + system=( + "You are a stateful test agent for the Molecule AI spike. " + "When asked to remember something, confirm you will. 
" + "On subsequent turns, recall it accurately." + ), + tools=[ + {"type": "agent_toolset_20260401", "default_config": {"enabled": True}} + ], + ) + agent_time = elapsed(t0) + print(f" agent_id : {agent.id}") + print(f" version : {agent.version}") + print(f" agent create time: {agent_time}s\n") + + # ── 3. Create session (cold start) ──────────────────────────────────────── + print("Step 3: Creating session (cold start)…") + cold_start = ts() + session = client.beta.sessions.create( + agent={"type": "agent", "id": agent.id, "version": agent.version}, + environment_id=environment.id, + title="molecule-spike-742-session", + ) + cold_time = elapsed(cold_start) + print(f" session_id : {session.id}") + print(f" status : {session.status}") + print(f" cold-start : {cold_time}s\n") + + # ── 4. Turn 1 — establish a fact the agent should remember ──────────────── + turn1_prompt = ( + "Please remember this token for the rest of our conversation: " + "MOLECULE_SPIKE_7a3f. " + "What is today's task? Reply in one sentence." + ) + print(f"Turn 1 prompt:\n {turn1_prompt!r}\n") + turn1_reply, turn1_time = collect_turn(client, session.id, turn1_prompt) + print(f"Turn 1 reply ({turn1_time}s):\n {turn1_reply!r}\n") + + # ── 5. Turn 2 — verify state persistence ───────────────────────────────── + turn2_prompt = "What was the token I asked you to remember?" + print(f"Turn 2 prompt:\n {turn2_prompt!r}\n") + turn2_reply, turn2_time = collect_turn(client, session.id, turn2_prompt) + print(f"Turn 2 reply ({turn2_time}s):\n {turn2_reply!r}\n") + + # ── 6. 
State continuity check ─────────────────────────────────────────────
    token_recalled = "MOLECULE_SPIKE_7a3f" in turn2_reply
    print("=== Results ===")
    print(f"  environment create                   : {env_time}s")
    print(f"  agent create                         : {agent_time}s")
    print(f"  cold-start (session create → ready)  : {cold_time}s")
    print(f"  turn 1 round-trip                    : {turn1_time}s")
    print(f"  turn 2 round-trip                    : {turn2_time}s")
    print(f"  state continuity                     : {'PASS — token recalled' if token_recalled else 'FAIL — token not found in turn 2'}")

    # Emit JSON summary for easy parsing in CI / PR bots
    summary = {
        "environment_id": environment.id,
        "agent_id": agent.id,
        "session_id": session.id,
        "timings": {
            "environment_create_s": env_time,
            "agent_create_s": agent_time,
            "cold_start_s": cold_time,
            "turn1_rtt_s": turn1_time,
            "turn2_rtt_s": turn2_time,
        },
        "state_continuity_pass": token_recalled,
    }
    print("\nJSON summary:")
    print(json.dumps(summary, indent=2))

    # ── 7. Cleanup ──────────────────────────────────────────────────────────
    if not SKIP_CLEANUP:
        print("\nCleaning up…")
        try:
            client.beta.sessions.delete(session_id=session.id)
            print(f"  session {session.id} deleted")
        except Exception as exc:
            print(f"  session delete warning: {exc}")
        # Agents are persistent/shared — don't delete unless explicitly asked.
        # Set MA_SKIP_CLEANUP=1 and clean up manually with:
        #   client.beta.agents.delete(agent.id)
        print(f"  agent {agent.id} kept (persistent object; delete manually if needed)")
    else:
        print("\nMA_SKIP_CLEANUP=1 — session and agent left alive.")
        print(f"  Session: {session.id}")
        print(f"  Agent:   {agent.id}")

    sys.exit(0 if token_recalled else 1)


if __name__ == "__main__":
    main()
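
One consumer of the JSON summary that `demo.py` prints is a CI gate. A minimal sketch follows; the schema mirrors the `summary` dict in the script, but `check_summary` and the `COLD_START_BUDGET_S` threshold are hypothetical, not part of the spike.

```python
# Hypothetical CI-side check for demo.py's JSON summary. The keys match the
# `summary` dict emitted by the script; the budget value is an assumed
# placeholder, not a measured target.
import json

COLD_START_BUDGET_S = 30.0  # assumed placeholder budget


def check_summary(raw: str) -> list[str]:
    """Return a list of failure messages (empty list means pass)."""
    data = json.loads(raw)
    failures = []
    if not data["state_continuity_pass"]:
        failures.append("state continuity failed")
    if data["timings"]["cold_start_s"] > COLD_START_BUDGET_S:
        failures.append("cold start over budget")
    return failures
```

A PR bot could pipe the script's trailing JSON block into `check_summary` and fail the check on a non-empty list, matching the exit-code convention the script already uses.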