molecule-core/docs/agent-runtime/cli-runtime.md
chore: retire unmaintained workspace runtimes
2026-05-23 23:45:09 -07:00


Agent Runtime Adapters

Overview

The workspace runtime uses a pluggable adapter architecture — each maintained agent infrastructure (Claude Code, Codex, Hermes, OpenClaw) has its own adapter that bridges the A2A protocol to the infra's native interface.

Adapters live in workspace/adapters/<runtime>/ and are auto-discovered at startup. Each adapter implements BaseAdapter (from adapters/base.py) with setup() and create_executor() methods.

The runtime is selected via config.yaml:

runtime: claude-code    # or: codex, hermes, openclaw
runtime_config:
  model: sonnet
  auth_token_file: .auth-token
  timeout: 0

How It Works

The unified runtime checks the runtime field in config.yaml, discovers the matching adapter, calls adapter.setup(config) then adapter.create_executor(config) to get an AgentExecutor that handles A2A requests.

A2A request arrives
      |
      v
AgentExecutor.execute(context, event_queue)
      |  - extracts user message from A2A parts
      |  - extracts conversation history from params.metadata.history
      |  - sets current_task on heartbeat (shows on canvas card)
      |  - invokes the runtime adapter
      v
Response → A2A event queue → JSON-RPC response
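
The flow in the diagram above can be sketched with the standard library alone. This is a minimal illustration, not the runtime's code: SketchExecutor, FakeHeartbeat, and run_runtime are hypothetical stand-ins, the real executor subclasses AgentExecutor and receives a context object rather than a plain dict, and both the part shape ({"kind": "text", ...}) and clearing current_task at the end are assumptions.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeHeartbeat:
    """Hypothetical stand-in for the workspace heartbeat object."""
    current_task: str = ""


@dataclass
class SketchExecutor:
    """Illustrative executor; the real one subclasses AgentExecutor."""
    heartbeat: FakeHeartbeat = field(default_factory=FakeHeartbeat)

    async def run_runtime(self, text: str, history: list) -> str:
        # Placeholder for the adapter-specific call (SDK, subprocess, ...).
        return f"echo: {text}"

    async def execute(self, request: dict, event_queue: asyncio.Queue) -> None:
        params = request.get("params", {})
        parts = params.get("message", {}).get("parts", [])
        # 1. Extract the user message from the text parts (assumed shape).
        text = " ".join(p["text"] for p in parts if p.get("kind") == "text")
        # 2. Extract prior turns, if the client sent any.
        history = params.get("metadata", {}).get("history", [])
        # 3. Report the current task so the canvas card can show it.
        self.heartbeat.current_task = text[:80]
        try:
            # 4. Invoke the runtime and queue its reply.
            reply = await self.run_runtime(text, history)
            await event_queue.put(reply)
        finally:
            # Assumption: the banner is cleared once the turn ends.
            self.heartbeat.current_task = ""
```

A real adapter would replace run_runtime with its SDK or subprocess call and leave the surrounding steps unchanged.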

Conversation History

Chat sessions in the Canvas UI send up to 20 prior messages via params.metadata.history in each A2A message/send request. Executors extract this history, although each maintained runtime relies on its own session mechanism for continuity:

  • Claude Code: Uses --resume <session_id> for native session continuity (history not needed)
  • Codex: Uses the Codex runtime's native session state
  • Hermes: Uses Hermes' agent runtime session handling
  • OpenClaw: Uses --session-id for native session continuity
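
A defensive extraction of params.metadata.history might look like the sketch below. The function name extract_history and the entry shape ({"role": ..., "content": ...}) are assumptions for illustration; the shared helper in a2a_executor.py may differ.

```python
from typing import Any

MAX_HISTORY = 20  # the Canvas UI sends at most 20 prior messages


def extract_history(params: dict[str, Any]) -> list[dict[str, str]]:
    """Return well-formed prior turns from params.metadata.history."""
    metadata = params.get("metadata") or {}
    raw = metadata.get("history") or []
    if not isinstance(raw, list):
        return []
    turns = []
    for item in raw:
        # Skip anything that is not a {"role", "content"} mapping (assumed shape).
        if isinstance(item, dict) and item.get("role") and item.get("content"):
            turns.append({"role": str(item["role"]), "content": str(item["content"])})
    # Keep only the most recent turns if a client sends more than expected.
    return turns[-MAX_HISTORY:]
```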

Current Task Reporting

All executors update the workspace's current_task via the heartbeat during execution. This shows an amber banner on the canvas card. The shared set_current_task(heartbeat, task) function in a2a_executor.py handles this for all runtimes.

Built-in Adapters

Claude Code (runtime: claude-code)

runtime: claude-code
runtime_config:
  model: sonnet          # or opus, haiku
  auth_token_file: .auth-token   # OAuth token file in /configs/

Uses the Claude Agent SDK (claude-agent-sdk Python package) to invoke the Claude Code engine programmatically via ClaudeSDKExecutor. This replaced the earlier subprocess-based approach (claude --print ...) to eliminate stdout buffering, zombie processes, session-ID parsing fragility, and ~500ms per-message startup overhead.

The SDK uses the same Claude Code engine under the hood — plugins, CLAUDE.md discovery, hooks, auto-memory, and skills all work identically. The @anthropic-ai/claude-code npm package is still installed in the image because the SDK wraps it internally.

Auth: Uses the CLAUDE_CODE_OAUTH_TOKEN env var — the OAuth token is read from /configs/.auth-token and picked up by the SDK automatically.

Concurrency: Turns are serialized per executor via an asyncio.Lock so session state stays race-free. Cooperative cancellation is supported by calling aclose() on the SDK's async generator.
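
The pattern can be illustrated without the SDK. In this sketch, SerializedTurns and stream_factory are hypothetical names, and any async generator stands in for the SDK's message stream. The consuming task calls aclose() itself after noticing a cancel flag, because calling aclose() from another task while the generator is mid-await raises RuntimeError.

```python
import asyncio


class SerializedTurns:
    """Runs one turn at a time and supports cooperative cancellation."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._cancelled = False

    def cancel(self):
        # Only sets a flag; the running turn stops at its next chunk.
        self._cancelled = True

    async def run_turn(self, stream_factory, prompt):
        async with self._lock:  # later turns wait here until this one ends
            self._cancelled = False
            stream = stream_factory(prompt)
            chunks = []
            try:
                async for chunk in stream:
                    chunks.append(chunk)
                    if self._cancelled:
                        break
            finally:
                # aclose() raises GeneratorExit at the generator's current
                # yield, so its own cleanup code runs; it is a no-op if the
                # generator has already finished.
                await stream.aclose()
            return "".join(chunks)
```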

Important: Claude Code refuses to run as root with --dangerously-skip-permissions. The Dockerfile creates a non-root agent user.

Codex (runtime: codex)

runtime: codex
model: openai/gpt-5.3-codex

Hermes (runtime: hermes)

runtime: hermes
model: openai/gpt-4o

OpenClaw (runtime: openclaw)

Proxies A2A messages to OpenClaw by running the openclaw agent CLI as a subprocess. OpenClaw handles its own session continuity via --session-id.

runtime: openclaw

Auth: Uses OpenClaw's own authentication (configured during openclaw setup).

Session Continuity (Claude Code)

Claude Code workspaces maintain conversation state across messages using the SDK's resume option:

  1. First message: the SDK's ResultMessage returns a session_id
  2. Subsequent messages: the SDK is called with resume=<session_id> to continue the same conversation
  3. System prompt: only injected on the first message — resumed sessions already have it
  4. Memories: recalled from the platform API on the first turn only; subsequent turns already have context

Session state is stored inside the container at ~/.claude/ and persists across messages but resets on container restart.
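
The four rules above reduce to a small piece of state. The sketch below is illustrative only: SessionState and the option keys system_prompt and memories are hypothetical, and only resume and session_id come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Tracks the session ID returned by the first turn (hypothetical helper)."""
    session_id: Optional[str] = None

    def build_options(self, system_prompt: str, memories: list) -> dict:
        if self.session_id is None:
            # First turn: inject the system prompt and recalled memories.
            return {"system_prompt": system_prompt, "memories": memories}
        # Later turns: resume; the session already holds prompt and context.
        return {"resume": self.session_id}

    def record_result(self, session_id: str) -> None:
        # Called with the session_id carried by the SDK's ResultMessage.
        self.session_id = session_id
```

Because the state lives in memory alongside ~/.claude/, a container restart leaves session_id unset and the next message is treated as a first turn again.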

System Prompt

All runtimes load system-prompt.md from the workspace's config directory (/configs/system-prompt.md). For Claude Code (SDK executor) and other CLI runtimes, the prompt is re-read on each message (supports hot-reload without restart). A2A delegation instructions are appended automatically.

For LangGraph runtimes, the system prompt is built from multiple sources (config, skills, plugins, peer capabilities) at startup.

Auth Token Resolution

The CLI executor resolves auth tokens in this order:

  1. Environment variable: CLAUDE_AUTH_TOKEN, OPENAI_API_KEY, etc.
  2. Token file: /configs/.auth-token (relative to config dir)
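
A minimal sketch of that lookup order, assuming a hypothetical helper named resolve_auth_token; the environment variable names are passed in by the caller, so nothing here fixes which variable a given runtime reads.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_auth_token(
    env_vars: Sequence[str],
    config_dir: str = "/configs",
    token_file: str = ".auth-token",
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty env var, else the token file's contents."""
    environ = os.environ if environ is None else environ
    # 1. Environment variables win, in the order given.
    for name in env_vars:
        value = environ.get(name, "").strip()
        if value:
            return value
    # 2. Fall back to the token file inside the config directory.
    path = Path(config_dir) / token_file
    if path.is_file():
        return path.read_text(encoding="utf-8").strip() or None
    return None
```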

For Claude Code specifically:

  • Extract your OAuth access token from the macOS keychain: security find-generic-password -s "Claude Code-credentials" -a "<username>" -w
  • Write it to workspace-configs-templates/claude-code-default/.auth-token
  • The provisioner copies this file to each new workspace's config dir

Auto-Provisioning Without Templates

Workspaces can be created without specifying a template. The platform automatically:

  1. Creates a config directory (ws-<id>) under workspace-configs-templates/
  2. Generates a minimal config.yaml with the workspace's name, role, runtime, and model
  3. Copies .auth-token from the claude-code-default template (if it exists)
  4. Merges any files previously uploaded via the Files API
  5. Starts the container

This means you can create a workspace with just:

curl -X POST http://localhost:8080/workspaces \
  -H "Content-Type: application/json" \
  -d '{"name": "My Agent", "role": "Does things", "runtime": "claude-code"}'

The workspace is then provisioned, registered, and brought online automatically.
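
The same request can be built from Python with the standard library. This sketch mirrors the curl call above; build_create_request is a hypothetical helper, and it only constructs the request so it can be inspected before sending.

```python
import json
import urllib.request


def build_create_request(name, role, runtime, base_url="http://localhost:8080"):
    """Build (but do not send) the POST /workspaces request."""
    body = json.dumps({"name": name, "role": role, "runtime": runtime}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/workspaces",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() sends it; the shape of the response body is not described here, so the sketch does not parse it.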

Dockerfile

The unified workspace/Dockerfile includes both Python and Node.js:

FROM python:3.11-slim

# Node.js for CLI runtimes (claude-code, codex)
# npm is a separate Debian package; installing nodejs alone does not provide it
RUN apt-get update && apt-get install -y nodejs npm
RUN npm install -g @anthropic-ai/claude-code

# Non-root user (claude --dangerously-skip-permissions refuses root)
RUN useradd -m -s /bin/bash agent

# Python deps for LangGraph runtime
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY *.py ./
USER agent
CMD ["python", "main.py"]

Inter-Agent Communication (A2A Delegation)

CLI-based workspaces can communicate with other workspaces via two mechanisms:

MCP Tools (Claude Code and other MCP-compatible runtimes)

For MCP-compatible runtimes, an A2A MCP server (a2a_mcp_server.py) is automatically injected via --mcp-config. This gives the agent the following five MCP tools:

| Tool | Description |
| --- | --- |
| list_peers | Discover sibling/parent/child workspaces (name, ID, status, role) |
| delegate_task | Send a task to a peer and get their response via A2A |
| delegate_task_async | Send a task and return immediately with a task_id (for long tasks) |
| check_task_status | Poll an async task's status and get results when done |
| get_workspace_info | Get this workspace's own metadata |

The agent uses these tools naturally — no special instructions needed. Access control is enforced by the platform registry.

Example flow: Marketing uses delegate_task(seo_id, "What is your status?") → A2A message to SEO → SEO responds → result returned to Marketing.

Delegation Error Handling

When delegate_task receives an error from a child (auth failure, timeout, offline), the MCP server wraps it as a DELEGATION FAILED message with instructions for the calling agent to: (1) try a different peer, (2) handle the task itself, or (3) inform the user which peer is unavailable and provide its own best answer. Errors are tagged with a [A2A_ERROR] sentinel prefix so they can be reliably distinguished from normal response text. Coordinator prompts and A2A instructions reinforce that agents must never forward raw error messages to the user.
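
The sentinel check can be sketched as follows. Only the [A2A_ERROR] prefix and the DELEGATION FAILED label come from the description above; the function names and the exact message wording are assumptions.

```python
A2A_ERROR_PREFIX = "[A2A_ERROR]"


def wrap_delegation_error(peer_name: str, detail: str) -> str:
    """Tag a failed delegation so callers can detect it reliably."""
    return (
        f"{A2A_ERROR_PREFIX} DELEGATION FAILED: peer '{peer_name}' is unavailable "
        f"({detail}). Try a different peer, handle the task yourself, or tell the "
        "user which peer is unavailable and give your own best answer."
    )


def is_delegation_error(response: str) -> bool:
    """True only when the response starts with the sentinel prefix."""
    return response.lstrip().startswith(A2A_ERROR_PREFIX)
```

Matching on a prefix rather than searching the whole string keeps a normal reply that merely mentions the sentinel from being treated as a failure.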

CLI Commands (Custom runtimes)

For non-MCP runtimes, A2A instructions are injected into the system prompt. The agent uses bash commands:

a2a peers                          # List available peers
a2a delegate <workspace_id> <task>  # Send task to a peer
a2a info                           # Show workspace info

Both approaches use the same backend: platform registry for discovery, A2A protocol for messaging, and access control enforcement (parent↔child, siblings only).

Memory Tools

CLI runtimes keep the same memory tool surface as the Python runtime: commit_memory / commit_memory_v2 / search_memory / commit_summary / forget_memory are exposed via the workspace's MCP bridge and route through the platform's v2 memory plugin under the workspace's workspace:<id> namespace. See Memory Architecture for the backend.

Task Status Reporting

Any process inside a workspace container (cron jobs, scripts, background tasks) can update the canvas card display:

python3 -m molecule_runtime.molecule_ai_status "Running weekly SEO audit..."  # show on canvas
python3 -m molecule_runtime.molecule_ai_status ""                              # clear when done

From Python:

from molecule_runtime.molecule_ai_status import set_status
set_status("Analyzing competitor data...")

This pushes an immediate heartbeat with current_task to the platform, which broadcasts via WebSocket to the canvas. The task banner appears instantly on the workspace card.
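
To make sure the banner is cleared even when a job fails, the set-then-clear pair can be wrapped in a context manager. canvas_status is a hypothetical helper; it takes the status function as an argument so the sketch runs without the runtime package installed.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def canvas_status(set_status: Callable[[str], None], text: str) -> Iterator[None]:
    """Show `text` on the canvas card for the duration of the block."""
    set_status(text)
    try:
        yield
    finally:
        # An empty string clears the banner, including on exceptions.
        set_status("")
```

Inside a workspace container this would be used as `with canvas_status(set_status, "Running weekly SEO audit..."):` with set_status imported from molecule_runtime.molecule_ai_status.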

Key Files

| File | Role |
| --- | --- |
| main.py | Runtime selector: discovers adapter, calls setup/create_executor |
| claude_sdk_executor.py | ClaudeSDKExecutor for Claude Code runtime (SDK-based, replaces subprocess) |
| executor_helpers.py | Shared helpers: memory recall/commit, delegation results, heartbeat, system prompt, error sanitization |
| cli_executor.py | CLIAgentExecutor for Codex, Ollama, custom runtimes (subprocess-based) |
| a2a_executor.py | LangGraphA2AExecutor, shared set_current_task(), _extract_history() |
| adapters/base.py | BaseAdapter interface + AdapterConfig dataclass |
| adapters/__init__.py | Auto-discovers adapters from subdirectories |
| molecule_ai_status.py | CLI tool + module for updating canvas task display from any process |
| a2a_mcp_server.py | MCP server exposing A2A delegation tools (list_peers, delegate_task) |
| a2a_cli.py | CLI tool for A2A delegation (all runtimes) |
| config.py | RuntimeConfig dataclass, runtime field in WorkspaceConfig |

Rate Limit Handling

Both executors (the SDK-based ClaudeSDKExecutor and the subprocess-based CLIAgentExecutor) include built-in retry logic with exponential backoff:

  • Empty responses (common rate limit signal) → retry up to 3 times (5s, 10s, 20s)
  • Rate limit errors (429, "overloaded") → retry with same backoff
  • Auth errors (OAuth token transient failures) → retry with backoff
  • Timeouts → kill subprocess (CLI) or close stream (SDK) and report (no retry)
  • All error messages are sanitized via sanitize_agent_error() — no raw stderr or exception details leak to the user chat

The A2A CLI (a2a_cli.py) also retries delegation calls on rate limits.
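
The retry policy above can be sketched as a small wrapper. The delays come from the list; call_with_retry and RetryableError are hypothetical names, and the sleep function is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable

BACKOFF_SECONDS = (5, 10, 20)  # delays before retries 1, 2, and 3


class RetryableError(Exception):
    """Hypothetical marker for rate-limit and transient auth errors."""


def call_with_retry(
    call: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Make one attempt plus up to three retries, then give up."""
    last_error = None
    # The trailing None marks the final attempt, after which we do not sleep.
    for delay in (*BACKOFF_SECONDS, None):
        try:
            reply = call()
            if reply.strip():
                return reply  # a non-empty reply is a success
        except RetryableError as exc:
            last_error = exc
        # Any other exception (for example a timeout) propagates unretried.
        if delay is not None:
            sleep(delay)
    raise RuntimeError("no usable response after retries") from last_error
```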

For production with many concurrent agents, consider:

  • Using different auth tokens per workspace (separate subscriptions)
  • Staggering agent invocations
  • Using delegate_task_async for long-running tasks

Known Limitations

  • Tier 1 (sandboxed): Read-only root filesystem is disabled for CLI runtimes because Claude Code needs writable directories (.claude/, .npm/, /tmp). Tier 1 still restricts the /workspace volume.
  • Rate limits: All workspaces share the same Claude subscription. Retry logic handles transient rate limits, but sustained high volume needs separate tokens.
  • Auth token lifecycle: OAuth tokens expire and need refreshing. Use claude setup-token for long-lived tokens in production.

Extending with New Runtimes

To add a new adapter:

  1. Create workspace/adapters/<name>/ with:

    • adapter.py: a class extending BaseAdapter with setup() and create_executor() methods
    • requirements.txt: runtime-specific Python dependencies (installed at container startup)
    • __init__.py: exports the adapter class as Adapter

  2. Have create_executor() return an AgentExecutor (from a2a.server.agent_execution) whose execute(context, event_queue) method handles A2A requests.

  3. Use set_current_task() from a2a_executor.py for heartbeat/canvas integration.

  4. Select the new adapter in config.yaml with runtime: <name>
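
A skeleton for such an adapter might look like the sketch below. To stay runnable on its own it defines a stand-in BaseAdapter and a plain executor class instead of importing adapters/base.py and a2a.server.agent_execution; EchoAdapter, EchoExecutor, the prefix option, and the dict-shaped context are all hypothetical, and whether the real setup() and create_executor() are synchronous is an assumption.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    """Stand-in for the interface in adapters/base.py."""

    @abstractmethod
    def setup(self, config: dict) -> None: ...

    @abstractmethod
    def create_executor(self, config: dict): ...


class EchoExecutor:
    """Stand-in for an AgentExecutor subclass."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    async def execute(self, context: dict, event_queue: asyncio.Queue) -> None:
        # A real executor would call the runtime here.
        await event_queue.put(f"{self.prefix}{context['text']}")


class EchoAdapter(BaseAdapter):
    def setup(self, config: dict) -> None:
        # Read runtime-specific options once, before any request arrives.
        self.prefix = config.get("runtime_config", {}).get("prefix", "echo: ")

    def create_executor(self, config: dict) -> EchoExecutor:
        return EchoExecutor(self.prefix)


# What the package's __init__.py would export for auto-discovery.
Adapter = EchoAdapter
```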