molecule-core/docs/api-protocol/a2a-protocol.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
2026-04-18 00:24:44 -07:00


A2A Protocol (Inter-Workspace Communication)

Workspaces talk to each other directly via A2A (Agent-to-Agent protocol) — the platform is not in the message path.

How It Works

Every workspace is an A2A server. The platform is an A2A client when it needs to communicate with workspaces. Workspaces communicate with each other directly — the platform only handles discovery.

```
Business Core (A2A client)  ->  Developer PM (A2A server)
                                (internals are opaque
                                 to Business Core)
```

Discovery Flow

How Business Core finds Developer PM's URL:

  1. Business Core asks the platform: GET /registry/discover/developer-pm-id with the X-Workspace-ID header
  2. Platform checks CanCommunicate() for the caller/target pair
  3. Platform resolves the URL:
    • Workspace caller (has X-Workspace-ID): returns Docker-internal URL from ws:{id}:internal_url Redis key — containers can reach each other by hostname on the Docker network
    • Canvas/external (no header): returns host-mapped URL from ws:{id}:url Redis key — the ephemeral 127.0.0.1:PORT bound by the provisioner
  4. On a cache miss, the platform reads the URL from Postgres and refreshes the cache
  5. Business Core sends A2A JSON-RPC message directly to Developer PM
  6. Developer PM processes the task and responds

The platform is only involved in URL resolution. The actual task messages go workspace-to-workspace.
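The resolution logic in steps 2 to 4 fits in a few lines of Python. This is a minimal sketch under stated assumptions, not the platform's code:

- plain dicts stand in for Redis and Postgres, and the row fields are invented;
- `can_communicate` is a hypothetical placeholder that allows every pair;
- the permission check only runs when a caller ID is present.

```python
from typing import Dict, Optional

# Hypothetical stand-ins: a dict for the Redis cache and one for the
# Postgres workspaces table (row fields are invented for this sketch).
redis_cache: Dict[str, str] = {}
postgres_rows: Dict[str, Dict[str, str]] = {}


def can_communicate(caller_id: str, target_id: str) -> bool:
    # Placeholder for the platform's CanCommunicate() hierarchy check.
    return True


def resolve_url(target_id: str, caller_id: Optional[str]) -> str:
    """Return the URL a caller should use to reach target_id."""
    if caller_id is not None and not can_communicate(caller_id, target_id):
        raise PermissionError(f"{caller_id} may not reach {target_id}")
    # Workspace callers (X-Workspace-ID present) get the Docker-internal URL;
    # canvas/external callers get the host-mapped 127.0.0.1:PORT URL.
    field = "internal_url" if caller_id is not None else "url"
    key = f"ws:{target_id}:{field}"
    url = redis_cache.get(key)
    if url is None:
        # Cache miss: read from the database and refresh the cache.
        url = postgres_rows[target_id][field]
        redis_cache[key] = url
    return url
```

The two cache keys are kept separate, so a canvas lookup never hands a browser a Docker-internal hostname it cannot resolve.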

Message Format

A2A uses JSON-RPC 2.0 over HTTP:

```json
{
  "jsonrpc": "2.0",
  "id": "req-123",
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [{ "kind": "text", "text": "Build the login feature" }],
      "messageId": "msg-456"
    }
  }
}
```
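A small helper that builds this envelope makes the required fields explicit. `build_message_send` is a hypothetical name. It generates a fresh UUID for `messageId`, which a2a-sdk requires. It also generates the JSON-RPC request `id` unless the caller supplies one.

```python
import json
import uuid
from typing import Optional


def build_message_send(text: str, request_id: Optional[str] = None) -> dict:
    """Build a JSON-RPC 2.0 message/send request carrying one text part."""
    return {
        "jsonrpc": "2.0",
        # JSON-RPC request id: correlates the response with this request.
        "id": request_id or str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                # a2a-sdk expects a unique messageId on every message.
                "messageId": str(uuid.uuid4()),
            }
        },
    }


body = json.dumps(build_message_send("Build the login feature"))
```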

The receiving workspace:

  1. Processes this as a task
  2. Streams progress updates via SSE when the caller used the streaming method
  3. Returns artifacts (files, structured data, text) when done

On-Demand Discovery (Not Pushed)

Topology is not pushed to workspaces at startup. A workspace only queries the platform for another workspace's URL at the moment it decides to delegate to it.

Why not push at startup: The topology changes while the workspace is running — sub-workspaces get added, removed, come online and go offline. If you push at startup you'd need to also push every topology change to every affected workspace and keep them in sync. That's complex and fragile.

On-demand fits naturally with how agents work — an agent only needs to know about another workspace at the moment it decides to delegate, not before.

Note: While URL resolution is on-demand, the workspace does fetch peer Agent Cards on startup to build its system prompt (see System Prompt Structure). The system prompt is rebuilt reactively when AGENT_CARD_UPDATED events arrive — but the actual A2A URL for sending messages is resolved on-demand at delegation time.
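On the workspace side, on-demand discovery means the discover request is built only when a delegation actually happens. The sketch below uses only the standard library. A few things in it are not from this document:

- `build_discover_request`, `discover_peer_url`, and the `platform_base` parameter are hypothetical names;
- it assumes the discover endpoint returns JSON with a `url` field, which this document does not specify.

```python
import json
import urllib.request


def build_discover_request(platform_base: str, caller_id: str,
                           target_id: str) -> urllib.request.Request:
    # GET /registry/discover/:id; X-Workspace-ID identifies the caller so the
    # platform can run CanCommunicate() and return the Docker-internal URL.
    return urllib.request.Request(
        f"{platform_base}/registry/discover/{target_id}",
        headers={"X-Workspace-ID": caller_id},
        method="GET",
    )


def discover_peer_url(platform_base: str, caller_id: str, target_id: str) -> str:
    # Called at delegation time; nothing is fetched or cached at startup.
    req = build_discover_request(platform_base, caller_id, target_id)
    # The 10s timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Assumption: the platform answers with JSON like {"url": "..."}.
        return json.load(resp)["url"]
```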

Authentication Between Workspaces

MVP: discovery-time validation only. The platform validates CanCommunicate() when workspace A calls GET /registry/discover/:id (using X-Workspace-ID header). Once A has B's URL, direct A2A calls are unauthenticated.

This is acceptable for MVP because:

  • All workspaces are provisioned by the same platform on trusted infrastructure
  • Docker network isolation (molecule-monorepo-net) limits who can reach workspace endpoints
  • The tool is self-hosted — the operator controls the network

Known gap: Once workspace A caches workspace B's URL, nothing stops A from calling B directly even after the hierarchy changes and A is no longer supposed to reach B. The cached URL remains valid until the container is restarted or the URL changes.

Post-MVP fix — platform-issued tokens: On discovery, the platform issues a short-lived signed token scoped to the specific caller/target pair. The target workspace validates the token on every A2A request. When the hierarchy changes, old tokens expire and new discovery attempts are blocked by CanCommunicate().
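One possible shape for a short-lived, pair-scoped token is an HMAC-SHA256 signature over a small JSON claim set. This is an illustrative sketch, not a committed design. The claim names (`sub`, `aud`, `exp`), the shared `SECRET`, and the 300-second default lifetime are all assumptions. A production version would more likely use an established format such as JWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key; a real deployment would load and rotate this.
SECRET = b"platform-signing-key"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).digest())


def issue_token(caller_id: str, target_id: str, ttl_s: int = 300,
                now: Optional[float] = None) -> str:
    """Issued at discovery time, scoped to one caller/target pair."""
    now = time.time() if now is None else now
    claims = {"sub": caller_id, "aud": target_id, "exp": int(now) + ttl_s}
    payload = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"


def validate_token(token: str, expected_target: str,
                   now: Optional[float] = None) -> str:
    """Checked by the target on every A2A request; returns the caller ID."""
    now = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode("utf-8"), _sign(payload).encode("utf-8")):
        raise PermissionError("bad signature")
    claims = json.loads(_unb64(payload))
    if claims["aud"] != expected_target:
        raise PermissionError("token is scoped to a different workspace")
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    return claims["sub"]
```

Because the token expires on its own, a hierarchy change only has to stop new discoveries. Old tokens age out without any revocation list.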

Task Lifecycle

Every A2A message creates a task with a defined lifecycle:

```
submitted → working → completed
                    → failed
                    → canceled
                    → input-required → working (caller provides input)
```
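The lifecycle can be encoded as a transition table so that illegal jumps are rejected. This is a minimal sketch with two assumptions:

- `input-required` branches from `working`, as in the full flow below;
- only the transitions drawn in the lifecycle are legal.

`TaskState` and `transition` are hypothetical names.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELED}

# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED,
                        TaskState.CANCELED, TaskState.INPUT_REQUIRED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING},
}


def transition(current: TaskState, nxt: TaskState) -> TaskState:
    """Return the new state, or raise if the move is not in the table."""
    if current in TERMINAL:
        raise ValueError(f"task already finished ({current.value})")
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```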

Full Flow

```
Caller sends message/send or message/stream
      │
      ▼
Task created: status = submitted
      │
      ▼
Workspace starts processing: status = working
      │
      ├── needs clarification?
      │         │
      │         ▼
      │   status = input-required
      │   SSE event fires to caller
      │   caller sends follow-up message
      │         │
      │         ▼
      │   status = working (resumes)
      │
      ├── success
      │         │
      │         ▼
      │   status = completed
      │   SSE terminal event fires
      │   artifacts returned
      │
      └── error
                │
                ▼
          status = failed
          SSE terminal event fires
          error details returned
```

Calling Patterns

Two patterns — synchronous for short tasks, streaming for long ones:

```python
# `a2a` stands in for the workspace's A2A client; send()/subscribe()
# are illustrative method names.

# pattern 1: synchronous (short tasks)
# caller blocks until terminal state
async def run_short_task(a2a, message: dict) -> dict:
    result = await a2a.send({
        "method": "message/send",
        "params": {"message": message},
    })
    return result  # returns when completed/failed; no streaming


# pattern 2: streaming (long tasks)
# caller subscribes to the SSE stream
async def run_long_task(a2a, message: dict):
    async for event in a2a.subscribe({
        "method": "message/stream",
        "params": {"message": message},
    }):
        if event["status"] == "working":
            # intermediate progress update
            print(event["message"])
        elif event["status"] in ("completed", "failed", "canceled"):
            # terminal event: the stream ends here
            return event["artifacts"]
```

No polling needed. The SSE stream includes a terminal event — the caller knows the task is done when it receives completed, failed, or canceled.
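Consuming the stream means parsing Server-Sent Events: `data:` lines accumulate until a blank line dispatches the event. The sketch below makes a few assumptions:

- each event's data is a JSON object with a `status` field, matching the simplified streaming example above (real A2A streams wrap each event in a JSON-RPC response);
- the helpers read an iterable of text lines rather than a live HTTP connection;
- `iter_sse_events` and `wait_for_terminal` are hypothetical names.

```python
import json
from typing import Dict, Iterable, Iterator, List

TERMINAL = ("completed", "failed", "canceled")


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON object per complete SSE event."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields (event:, id:, retry:)
        # are ignored in this sketch.
    # An event with no trailing blank line is incomplete and is dropped.


def wait_for_terminal(lines: Iterable[str]) -> Dict:
    """Consume events until completed, failed, or canceled arrives."""
    for event in iter_sse_events(lines):
        if event["status"] in TERMINAL:
            return event
    raise ConnectionError("stream ended before a terminal event")
```

Treating a stream that closes without a terminal event as an error lets the caller fall back to an explicit status check.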

Task ID

Every task gets an ID on creation, returned in the first SSE event or synchronous response:

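```python
# the task ID sits in the JSON-RPC result; the top-level "id" echoes the request id
task_id = response["result"]["id"]

# caller can check status explicitly if needed
status = await a2a.send({"method": "tasks/get", "params": {"id": task_id}})
```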
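A helper can hide the difference between the two places a task ID appears. In the A2A specification, a `Task` object (`kind: "task"`) carries its ID in `id`. Streamed status and artifact update events carry it in `taskId`. Both sit under the JSON-RPC `result`. The function names below are hypothetical.

```python
import uuid
from typing import Any, Dict


def extract_task_id(response: Dict[str, Any]) -> str:
    """Pull the task ID out of a JSON-RPC response or streamed event."""
    # The top-level "id" only echoes the JSON-RPC request id.
    result = response["result"]
    if result.get("kind") == "task":
        return result["id"]
    if "taskId" in result:
        # status-update / artifact-update events reference the task by taskId.
        return result["taskId"]
    raise KeyError("response does not reference a task")


def build_tasks_get(task_id: str) -> Dict[str, Any]:
    """JSON-RPC request for an explicit status check."""
    return {"jsonrpc": "2.0", "id": str(uuid.uuid4()),
            "method": "tasks/get", "params": {"id": task_id}}
```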

Cancellation

```python
# cancel an in-flight task
await a2a.send({
    "method": "tasks/cancel",
    "params": { "id": task_id }
})
# workspace receives cancel signal
# status → canceled
# SSE terminal event fires to all subscribers
```
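The cancel flow (signal in, status to `canceled`, terminal event out to every subscriber) can be modelled in-process with `asyncio`. This toy swaps LangGraph's interrupt for plain `asyncio.Task.cancel()`. It illustrates only the state and notification flow, not the workspace executor. `TaskRunner` and its methods are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

TERMINAL = ("completed", "failed", "canceled")


class TaskRunner:
    """Toy in-process model of cancel handling (hypothetical, not the executor)."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self._jobs: Dict[str, asyncio.Task] = {}
        self._subscribers: Dict[str, List[asyncio.Queue]] = {}

    def subscribe(self, task_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.setdefault(task_id, []).append(queue)
        return queue

    def _emit(self, task_id: str, state: str) -> None:
        # Record the new status and fan it out to every subscriber.
        self.status[task_id] = state
        for queue in self._subscribers.get(task_id, []):
            queue.put_nowait(state)

    def start(self, task_id: str, work: Callable[[], Awaitable[None]]) -> None:
        self._emit(task_id, "working")
        self._jobs[task_id] = asyncio.create_task(self._run(task_id, work))

    async def _run(self, task_id: str, work: Callable[[], Awaitable[None]]) -> None:
        try:
            await work()
        except asyncio.CancelledError:
            self._emit(task_id, "canceled")  # terminal event on cancel
            raise  # let asyncio mark the job as cancelled
        except Exception:
            self._emit(task_id, "failed")
        else:
            self._emit(task_id, "completed")

    def cancel(self, task_id: str) -> bool:
        """Handle tasks/cancel; False if the task is unknown or already terminal."""
        job = self._jobs.get(task_id)
        if job is None or self.status.get(task_id) in TERMINAL:
            return False
        job.cancel()
        return True
```

Re-raising `CancelledError` after emitting the terminal event keeps asyncio's own bookkeeping correct. Anyone awaiting the job sees it as cancelled rather than as a normal return.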

The workspace handles cancellation via the LangGraphA2AExecutor.cancel() method, which uses LangGraph's interrupt mechanism:

```python
# workspace/a2a_executor.py
async def cancel(self, context: RequestContext, queue: EventQueue):
    await self.agent.ainterrupt(context.context_id)
    # status → canceled, SSE terminal event fires automatically
```

See Workspace Runtime — A2A Server Wrapping for the full executor implementation.

Artifacts

On completion, the task returns artifacts:

```json
{
  "status": "completed",
  "artifacts": [
    {
      "type": "text/plain",
      "content": "Page generated successfully"
    },
    {
      "type": "application/json",
      "content": { "page_path": "/kitchen-renovation-vancouver" }
    }
  ]
}
```
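Callers usually want artifacts of one kind, such as the JSON payload. Here is a minimal sketch against the artifact shape shown above. `artifacts_of_type` is a hypothetical helper, and returning an empty list for tasks that are not completed is a choice made by this sketch.

```python
from typing import Any, Dict, List


def artifacts_of_type(result: Dict[str, Any], mime_type: str) -> List[Any]:
    """Return the content of every artifact whose type equals mime_type."""
    if result.get("status") != "completed":
        # Only completed tasks are expected to carry artifacts here.
        return []
    return [
        artifact["content"]
        for artifact in result.get("artifacts", [])
        if artifact.get("type") == mime_type
    ]
```

For the example above, `artifacts_of_type(result, "application/json")[0]["page_path"]` yields the generated page path.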

Platform A2A Proxy

The canvas (browser) cannot reach Docker-internal agent URLs directly. The platform provides POST /workspaces/:id/a2a as a proxy:

  1. Canvas sends JSON-RPC to the platform proxy
  2. Proxy resolves the agent's host-accessible URL from Redis cache (falls back to DB)
  3. If the request lacks a jsonrpc field, the proxy wraps it in a JSON-RPC 2.0 envelope with a generated UUID as the request id
  4. If params.message.messageId is missing, the proxy injects one (required by a2a-sdk)
  5. Proxy forwards the request to the agent (120s timeout, 10MB response limit)
  6. Agent response is returned to the caller
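Steps 3 to 5 amount to request normalization plus a bounded forward. The Python sketch below is illustrative only:

- `normalize_proxy_body`, `forward`, and the byte value chosen for the 10MB cap are assumptions;
- it assumes a body without `jsonrpc` already carries `method` and `params`, so wrapping only adds the protocol version and a generated `id`.

```python
import copy
import json
import urllib.request
import uuid
from typing import Any, Dict

# Assumption: "10MB" read as 10 MiB; the platform's exact figure may differ.
MAX_RESPONSE_BYTES = 10 * 1024 * 1024


def normalize_proxy_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Steps 3-4: ensure a JSON-RPC envelope and a messageId."""
    req = copy.deepcopy(body)  # never mutate the caller's dict
    if "jsonrpc" not in req:
        # Step 3: add the protocol version and a generated UUID request id.
        req = {**req, "jsonrpc": "2.0", "id": str(uuid.uuid4())}
    params = req.get("params")
    message = params.get("message") if isinstance(params, dict) else None
    if isinstance(message, dict) and not message.get("messageId"):
        # Step 4: a2a-sdk requires params.message.messageId.
        message["messageId"] = str(uuid.uuid4())
    return req


def forward(agent_url: str, req: Dict[str, Any]) -> Dict[str, Any]:
    """Step 5: POST to the agent with a timeout and a response-size cap."""
    http_req = urllib.request.Request(
        agent_url,
        data=json.dumps(req).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(http_req, timeout=120) as resp:
        # Read one byte past the cap so oversized bodies are detectable.
        payload = resp.read(MAX_RESPONSE_BYTES + 1)
    if len(payload) > MAX_RESPONSE_BYTES:
        raise ValueError("agent response exceeds the size limit")
    return json.loads(payload)
```

`urlopen`'s `timeout` applies to individual socket operations, so it only approximates a 120s end-to-end deadline.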

This proxy is the only way the canvas communicates with agents. Workspace-to-workspace communication is direct (no proxy).

Key Properties

  • Transport: JSON-RPC 2.0 over HTTP — any language can implement it
  • Discovery: Agent Cards at /.well-known/agent-card.json
  • On-demand: Workspaces discover peers when needed, not at startup
  • Opaque execution: The caller doesn't know (or care) what's inside the callee
  • Interoperable: Any A2A-compliant agent from any framework can plug in
  • Direct: Workspace-to-workspace messages go direct; canvas uses platform proxy
  • MVP auth: Discovery-time only; post-MVP adds signed tokens