molecule-core/docs/architecture.md

# Architecture Overview

Molecule AI is a distributed platform for orchestrating AI agent teams. Three components form the core system, connected by HTTP, WebSocket, and JSON-RPC protocols.

## System Components

```
Browser ──HTTP/WS──> Canvas (Next.js :3000)
                        │
                    HTTP + WS
                        │
                    Platform (Go :8080)
                     ┌──┴──┐
                 Postgres  Redis
                     │
                 Docker API
                     │
           ┌─────────┼─────────┐
       Agent-1    Agent-2    Agent-N
      (Python)   (Python)   (Python)
           └──A2A JSON-RPC 2.0──┘
```

### Canvas (Next.js 15 + React Flow)

The browser-based visual UI. Built with `@xyflow/react` v12, Zustand for state, and Tailwind CSS.

- Renders workspaces as draggable nodes on a canvas
- Connects to Platform via REST (`http://localhost:8080`) and WebSocket (`ws://localhost:8080/ws`)
- Sends user messages to agents through the Platform's A2A proxy
- Receives real-time updates via WebSocket events (status changes, agent messages, A2A responses)

Source: `canvas/`

### Platform (Go / Gin)

The control plane. Manages workspace lifecycle, provisions containers, proxies A2A communication, and broadcasts events.

Key responsibilities:
- **Workspace CRUD** -- create, list, update, delete workspaces
- **Container provisioning** -- starts Docker containers for each workspace agent, injects secrets as env vars
- **A2A proxy** -- forwards JSON-RPC requests from canvas to workspace agents, avoiding CORS/Docker network issues
- **Registry** -- agents self-register on startup, send heartbeats, update their AgentCard
- **Discovery** -- workspaces discover peers via hierarchy-based access control rules
- **WebSocket hub** -- broadcasts events to canvas clients (all events) and workspace clients (filtered by access)
- **Secrets management** -- global (`/settings/secrets`) + workspace-level encrypted secrets (AES-256-GCM) with inheritance (workspace overrides global)
- **Liveness monitoring** -- 3-layer health detection: passive (Redis TTL), proactive (Docker health sweep), reactive (A2A proxy check)

Source: `platform/`

### Workspace Runtime (Python)

The execution engine for individual agents. Each workspace runs in its own Docker container.

- Loads config from `/configs/config.yaml`
- Discovers the appropriate adapter (LangGraph, Claude Code, etc.)
- Wraps the agent in an A2A server (using `a2a-sdk`)
- Self-registers with Platform on startup (`POST /registry/register`)
- Sends periodic heartbeats (`POST /registry/heartbeat`)
- Communicates with other workspaces via A2A JSON-RPC 2.0

Source: `workspace-template/`

## Message Flow

### User sends a message to an agent

```
1. User types in ChatTab
2. Canvas sends POST /workspaces/:id/a2a with JSON-RPC body
3. Platform resolves workspace URL (cache or DB)
4. Platform wraps body in JSON-RPC 2.0 envelope if needed
5. Platform forwards to agent container (5-min timeout for canvas, 30-min for agent-to-agent)
6. Agent processes via LangGraph/adapter, returns JSON-RPC response
7. Platform broadcasts A2A_RESPONSE via WebSocket (canvas-initiated requests only)
8. Platform logs activity asynchronously
9. Canvas receives A2A_RESPONSE event, extracts text, displays in ChatTab
```

### Agent-to-agent delegation

```
1. Agent A calls message/send targeting Agent B
2. Request goes through Platform A2A proxy (POST /workspaces/:id/a2a with X-Workspace-ID header)
3. Platform verifies access via CanCommunicate(callerID, targetID)
4. Platform forwards to Agent B's container (30-min timeout)
5. Agent B responds, Platform returns response to Agent A
6. Activity logged for both workspaces
```

## Core Concepts

### Workspace

The fundamental unit. A workspace represents an organizational **role** (not a task). Each workspace:
- Has a unique UUID, name, role description, and tier (1-4)
- Runs in its own Docker container
- Exposes a single A2A endpoint
- Can be expanded into a sub-team (Team Lead + children)
- Has a lifecycle: `provisioning` -> `online` -> `degraded` -> `offline` -> `removed`

### Agent Card

An A2A protocol discovery document. Each workspace agent publishes an AgentCard containing:
- Name, description, version
- URL endpoint
- Capabilities (streaming, push notifications)
- Skills (id, name, description, tags, examples)
- Supported input/output modes

Updated via `POST /registry/update-card` and broadcast as `AGENT_CARD_UPDATED`.

### A2A Protocol (Agent-to-Agent)

Industry-standard JSON-RPC 2.0 protocol for agent communication:
- `message/send` -- synchronous request/response
- `message/stream` -- SSE streaming variant
- `tasks/get` -- poll async task status

All agent-to-agent traffic flows through the Platform A2A proxy for access control and observability.

### Hierarchy & Access Control

The organizational structure IS the network topology. `CanCommunicate(callerID, targetID)` rules:
- Same workspace: allowed
- Parent <-> child: allowed
- Siblings (same parent_id): allowed
- Root-level workspaces (both parent_id IS NULL): allowed
- Everything else: denied

### Team Expansion (Fractal Architecture)

Any workspace can recursively expand into a sub-team. From the outside, it still exposes a single A2A endpoint. Inside, a Team Lead coordinates child agents.

```
Before:                     After expand:
┌──────────┐               ┌──────────────────────┐
│ Marketing│               │ Marketing (Team Lead)│
│          │   ──expand──> │  ├─ SEO Agent        │
│          │               │  ├─ Content Writer   │
│          │               │  └─ Analytics Agent  │
└──────────┘               └──────────────────────┘
```

- `POST /workspaces/:id/expand` provisions child workspaces from config
- `POST /workspaces/:id/collapse` removes children, reverting to single workspace
- Children are auto-wired: Team Lead ↔ children can communicate, children are siblings
- On the canvas, children render as chips inside the parent node

### Tiered Security

| Tier | Name | Isolation |
|------|------|-----------|
| 1 | Sandboxed | Read-only root FS, tmpfs /tmp, no /workspace mount |
| 2 | Standard | 512 MiB memory, 1.0 CPU limit |
| 3 | Privileged | Privileged mode, host PID, Docker network |
| 4 | Full Access | Privileged, host PID, host network, Docker socket |

## Database (PostgreSQL)

Key tables:

| Table | Purpose |
|-------|---------|
| `workspaces` | Core entity: id, name, role, tier, status, url, parent_id, agent_card (JSONB), heartbeat timestamps |
| `workspace_secrets` | Per-workspace encrypted secrets (AES-256-GCM). UNIQUE(workspace_id, key) |
| `global_secrets` | Platform-wide secrets. Workspace secrets with same key override globals |
| `activity_logs` | A2A communication logs: source, target, method, request/response bodies, duration, status |
| `agent_memories` | Hierarchical Memory Architecture: LOCAL, TEAM, GLOBAL scoped memories |
| `structure_events` | Append-only event log (WORKSPACE_ONLINE, AGENT_CARD_UPDATED, etc.) |
| `workspace_config` | Arbitrary JSONB config per workspace |
| `workspace_memory` | Key-value store with optional TTL per workspace |
| `canvas_layouts` | Node x/y positions on the canvas |

Migrations: `platform/migrations/` (12 files, auto-applied on startup).

## Directory Structure

```
molecule/
├── canvas/                        # Frontend (Next.js 15)
│   └── src/
│       ├── app/                   # Next.js app router pages
│       ├── components/            # React components (tabs/, workspace-node)
│       ├── store/                 # Zustand stores (canvas, socket, events)
│       ├── hooks/                 # Custom React hooks
│       └── lib/                   # Utilities
├── platform/                      # Backend (Go / Gin)
│   ├── cmd/server/main.go        # Entry point
│   ├── cmd/cli/                   # molecli TUI dashboard
│   ├── internal/
│   │   ├── handlers/              # 24 HTTP handler files
│   │   ├── ws/                    # WebSocket hub + client management
│   │   ├── events/                # Broadcaster (WS + Redis pub/sub)
│   │   ├── db/                    # PostgreSQL + Redis connections
│   │   ├── provisioner/           # Docker container lifecycle
│   │   ├── registry/              # Liveness, health sweep, access rules
│   │   ├── crypto/                # AES-256-GCM encryption
│   │   └── models/                # Data types
│   └── migrations/                # 12 SQL migration files
├── workspace-template/            # Agent Runtime (Python)
│   ├── main.py                    # Entry point
│   ├── a2a_executor.py            # A2A request handler
│   ├── config.py                  # YAML config loader
│   ├── heartbeat.py               # Platform heartbeat loop
│   ├── adapters/                  # Runtime backends (langgraph, claude-code, ...)
│   └── tools/                     # Agent tools (delegation, sandbox, ...)
├── docker-compose.yml             # Full stack
└── docker-compose.infra.yml       # Infrastructure only (dev)
```

## Supporting Infrastructure

| Service | Image | Purpose |
|---------|-------|---------|
| PostgreSQL 16 | `postgres:16-alpine` | Primary database |
| Redis 7 | `redis:7-alpine` | URL caching, pub/sub, TTL-based liveness |
| Langfuse | `langfuse/langfuse:2` + ClickHouse | LLM call tracing and observability |
| LiteLLM (optional) | `ghcr.io/berriai/litellm` | Unified multi-provider LLM routing |
| Ollama (optional) | `ollama/ollama` | Local model inference |