molecule-core/docs/architecture.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00

221 lines
9.8 KiB
Markdown

# Architecture Overview
Molecule AI is a distributed platform for orchestrating AI agent teams. Three components form the core system, connected by HTTP, WebSocket, and JSON-RPC protocols.
## System Components
```
Browser ──HTTP/WS──> Canvas (Next.js :3000)
HTTP + WS
Platform (Go :8080)
┌──┴──┐
Postgres Redis
Docker API
┌─────────┼─────────┐
Agent-1 Agent-2 Agent-N
(Python) (Python) (Python)
└──A2A JSON-RPC 2.0──┘
```
### Canvas (Next.js 15 + React Flow)
The browser-based visual UI. Built with `@xyflow/react` v12, Zustand for state, and Tailwind CSS.
- Renders workspaces as draggable nodes on a canvas
- Connects to Platform via REST (`http://localhost:8080`) and WebSocket (`ws://localhost:8080/ws`)
- Sends user messages to agents through the Platform's A2A proxy
- Receives real-time updates via WebSocket events (status changes, agent messages, A2A responses)
Source: `canvas/`
### Platform (Go / Gin)
The control plane. Manages workspace lifecycle, provisions containers, proxies A2A communication, and broadcasts events.
Key responsibilities:
- **Workspace CRUD** -- create, list, update, delete workspaces
- **Container provisioning** -- starts Docker containers for each workspace agent, injects secrets as env vars
- **A2A proxy** -- forwards JSON-RPC requests from canvas to workspace agents, avoiding CORS/Docker network issues
- **Registry** -- agents self-register on startup, send heartbeats, update their AgentCard
- **Discovery** -- workspaces discover peers via hierarchy-based access control rules
- **WebSocket hub** -- broadcasts events to canvas clients (all events) and workspace clients (filtered by access)
- **Secrets management** -- global (`/settings/secrets`) + workspace-level encrypted secrets (AES-256-GCM) with inheritance (workspace overrides global)
- **Liveness monitoring** -- 3-layer health detection: passive (Redis TTL), proactive (Docker health sweep), reactive (A2A proxy check)
Source: `workspace-server/`
### Workspace Runtime (Python)
The execution engine for individual agents. Each workspace runs in its own Docker container.
- Loads config from `/configs/config.yaml`
- Discovers the appropriate adapter (LangGraph, Claude Code, etc.)
- Wraps the agent in an A2A server (using `a2a-sdk`)
- Self-registers with Platform on startup (`POST /registry/register`)
- Sends periodic heartbeats (`POST /registry/heartbeat`)
- Communicates with other workspaces via A2A JSON-RPC 2.0
Source: `workspace/`
## Message Flow
### User sends a message to an agent
```
1. User types in ChatTab
2. Canvas sends POST /workspaces/:id/a2a with JSON-RPC body
3. Platform resolves workspace URL (cache or DB)
4. Platform wraps body in JSON-RPC 2.0 envelope if needed
5. Platform forwards to agent container (5-min timeout for canvas, 30-min for agent-to-agent)
6. Agent processes via LangGraph/adapter, returns JSON-RPC response
7. Platform broadcasts A2A_RESPONSE via WebSocket (canvas-initiated requests only)
8. Platform logs activity asynchronously
9. Canvas receives A2A_RESPONSE event, extracts text, displays in ChatTab
```
### Agent-to-agent delegation
```
1. Agent A calls message/send targeting Agent B
2. Request goes through Platform A2A proxy (POST /workspaces/:id/a2a with X-Workspace-ID header)
3. Platform verifies access via CanCommunicate(callerID, targetID)
4. Platform forwards to Agent B's container (30-min timeout)
5. Agent B responds, Platform returns response to Agent A
6. Activity logged for both workspaces
```
## Core Concepts
### Workspace
The fundamental unit. A workspace represents an organizational **role** (not a task). Each workspace:
- Has a unique UUID, name, role description, and tier (1-4)
- Runs in its own Docker container
- Exposes a single A2A endpoint
- Can be expanded into a sub-team (Team Lead + children)
- Has a lifecycle: `provisioning` -> `online` -> `degraded` -> `offline` -> `removed`
### Agent Card
An A2A protocol discovery document. Each workspace agent publishes an AgentCard containing:
- Name, description, version
- URL endpoint
- Capabilities (streaming, push notifications)
- Skills (id, name, description, tags, examples)
- Supported input/output modes
Updated via `POST /registry/update-card` and broadcast as `AGENT_CARD_UPDATED`.
### A2A Protocol (Agent-to-Agent)
Industry-standard JSON-RPC 2.0 protocol for agent communication:
- `message/send` -- synchronous request/response
- `message/stream` -- SSE streaming variant
- `tasks/get` -- poll async task status
All agent-to-agent traffic flows through the Platform A2A proxy for access control and observability.
### Hierarchy & Access Control
The organizational structure IS the network topology. `CanCommunicate(callerID, targetID)` rules:
- Same workspace: allowed
- Parent <-> child: allowed
- Siblings (same parent_id): allowed
- Root-level workspaces (both parent_id IS NULL): allowed
- Everything else: denied
### Team Expansion (Fractal Architecture)
Any workspace can recursively expand into a sub-team. From the outside, it still exposes a single A2A endpoint. Inside, a Team Lead coordinates child agents.
```
Before: After expand:
┌──────────┐ ┌──────────────────────┐
│ Marketing│ │ Marketing (Team Lead)│
│ │ ──expand──> │ ├─ SEO Agent │
│ │ │ ├─ Content Writer │
│ │ │ └─ Analytics Agent │
└──────────┘ └──────────────────────┘
```
- `POST /workspaces/:id/expand` provisions child workspaces from config
- `POST /workspaces/:id/collapse` removes children, reverting to single workspace
- Children are auto-wired: Team Lead ↔ children can communicate, children are siblings
- On the canvas, children render as chips inside the parent node
### Tiered Security
| Tier | Name | Isolation |
|------|------|-----------|
| 1 | Sandboxed | Read-only root FS, tmpfs /tmp, no /workspace mount |
| 2 | Standard | 512 MiB memory, 1.0 CPU limit |
| 3 | Privileged | Privileged mode, host PID, Docker network |
| 4 | Full Access | Privileged, host PID, host network, Docker socket |
## Database (PostgreSQL)
Key tables:
| Table | Purpose |
|-------|---------|
| `workspaces` | Core entity: id, name, role, tier, status, url, parent_id, agent_card (JSONB), heartbeat timestamps |
| `workspace_secrets` | Per-workspace encrypted secrets (AES-256-GCM). UNIQUE(workspace_id, key) |
| `global_secrets` | Platform-wide secrets. Workspace secrets with same key override globals |
| `activity_logs` | A2A communication logs: source, target, method, request/response bodies, duration, status |
| `agent_memories` | Hierarchical Memory Architecture: LOCAL, TEAM, GLOBAL scoped memories |
| `structure_events` | Append-only event log (WORKSPACE_ONLINE, AGENT_CARD_UPDATED, etc.) |
| `workspace_config` | Arbitrary JSONB config per workspace |
| `workspace_memory` | Key-value store with optional TTL per workspace |
| `canvas_layouts` | Node x/y positions on the canvas |
Migrations: `workspace-server/migrations/` (12 files, auto-applied on startup).
## Directory Structure
```
molecule/
├── canvas/ # Frontend (Next.js 15)
│ └── src/
│ ├── app/ # Next.js app router pages
│ ├── components/ # React components (tabs/, workspace-node)
│ ├── store/ # Zustand stores (canvas, socket, events)
│ ├── hooks/ # Custom React hooks
│ └── lib/ # Utilities
├── workspace-server/ # Backend (Go / Gin)
│ ├── cmd/server/main.go # Entry point
│ ├── cmd/cli/ # molecli TUI dashboard
│ ├── internal/
│ │ ├── handlers/ # 24 HTTP handler files
│ │ ├── ws/ # WebSocket hub + client management
│ │ ├── events/ # Broadcaster (WS + Redis pub/sub)
│ │ ├── db/ # PostgreSQL + Redis connections
│ │ ├── provisioner/ # Docker container lifecycle
│ │ ├── registry/ # Liveness, health sweep, access rules
│ │ ├── crypto/ # AES-256-GCM encryption
│ │ └── models/ # Data types
│ └── migrations/ # 12 SQL migration files
├── workspace/ # Agent Runtime (Python)
│ ├── main.py # Entry point
│ ├── a2a_executor.py # A2A request handler
│ ├── config.py # YAML config loader
│ ├── heartbeat.py # Platform heartbeat loop
│ ├── adapters/ # Runtime backends (langgraph, claude-code, ...)
│ └── tools/ # Agent tools (delegation, sandbox, ...)
├── docker-compose.yml # Full stack
└── docker-compose.infra.yml # Infrastructure only (dev)
```
## Supporting Infrastructure
| Service | Image | Purpose |
|---------|-------|---------|
| PostgreSQL 16 | `postgres:16-alpine` | Primary database |
| Redis 7 | `redis:7-alpine` | URL caching, pub/sub, TTL-based liveness |
| Langfuse | `langfuse/langfuse:2` + ClickHouse | LLM call tracing and observability |
| LiteLLM (optional) | `ghcr.io/berriai/litellm` | Unified multi-provider LLM routing |
| Ollama (optional) | `ollama/ollama` | Local model inference |