Merge pull request #3 from Molecule-AI/feat/external-agents-tokens-mcp

docs: comprehensive content for all 15 pages
This commit is contained in:
Hongming Wang 2026-04-16 10:11:54 -07:00 committed by GitHub
commit e2a772d561
13 changed files with 2414 additions and 135 deletions

---
title: API Reference
description: Complete reference for all Molecule AI Platform HTTP and WebSocket endpoints.
---

# API Reference
The Molecule AI Platform exposes a REST API (default port 8080) for workspace management, agent registry, communication, and administration. All endpoints return JSON unless otherwise noted.
**Base URL:** `http://localhost:8080` (self-hosted) or `https://api.moleculesai.app` (SaaS)
---
## Authentication Model
The platform uses three authentication middleware variants depending on the sensitivity of the route.
### AdminAuth
Strict bearer-token authentication. Required for any route where a forged request could leak prompts/memory, create/mutate workspaces, or leak operational data.
```
Authorization: Bearer <token>
```
**Fail-open behavior:** When no live tokens exist globally (fresh install), AdminAuth passes all requests through. Once the first token is created, all AdminAuth routes require a valid bearer.
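A minimal Python sketch of building an AdminAuth request; the token value is illustrative, and on a fresh install (no live tokens) the header may be omitted because AdminAuth fails open:

```python
import urllib.request
from typing import Optional

def admin_request(path: str, token: Optional[str],
                  base: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request for an AdminAuth route. Once any live token exists,
    requests without a valid bearer are rejected."""
    req = urllib.request.Request(base + path, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

req = admin_request("/workspaces", token="my-admin-token")
```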
### WorkspaceAuth
Per-workspace bearer token binding. Workspace A's token cannot access workspace B's sub-routes. Used for the entire `/workspaces/:id/*` group (except the A2A proxy, which uses `CanCommunicate`).
```
Authorization: Bearer <workspace-token>
```
### CanvasOrBearer
Accepts either a valid bearer token OR a request whose `Origin` header matches `CORS_ORIGINS`. Used only for cosmetic-only routes where a forged request has zero data/security impact.
Currently applies only to `PUT /canvas/viewport`. Do not extend to data-sensitive routes.
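The accept logic can be restated in Python as a sketch (the real middleware is Go, and `is_valid_token` here is a placeholder for the platform's token lookup):

```python
CORS_ORIGINS = "http://localhost:3000,http://localhost:3001"  # env var in the real platform
allowed_origins = {o.strip() for o in CORS_ORIGINS.split(",")}

def is_valid_token(token: str) -> bool:
    # placeholder: the platform checks live tokens in Postgres
    return token == "valid-token"

def canvas_or_bearer_ok(headers: dict) -> bool:
    """Accept a valid bearer OR a matching Origin (cosmetic routes only)."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer ") and is_valid_token(auth[len("Bearer "):]):
        return True
    return headers.get("Origin") in allowed_origins
```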
---
## Health and Monitoring
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/health` | None | Returns `200 OK` if the platform is running. Use for load balancer health checks. |
| GET | `/metrics` | None | Prometheus text format (v0.0.4) metrics. Scrape-safe, no auth required. |
| GET | `/admin/liveness` | AdminAuth | Per-subsystem `supervised.Snapshot()` ages. Check before debugging stuck scheduler/heartbeat goroutines. |
---
## Workspaces
Core workspace CRUD and lifecycle operations.
### CRUD
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces` | AdminAuth | Create a new workspace. Accepts `name`, `runtime`, `template`, `parent_id`, `tier`, `workspace_dir`, and other fields. Runtime is auto-detected from template config if omitted (defaults to `langgraph`). |
| GET | `/workspaces` | AdminAuth | List all workspaces with status, runtime, agent card, position, and hierarchy info. |
| GET | `/workspaces/:id` | WorkspaceAuth | Get a single workspace by ID. |
| PATCH | `/workspaces/:id` | WorkspaceAuth | Update workspace fields. **Field-level authz:** cosmetic fields (name, role, x, y, canvas) pass through; sensitive fields (tier, parent_id, runtime, workspace_dir) require a valid bearer token when any live token exists. |
| DELETE | `/workspaces/:id` | AdminAuth | Delete a workspace. Stops the container, revokes all auth tokens, and removes all associated data. |
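The field-level authorization on `PATCH /workspaces/:id` can be sketched as follows (field sets taken from the table above; the helper name is illustrative):

```python
COSMETIC_FIELDS = {"name", "role", "x", "y", "canvas"}
SENSITIVE_FIELDS = {"tier", "parent_id", "runtime", "workspace_dir"}

def patch_requires_bearer(patch: dict, any_live_token: bool) -> bool:
    """Cosmetic-only patches pass through; patches touching sensitive
    fields require a valid bearer once any live token exists."""
    touches_sensitive = bool(SENSITIVE_FIELDS & patch.keys())
    return touches_sensitive and any_live_token
```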
### Lifecycle
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/restart` | WorkspaceAuth | Restart the workspace container. Sends a `restart_context` A2A message after successful re-registration. |
| POST | `/workspaces/:id/pause` | WorkspaceAuth | Stop the container and set status to `paused`. Paused workspaces skip health sweep, liveness monitor, and auto-restart. |
| POST | `/workspaces/:id/resume` | WorkspaceAuth | Re-provision a paused workspace. Status transitions to `provisioning`. |
---
## Registry
Workspace registration and heartbeat endpoints. Called by workspace runtimes, not by end users.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/registry/register` | None | Register a workspace with the platform. Sets status to `online`. Body includes agent URL, agent card, capabilities. |
| POST | `/registry/heartbeat` | Bearer (if token exists) | Send a heartbeat. Updates Redis TTL key (60s expiry). Body can include `active_tasks`, `current_task`, `error_rate`. Triggers `degraded` status if `error_rate > 0.5`. |
| POST | `/registry/update-card` | Bearer (if token exists) | Update the workspace's agent card (name, description, skills, etc.). |
---
## Discovery
Peer discovery and access control verification.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/registry/discover/:id` | Bearer + `X-Workspace-ID` | Discover a workspace's agent card and URL. Requires caller identification. Fails open on DB hiccup since hierarchy check is primary. |
| GET | `/registry/:id/peers` | Bearer + `X-Workspace-ID` | List all peers (siblings, parent, children) that the caller can communicate with. |
| POST | `/registry/check-access` | None | Check whether two workspaces can communicate. Body: `{ "caller_id": "...", "target_id": "..." }`. Returns `{ "allowed": true/false }`. |
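A sketch of building the `check-access` request body and interpreting the response, using the documented `caller_id`/`target_id`/`allowed` fields:

```python
import json

def build_check_access_body(caller_id: str, target_id: str) -> bytes:
    """JSON body for POST /registry/check-access."""
    return json.dumps({"caller_id": caller_id, "target_id": target_id}).encode()

def parse_check_access(raw: bytes) -> bool:
    """Read the { "allowed": true/false } response."""
    return bool(json.loads(raw).get("allowed", False))

body = build_check_access_body("ws-a", "ws-b")
allowed = parse_check_access(b'{"allowed": true}')
```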
---
## Communication
### A2A Proxy
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/a2a` | CanCommunicate | Proxy an A2A JSON-RPC message to the target workspace. Caller identified via `X-Workspace-ID` header. Canvas requests (no header) bypass access check. On connection error, checks if container is dead and triggers auto-restart. |
### Delegation
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/delegate` | WorkspaceAuth | Async fire-and-forget delegation. Supports idempotency keys. Body includes target workspace, prompt, and metadata. |
| GET | `/workspaces/:id/delegations` | WorkspaceAuth | List delegation status for a workspace. Returns delegation rows with status, result, timestamps. |
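A hedged sketch of a delegation body with an idempotency key; the exact field names are illustrative (check the platform source for the real schema), but the point stands: retrying with the same key should not create a duplicate delegation.

```python
import uuid
from typing import Optional

def build_delegation(target_id: str, prompt: str,
                     idempotency_key: Optional[str] = None) -> dict:
    """Body for POST /workspaces/:id/delegate (field names illustrative)."""
    return {
        "target_workspace_id": target_id,
        "prompt": prompt,
        "idempotency_key": idempotency_key or str(uuid.uuid4()),
    }

d1 = build_delegation("ws-research", "Summarize today's PRs", idempotency_key="daily-pr-summary")
d2 = build_delegation("ws-research", "Summarize today's PRs", idempotency_key="daily-pr-summary")
```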
---
## Configuration
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/config` | WorkspaceAuth | Get the workspace's `config.yaml` contents. |
| PATCH | `/workspaces/:id/config` | WorkspaceAuth | Update the workspace config. "Save & Restart" writes config and auto-restarts; "Save" writes only and shows a restart banner in the Canvas. |
---
## Secrets
### Per-Workspace Secrets
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/secrets` | WorkspaceAuth | List secret keys for a workspace (keys only, values masked). |
| POST | `/workspaces/:id/secrets` | WorkspaceAuth | Set a secret `{ "key": "...", "value": "..." }`. Auto-restarts the workspace. |
| PUT | `/workspaces/:id/secrets` | WorkspaceAuth | Alias for POST (upsert semantics). Auto-restarts the workspace. |
| DELETE | `/workspaces/:id/secrets/:key` | WorkspaceAuth | Delete a secret by key. Auto-restarts the workspace. |
| GET | `/workspaces/:id/model` | WorkspaceAuth | Return the model configuration derived from available API keys (which provider keys are set). |
### Global Secrets
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/settings/secrets` | AdminAuth | List global secrets (keys only, values masked). |
| PUT | `/settings/secrets` | AdminAuth | Set a global secret `{ "key": "...", "value": "..." }`. Auto-restarts every non-paused/non-removed workspace that does not shadow the key with a workspace-level override. |
| POST | `/settings/secrets` | AdminAuth | Alias for PUT. |
| DELETE | `/settings/secrets/:key` | AdminAuth | Delete a global secret. Same auto-restart fan-out as PUT. |
Legacy aliases `GET/POST/DELETE /admin/secrets[/:key]` also exist and behave identically.
---
## Memory
### Key-Value Memory
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/memory` | WorkspaceAuth | List all key-value memory entries for a workspace. |
| POST | `/workspaces/:id/memory` | WorkspaceAuth | Set a memory entry `{ "key": "...", "value": "..." }`. |
| DELETE | `/workspaces/:id/memory/:key` | WorkspaceAuth | Delete a memory entry by key. |
### Agent Memories (HMA-scoped)
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/memories` | WorkspaceAuth | List agent memories for a workspace. |
| POST | `/workspaces/:id/memories` | WorkspaceAuth | Create an agent memory entry. |
| DELETE | `/workspaces/:id/memories/:id` | WorkspaceAuth | Delete an agent memory by ID. |
---
## Files
Workspace file management. Files are stored in the workspace's config directory.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/files` | WorkspaceAuth | List files in the workspace config directory. |
| GET | `/workspaces/:id/files/*path` | WorkspaceAuth | Read a specific file. |
| PUT | `/workspaces/:id/files/*path` | WorkspaceAuth | Write a file. Creates parent directories as needed. |
| DELETE | `/workspaces/:id/files/*path` | WorkspaceAuth | Delete a file. |
| GET | `/workspaces/:id/shared-context` | WorkspaceAuth | Get the shared context files for a workspace (aggregated from parent hierarchy). |
---
## Activity
Activity logging and search for A2A communications, task updates, and agent logs.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/activity` | WorkspaceAuth | List activity logs for a workspace. Supports `?source=canvas` or `?source=agent` filter. |
| POST | `/workspaces/:id/activity` | WorkspaceAuth | Log an activity entry (used by workspace runtimes to self-report). |
| POST | `/workspaces/:id/notify` | WorkspaceAuth | Agent-to-user push message via WebSocket. Delivers a notification to connected Canvas clients. |
### Session Search
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/session-search` | WorkspaceAuth | Search activity logs with filters for type, date range, and text content. Returns paginated results. |
---
## Schedules
Cron-based scheduled tasks per workspace.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/schedules` | WorkspaceAuth | List all schedules for a workspace. |
| POST | `/workspaces/:id/schedules` | WorkspaceAuth | Create a schedule. Body: `{ "expression": "0 */6 * * *", "timezone": "UTC", "prompt": "...", "enabled": true }`. |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Update a schedule (expression, timezone, prompt, enabled). |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Delete a schedule. |
| POST | `/workspaces/:id/schedules/:scheduleId/run` | WorkspaceAuth | Manually trigger a schedule immediately. |
| GET | `/workspaces/:id/schedules/:scheduleId/history` | WorkspaceAuth | List past runs for a schedule. Includes status (`success`, `error`, `skipped`) and `error_detail`. |
The schedule `source` field is `template` for org/import-seeded schedules and `runtime` for Canvas/API-created ones. `last_status` can be `skipped`: the scheduler is concurrency-aware and skips a run when the workspace is still busy.
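A minimal sketch of constructing the schedule-creation body from the fields above; the 5-field shape check is only a sanity guard, the platform's scheduler does the real cron parsing:

```python
def build_schedule(expression: str, prompt: str,
                   timezone: str = "UTC", enabled: bool = True) -> dict:
    """Body for POST /workspaces/:id/schedules."""
    if len(expression.split()) != 5:
        raise ValueError("expected a 5-field cron expression, e.g. '0 */6 * * *'")
    return {"expression": expression, "timezone": timezone,
            "prompt": prompt, "enabled": enabled}

every_six_hours = build_schedule("0 */6 * * *", "Check for new support tickets")
```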
---
## Channels
Social channel integrations (Telegram, Slack, etc.) for workspace agents.
### Per-Workspace Channels
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/channels` | WorkspaceAuth | List channels for a workspace. |
| POST | `/workspaces/:id/channels` | WorkspaceAuth | Create a channel. Body includes platform type, JSONB config, and allowlist. |
| PATCH | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Update a channel's config or allowlist. |
| DELETE | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Delete a channel. |
| POST | `/workspaces/:id/channels/:channelId/send` | WorkspaceAuth | Send an outbound message through the channel. |
| POST | `/workspaces/:id/channels/:channelId/test` | WorkspaceAuth | Test the channel connection (send a test message). |
### Global Channel Endpoints
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/channels/adapters` | None | List available social platform adapters (Telegram, Slack, etc.). |
| POST | `/channels/discover` | AdminAuth | Auto-detect available chats/groups for a bot token. |
| POST | `/webhooks/:type` | None | Incoming webhook endpoint for social platforms. The `:type` parameter identifies the platform (e.g., `telegram`, `slack`). |
---
## Plugins
Plugin registry and per-workspace plugin management.
### Global Plugin Registry
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/plugins` | None | List all plugins in the registry. Supports `?runtime=` filter to show only compatible plugins. |
| GET | `/plugins/sources` | None | List registered install-source schemes (e.g., `github://`, `local://`). |
### Per-Workspace Plugins
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/plugins` | WorkspaceAuth | List installed plugins for a workspace. |
| POST | `/workspaces/:id/plugins` | WorkspaceAuth | Install a plugin. Body: `{ "source": "github://org/repo" }`. Safeguards: 64 KiB body limit, 5 min fetch timeout, 100 MiB max staged-tree. |
| DELETE | `/workspaces/:id/plugins/:name` | WorkspaceAuth | Uninstall a plugin by name. |
| GET | `/workspaces/:id/plugins/available` | WorkspaceAuth | List plugins available for this workspace (filtered by workspace runtime). |
| GET | `/workspaces/:id/plugins/compatibility` | WorkspaceAuth | Preflight runtime-change check. Query: `?runtime=X`. Returns which currently-installed plugins would be incompatible with the target runtime. |
---
## Auth Tokens
Bearer token management for workspaces.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/tokens` | WorkspaceAuth | List active tokens for a workspace (token values are masked). |
| POST | `/workspaces/:id/tokens` | WorkspaceAuth | Create a new bearer token for the workspace. |
| DELETE | `/workspaces/:id/tokens/:tokenId` | WorkspaceAuth | Revoke a specific token. |
### Test Token (Development Only)
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/admin/workspaces/:id/test-token` | None | Mint a fresh bearer token for E2E scripts. Returns 404 unless `MOLECULE_ENV != production` or `MOLECULE_ENABLE_TEST_TOKENS=1`. |
---
## Teams
Expand and collapse team views in the Canvas hierarchy.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/expand` | WorkspaceAuth | Expand a team workspace to show its children on the canvas. |
| POST | `/workspaces/:id/collapse` | WorkspaceAuth | Collapse a team workspace to hide its children. |
---
## Templates and Bundles
### Templates
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/templates` | None | List available workspace templates with their runtime, description, and config schema. |
| POST | `/templates/import` | AdminAuth | Import a workspace template from a `github://` source URL. |
### Org Templates
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/org/templates` | None | List available organization templates. |
| POST | `/org/import` | AdminAuth | Import an org template. Applies `resolveInsideRoot` path sanitization. Creates the full workspace hierarchy defined in `org.yaml`. |
### Bundles
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/bundles/export/:id` | AdminAuth | Export a workspace (or workspace tree) as a portable bundle. Includes config, secrets (keys only), memory, schedules, and hierarchy. |
| POST | `/bundles/import` | AdminAuth | Import a previously-exported bundle. Recreates the workspace tree with all associated data. |
---
## Approvals
Human-in-the-loop approval system for agent actions.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/approvals` | WorkspaceAuth | Create an approval request. Body includes the action description, metadata, and options. |
| GET | `/workspaces/:id/approvals` | WorkspaceAuth | List approval requests for a workspace. |
| POST | `/workspaces/:id/approvals/:id/decide` | WorkspaceAuth | Approve or reject an approval request. Body: `{ "decision": "approve" }` or `{ "decision": "reject" }`. |
| GET | `/approvals/pending` | AdminAuth | List all pending approval requests across all workspaces. |
---
## Canvas
Canvas viewport persistence (cosmetic only).
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/canvas/viewport` | None | Get the saved canvas viewport (zoom, pan position). Open endpoint for bootstrap-friendliness. |
| PUT | `/canvas/viewport` | CanvasOrBearer | Save the canvas viewport. Accepts bearer OR matching `Origin` header. Worst case on forgery: viewport corruption, recovered by page refresh. |
---
## Traces
LLM trace retrieval from Langfuse.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/traces` | WorkspaceAuth | List LLM traces for a workspace from Langfuse. |
---
## Events
Append-only event log for structure changes.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/events` | AdminAuth | List all structure events across all workspaces. |
| GET | `/events/:workspaceId` | AdminAuth | List structure events for a specific workspace. |
---
## Terminal
WebSocket-based terminal access to workspace containers.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| WS | `/workspaces/:id/terminal` | WorkspaceAuth | Open a WebSocket terminal session to the workspace container. Provides interactive shell access. |
---
## WebSocket
Real-time event streaming for Canvas clients.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| WS | `/ws` | None | Connect to the WebSocket hub. Receives all structure events (`WORKSPACE_ONLINE`, `WORKSPACE_OFFLINE`, `HEARTBEAT`, `CONFIG_UPDATED`, `A2A_RESPONSE`, `AGENT_MESSAGE`, etc.). Canvas clients connect here for real-time updates. |
---
## Error Responses
All endpoints return standard HTTP status codes:
| Status | Meaning |
|--------|---------|
| 200 | Success |
| 201 | Created |
| 400 | Bad request (malformed body, missing required fields) |
| 401 | Unauthorized (missing or invalid bearer token) |
| 403 | Forbidden (valid token but insufficient access) |
| 404 | Not found (workspace, schedule, channel, etc. does not exist) |
| 409 | Conflict (idempotency key collision on delegation) |
| 429 | Rate limited (exceeds `RATE_LIMIT` requests/min) |
| 500 | Internal server error |
Error response body format:
```json
{
  "error": "human-readable error message"
}
```
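Since every endpoint uses the same error envelope, a client can map it onto an exception in one place. A minimal sketch (helper name is illustrative):

```python
import json

def raise_for_platform_error(status: int, body: bytes) -> dict:
    """Return the parsed JSON body on 2xx; otherwise raise using the
    platform's uniform { "error": "..." } envelope."""
    payload = json.loads(body) if body else {}
    if 200 <= status < 300:
        return payload
    raise RuntimeError(f"{status}: {payload.get('error', 'unknown error')}")

ok = raise_for_platform_error(200, b'{"id": "ws-1"}')
```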
---
## Rate Limiting
All endpoints are subject to a global rate limit of `RATE_LIMIT` requests per minute (default: 600). When exceeded, the platform returns `429 Too Many Requests` with a `Retry-After` header.
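Clients should honor the `Retry-After` header on 429 responses. A small sketch of the retry decision, assuming the header carries a delay in seconds:

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Seconds to sleep before retrying a 429, or None if no retry is needed."""
    if status != 429:
        return None
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default
```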
---
## CORS
The platform sets CORS headers based on the `CORS_ORIGINS` environment variable (comma-separated list, default: `http://localhost:3000,http://localhost:3001`). Preflight (`OPTIONS`) requests are handled automatically by the Gin CORS middleware.

---
title: Architecture
description: System architecture, components, infrastructure, and communication model for the Molecule AI platform.
---

# Architecture
Molecule AI is a platform for orchestrating AI agent workspaces that form an organizational hierarchy. Workspaces register with a central platform, communicate via A2A (Agent-to-Agent) protocol, and are visualized on a drag-and-drop canvas.
## System Overview
```
Canvas (Next.js :3000) <--WebSocket--> Platform (Go :8080) <--HTTP--> Postgres + Redis
                                            |
                      Workspace A <----A2A----> Workspace B
                      (Python agents)
                          |  register/heartbeat  |
                          +------- Platform -----+
```
The Canvas provides the visual interface, the Platform acts as the control plane, and Workspaces are isolated containers running AI agent runtimes. All inter-agent communication is mediated by the Platform via the A2A proxy, which enforces hierarchical access control.
---
## Four Main Components
### Canvas
**Stack:** Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind CSS
The Canvas is the browser-based visual workspace graph. It provides:
- **Drag-and-drop layout** with persistent node positions (saved via `PATCH /workspaces/:id`)
- **Team nesting** using recursive `TeamMemberChip` components (up to 3 levels deep)
- **Real-time status** via WebSocket connection to the Platform
- **Chat interface** with two sub-tabs: "My Chat" (user-to-agent) and "Agent Comms" (agent-to-agent A2A traffic)
- **Config editor** with "Save & Restart" and "Save" (deferred restart) modes
- **Secrets management** with auto-restart on POST/DELETE
**State management:**
| Concern | Mechanism |
|---------|-----------|
| Initial load | HTTP fetch `GET /workspaces` into Zustand |
| Real-time updates | WebSocket events via `applyEvent()` |
| Position persistence | `onNodeDragStop` sends `PATCH /workspaces/:id` with `{x, y}` |
| Node nesting | `nestNode` sets `hidden: !!targetId`; children render inside parent |
**Environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API base URL |
| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint |
### Platform
**Stack:** Go / Gin
The Platform is the central control plane responsible for:
- **Workspace CRUD** -- create, read, update, delete workspaces
- **Registry** -- workspace registration, heartbeat tracking, agent card management
- **Discovery** -- peer lookup, access control checks
- **WebSocket hub** -- real-time event broadcasting to Canvas clients
- **Liveness monitoring** -- three-layer container health detection
- **A2A proxy** -- routes inter-agent messages with hierarchical access control
- **Docker provisioner** -- container lifecycle management with tier-based resource limits
- **Scheduler** -- cron-based scheduled tasks per workspace
- **Channel adapters** -- social integrations (Telegram, Slack, etc.)
**Key environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `DATABASE_URL` | (required) | Postgres connection string |
| `REDIS_URL` | (required) | Redis connection string |
| `PORT` | `8080` | Server listen port |
| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers |
| `SECRETS_ENCRYPTION_KEY` | (optional) | AES-256 key, 32 bytes |
| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Allowed CORS origins |
| `RATE_LIMIT` | `600` | Requests per minute |
| `MOLECULE_ENV` | (optional) | Set `production` to hide test endpoints |
| `MOLECULE_ORG_ID` | (optional) | SaaS tenant org gating |
| `WORKSPACE_DIR` | (optional) | Global fallback host path for `/workspace` bind-mount |
| `AWARENESS_URL` | (optional) | Injected into workspace containers for cross-session memory |
| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are kept |
| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | Cleanup sweep interval |
**Workspace tier resource limits:**
| Tier | Env (Memory) | Env (CPU) | Defaults |
|------|-------------|-----------|----------|
| Standard (Tier 2) | `TIER2_MEMORY_MB` | `TIER2_CPU_SHARES` | 512 MB / 1 CPU |
| Privileged (Tier 3) | `TIER3_MEMORY_MB` | `TIER3_CPU_SHARES` | 2048 MB / 2 CPU |
| Full-host (Tier 4) | `TIER4_MEMORY_MB` | `TIER4_CPU_SHARES` | 4096 MB / 4 CPU |
### Workspace Runtime
**Published as:** [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/) on PyPI
The shared runtime provides the base agent infrastructure: A2A server, heartbeat loop, config loading, platform auth, plugin system, and built-in tools. Each AI framework adapter lives in its own standalone repository.
| Runtime | Standalone Repo | Key Dependencies |
|---------|-----------------|------------------|
| LangGraph | `molecule-ai-workspace-template-langgraph` | langchain-anthropic, langgraph |
| Claude Code | `molecule-ai-workspace-template-claude-code` | claude-agent-sdk, @anthropic-ai/claude-code |
| OpenClaw | `molecule-ai-workspace-template-openclaw` | openclaw (npm) |
| CrewAI | `molecule-ai-workspace-template-crewai` | crewai |
| AutoGen | `molecule-ai-workspace-template-autogen` | autogen |
| DeepAgents | `molecule-ai-workspace-template-deepagents` | deepagents |
| Hermes | `molecule-ai-workspace-template-hermes` | openai, anthropic, google-genai |
| Gemini CLI | `molecule-ai-workspace-template-gemini-cli` | @google/gemini-cli (npm) |
Each adapter repo has its own `Dockerfile` that installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies. Templates are cloned at Docker build time into the platform image via `manifest.json`.
### molecli
**Stack:** Go / Bubbletea + Lipgloss
A terminal UI dashboard for real-time workspace monitoring, event log streaming, health overview, and delete/filter operations. Reads `MOLECLI_URL` (default `http://localhost:8080`) to locate the platform. Now published as a standalone repo at `github.com/Molecule-AI/molecule-cli`.
---
## Infrastructure Services
All services run via `docker-compose.infra.yml`, attached to the shared `molecule-monorepo-net` network. Start them with:
```bash
./infra/scripts/setup.sh # Start Postgres, Redis, Langfuse, Temporal; run migrations
```
### Postgres (port 5432)
Primary datastore for workspaces, events, activity logs, secrets, schedules, channels, and more. Also backs Langfuse and Temporal via separate databases.
Key tables:
| Table | Purpose |
|-------|---------|
| `workspaces` | Core entity -- status, runtime, agent_card, heartbeat, current_task |
| `canvas_layouts` | Persisted x/y positions |
| `structure_events` | Append-only event log |
| `activity_logs` | A2A communications, task updates, agent logs, errors |
| `workspace_schedules` | Cron tasks with expression, timezone, prompt, run history |
| `workspace_channels` | Social channel integrations with JSONB config |
| `workspace_secrets` / `global_secrets` | Encrypted secrets storage |
| `workspace_auth_tokens` | Bearer tokens (auto-revoked on workspace delete) |
| `agent_memories` | HMA-scoped agent memory |
| `approvals` | Human-in-the-loop approval requests |
**Migration runner:** On startup, the platform globs `*.sql` in the migrations directory, filters out `.down.sql` files, sorts alphabetically, and executes each. All `.up.sql` files must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... IF NOT EXISTS`).
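The runner's ordering logic (which is Go in the platform) can be restated in Python for illustration:

```python
import tempfile
from pathlib import Path

def migration_order(migrations_dir: str) -> list:
    """Glob *.sql, drop *.down.sql files, sort alphabetically --
    mirrors the platform's startup migration runner."""
    files = Path(migrations_dir).glob("*.sql")
    ups = [f.name for f in files if not f.name.endswith(".down.sql")]
    return sorted(ups)

with tempfile.TemporaryDirectory() as d:
    for name in ["002_events.sql", "001_workspaces.sql", "001_workspaces.down.sql"]:
        (Path(d) / name).write_text("-- sql")
    order = migration_order(d)
```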
**JSONB gotcha:** When inserting Go `[]byte` (from `json.Marshal`) into Postgres JSONB columns, you must convert to `string()` first and use `::jsonb` cast in SQL. The `lib/pq` driver treats `[]byte` as `bytea`, not JSONB.
### Redis (port 6379)
Used for pub/sub event broadcasting and heartbeat TTL tracking. Workspace heartbeat keys expire after 60 seconds -- expiry triggers the liveness monitor.
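The liveness decision reduces to a TTL window check. A pure-logic sketch of what key expiry implies (the real signal is Redis expiring the key, not a timestamp comparison):

```python
HEARTBEAT_TTL_SECONDS = 60

def is_alive(last_heartbeat_at: float, now: float,
             ttl: int = HEARTBEAT_TTL_SECONDS) -> bool:
    """Alive while the last heartbeat is within the TTL window;
    expiry is what trips the liveness monitor."""
    return (now - last_heartbeat_at) < ttl
```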
### Langfuse (port 3001)
LLM trace viewer backed by ClickHouse. Provides observability into agent LLM calls, token usage, and latency.
### Temporal (port 7233 gRPC, port 8233 Web UI)
Durable workflow engine for `workspace-template/builtin_tools/temporal_workflow.py`. Dev-only posture: the auto-setup image runs with no auth on `0.0.0.0:7233`. Production deployments must gate access via mTLS or an API key / reverse proxy.
---
## Communication Model
### WebSocket Events Flow
```
1. Action occurs (register, heartbeat, config change, etc.)
2. broadcaster.RecordAndBroadcast()
     -> inserts into structure_events table
     -> publishes to Redis pub/sub
3. Redis subscriber relays to WebSocket hub
4. Hub broadcasts to:
     - Canvas clients (all events)
     - Workspace clients (filtered by CanCommunicate)
```
### A2A Proxy
The A2A proxy (`POST /workspaces/:id/a2a`) routes agent-to-agent messages. The caller identifies itself via the `X-Workspace-ID` header and authenticates with `Authorization: Bearer <token>`.
### Access Control Rules
Determined by `CanCommunicate(callerID, targetID)` in `registry/access.go`:
| Relationship | Allowed |
|-------------|---------|
| Same workspace (self-call) | Yes |
| Siblings (same `parent_id`) | Yes |
| Root-level siblings (both `parent_id` IS NULL) | Yes |
| Parent to child / child to parent | Yes |
| System callers (`webhook:*`, `system:*`, `test:*`) | Yes (bypass) |
| Canvas requests (no `X-Workspace-ID`) | Yes (bypass) |
| Everything else | **Denied** |
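The rules table can be restated as a pure-Python sketch (the authoritative check is `CanCommunicate` in `registry/access.go`; `parent_of` maps workspace ID to parent ID, `None` for roots):

```python
from typing import Optional

def can_communicate(caller_id: Optional[str], target_id: str, parent_of: dict) -> bool:
    if caller_id is None:                                          # Canvas request, no X-Workspace-ID
        return True
    if caller_id.split(":")[0] in {"webhook", "system", "test"}:   # system callers bypass
        return True
    if caller_id == target_id:                                     # self-call
        return True
    cp, tp = parent_of.get(caller_id), parent_of.get(target_id)
    if cp == tp:                                                   # siblings, incl. both roots (None)
        return True
    if cp == target_id or tp == caller_id:                         # parent <-> child
        return True
    return False
```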
### Import Cycle Prevention
The platform uses function injection to avoid Go import cycles between `ws`, `registry`, and `events` packages:
- `ws.NewHub(canCommunicate AccessChecker)` -- Hub accepts `registry.CanCommunicate` as a function
- `registry.StartLivenessMonitor(ctx, onOffline OfflineHandler)` -- Liveness accepts broadcaster callback
- `registry.StartHealthSweep(ctx, checker ContainerChecker, interval, onOffline)` -- Health sweep accepts Docker checker interface
- Wiring happens in `platform/cmd/server/main.go` -- init order: `wh -> onWorkspaceOffline -> liveness/healthSweep -> router`
---
## Container Health Detection
Three independent layers detect dead containers (e.g., Docker Desktop crash):
### Layer 1: Passive (Redis TTL)
Each workspace sends heartbeats that set a Redis key with a 60-second TTL. When the key expires, the liveness monitor detects the workspace as offline and triggers an auto-restart.
### Layer 2: Proactive (Health Sweep)
`registry.StartHealthSweep` polls the Docker API every 15 seconds. Catches dead containers faster than waiting for Redis TTL expiry.
### Layer 3: Reactive (A2A Proxy)
When the A2A proxy encounters a connection error to a workspace, it immediately checks `provisioner.IsRunning()`. If the container is dead, it marks the workspace offline and triggers a restart.
All three layers call `onWorkspaceOffline`, which broadcasts `WORKSPACE_OFFLINE` and initiates `wh.RestartByID()`. Redis cleanup uses the shared `db.ClearWorkspaceKeys()` function.
---
## Workspace Lifecycle
```
provisioning --> online (on register)
     ^              |
     |              v
     |          degraded (error_rate > 0.5)
     |              |
     |              v
     |          online (recovered)
     |              |
     |              v
     |          offline (Redis TTL expired / health sweep)
     |              |
     +--- auto-restart

any state --> removed (deleted)
any state --> paused (user pauses) --> provisioning (user resumes)
```
Paused workspaces skip health sweep, liveness monitor, and auto-restart.
**Restart context:** After any restart and successful re-registration, the platform sends a synthetic A2A `message/send` with `metadata.kind=restart_context` containing the restart timestamp, previous session info, and available env-var keys (keys only, never values). The sender uses the `system:restart-context` caller prefix to bypass `CanCommunicate`. If the workspace does not re-register within 30 seconds, the message is dropped.
**Initial prompt:** Agents can auto-execute a prompt on startup before any user interaction. Configure via `initial_prompt` (inline string) or `initial_prompt_file` (path relative to config dir) in `config.yaml`. A `.initial_prompt_done` marker file prevents re-execution on restart.
**Idle loop:** When `idle_prompt` is non-empty in `config.yaml`, the workspace self-sends it every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The idle check is local (no LLM call) and the prompt only fires when the agent is genuinely idle.
---
## Deployment Modes
### Self-Hosted
Run the full stack on your own infrastructure using Docker Compose:
```bash
# Infrastructure only (Postgres, Redis, Langfuse, Temporal)
docker compose -f docker-compose.infra.yml up -d
# Full stack
docker compose up
```
### SaaS
Hosted at `moleculesai.app` with per-tenant isolation. Each tenant gets a dedicated Fly Machine running the tenant image. The `MOLECULE_ORG_ID` env var gates API access -- every non-allowlisted request must carry a matching `X-Molecule-Org-Id` header or gets a 404. When unset, the guard is a passthrough so self-hosted and dev environments are unaffected.
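The guard's decision table reduces to a few lines. This sketch returns the HTTP status the guard would produce (the allowlist check is simplified to a boolean):

```python
def org_guard(org_id_env, headers, allowlisted=False):
    """Sketch of the MOLECULE_ORG_ID gate. Returns the resulting HTTP status."""
    if not org_id_env:       # unset -> passthrough (self-hosted / dev)
        return 200
    if allowlisted:          # allowlisted routes skip the check
        return 200
    return 200 if headers.get("X-Molecule-Org-Id") == org_id_env else 404
```

Note the guard returns 404 rather than 401/403, so probing requests cannot distinguish "wrong org" from "no such route".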
### Tenant Image
`platform/Dockerfile.tenant` bundles the Go platform + Canvas frontend + templates into a single container image, published to `ghcr.io/molecule-ai/platform:latest` and `:sha-<short>`.
---
## Subdomain Architecture
| Subdomain | Service | Purpose |
|-----------|---------|---------|
| `moleculesai.app` | Landing page | Marketing site |
| `app.moleculesai.app` | SaaS dashboard | Tenant management UI |
| `api.moleculesai.app` | Control plane API | Platform REST + WebSocket |
| `doc.moleculesai.app` | Documentation | This documentation site |
| `status.moleculesai.app` | Status page | Uptime and incident tracking |
| `*.moleculesai.app` | Tenant instances | Per-org isolated platform instances |
---
## Plugin System
Plugins extend workspace capabilities. Two categories exist:
**Shared plugins** (auto-loaded by every workspace):
- **molecule-dev** -- codebase conventions + review-loop skill
- **superpowers** -- verification, TDD, systematic debugging, writing plans
- **ecc** -- general Claude Code guardrails
- **browser-automation** -- Puppeteer/CDP web scraping and live canvas screenshots
**Modular guardrails** (opt-in per workspace):
- **Hook plugins** (ambient enforcement): `molecule-careful-bash`, `molecule-freeze-scope`, `molecule-audit-trail`, `molecule-session-context`, `molecule-prompt-watchdog`
- **Skill plugins** (on-demand): `molecule-skill-code-review`, `molecule-skill-cross-vendor-review`, `molecule-skill-llm-judge`, `molecule-skill-update-docs`, `molecule-skill-cron-learnings`
- **Workflow plugins** (slash commands): `molecule-workflow-triage`, `molecule-workflow-retro`
**Org-template plugin resolution:** Per-workspace `plugins:` lists in `org.yaml` role overrides UNION with `defaults.plugins` (deduplicated, defaults first). To opt a specific role out of a default, prefix the plugin name with `!` or `-` (e.g. `!browser-automation`).
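The UNION-with-opt-out rule can be modeled in a few lines of Python. This is a behavioral sketch of the resolution order described above, not the platform's implementation:

```python
def resolve_plugins(defaults, overrides):
    """Merge defaults with a role's plugin list: defaults first, deduplicated,
    honoring '!'/'-' opt-out prefixes in the role's list."""
    opt_out = {p.lstrip("!-") for p in overrides if p.startswith(("!", "-"))}
    additions = [p for p in overrides if not p.startswith(("!", "-"))]
    merged, seen = [], set()
    for p in defaults + additions:
        if p not in seen and p not in opt_out:
            seen.add(p)
            merged.append(p)
    return merged
```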
Plugin install safeguards:
| Parameter | Default | Purpose |
|-----------|---------|---------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | 65536 (64 KiB) | Max request body size |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | 5m | Whole fetch+copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | 104857600 (100 MiB) | Max staged-tree size |
---
## CI Pipeline
GitHub Actions runs on push to main and on pull requests:
| Job | What it does |
|-----|-------------|
| `platform-build` | Go build, vet, `go test -race` with 25% coverage threshold |
| `canvas-build` | npm build, vitest run (tests must exist and pass) |
| `python-lint` | pytest with coverage for workspace-template |
| `e2e-api` | Spins up Postgres + Redis, runs 62 API tests against locally-built binary |
| `shellcheck` | Lints all E2E shell scripts |
| `publish-platform-image` | Builds and pushes to `ghcr.io/molecule-ai/platform` (main only) |
Standalone repos (plugins + templates) use reusable workflows from `Molecule-AI/molecule-ci` for schema validation, secrets scanning, and Docker build smoke tests.

---
title: Channels
description: Connect workspaces to Telegram, Slack, and Lark/Feishu for social integrations.
---
## Overview
Channels let workspaces send and receive messages on social platforms. Each
workspace can have multiple channel integrations — a Telegram bot, a Slack
webhook, a Lark/Feishu Custom Bot — configured independently with per-channel
allowlists and JSONB config.
Outbound messages flow from the workspace through the platform adapter to the
social platform. Inbound messages arrive via webhooks (`POST /webhooks/:type`),
are parsed by the adapter, and forwarded to the workspace as A2A
`message/send` requests.
```
User (Telegram/Slack/Lark) ──webhook──> Platform ──A2A──> Workspace Agent
<──adapter── (response)
User <──bot message──────────────────────────────────────/
```
---
## Adapters
Three adapters are registered out of the box. Use `GET /channels/adapters` to
list them at runtime.
### Telegram
Uses the Telegram Bot API. Supports both long-polling (for inbound) and direct
API calls (for outbound). The adapter caches `BotAPI` instances to avoid
repeated `getMe` calls.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `bot_token` | string | Telegram bot token (`123456789:ABCdef...`). Validated against a strict regex. |
| `chat_id` | string | Comma-separated chat IDs to listen on and send to. |
**Features:**
- Long-polling with 30s timeout and 2s retry interval
- Auto-reply to `/start` with the chat ID (useful for setup)
- Bot commands: `/start`, `/help`, `/reset` (clear history), `/cancel` (best-effort)
- Long messages automatically split at paragraph/line/word boundaries (4096 char limit)
- Typing indicator sent while the agent processes
- Rate-limit handling with `retry_after` backoff
- Auto-discovers chats via `getUpdates` (including `my_chat_member` events for group adds)
- Auto-disables the channel when the bot is kicked from a chat
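The message-splitting behavior — prefer paragraph breaks, then line breaks, then word boundaries — can be sketched as follows. This is an illustrative algorithm consistent with the description above, not the adapter's actual code:

```python
def split_message(text, limit=4096):
    """Split text into chunks <= limit, cutting at paragraph, line, or word
    boundaries when possible, hard-splitting only as a last resort."""
    parts = []
    while len(text) > limit:
        window = text[:limit]
        for sep in ("\n\n", "\n", " "):   # paragraph > line > word preference
            cut = window.rfind(sep)
            if cut > 0:
                break
        else:
            cut = limit                    # no boundary found: hard split
        parts.append(text[:cut])
        text = text[cut:].lstrip()
    parts.append(text)
    return parts
```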
### Slack
Uses Slack Incoming Webhooks for outbound and the Slack Events API for inbound.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `webhook_url` | string | Slack Incoming Webhook URL (must start with `https://hooks.slack.com/`). |
**Features:**
- Outbound via Incoming Webhook (no OAuth required)
- Inbound via Events API JSON payload or slash command (URL-encoded form)
- `url_verification` challenge handshake supported
- Slash commands prepend the command name so the agent sees the full invocation
### Lark / Feishu
Outbound via Custom Bot webhooks, inbound via Event Subscriptions.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `webhook_url` | string | Custom Bot webhook URL. Must start with `https://open.feishu.cn/open-apis/bot/v2/hook/` or `https://open.larksuite.com/open-apis/bot/v2/hook/`. |
**Optional config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `verify_token` | string | Verification Token from the app's Event Subscriptions page. When set, inbound events with a mismatching token are rejected. |
**Features:**
- Both China (`open.feishu.cn`) and international (`open.larksuite.com`) endpoints supported
- `url_verification` handshake with constant-time `verify_token` comparison
- v2 event payload parsing (`im.message.receive_v1`)
- Token verification on both `url_verification` and `event_callback` payloads
- Application-level error codes checked (Lark returns HTTP 200 even for app errors)
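A minimal sketch of the constant-time token check, using Python's `hmac.compare_digest` in place of whatever the adapter actually uses (the payload shape is simplified):

```python
import hmac

def lark_event_accepted(configured_token, payload):
    """Accept the event only if the payload token matches, compared in
    constant time. No configured token means the check is disabled."""
    if not configured_token:
        return True
    received = str(payload.get("token", ""))
    return hmac.compare_digest(configured_token.encode(), received.encode())
```

A constant-time comparison matters here because a naive `==` can leak how many leading characters matched via timing differences.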
---
## Setup Flow
### 1. Create a Channel
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz",
"chat_id": "-1001234567890"
}
}'
```
### 2. Test the Connection
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/test \
-H "Authorization: Bearer {token}"
```
### 3. Send a Message
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/send \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"text": "Hello from the agent!"}'
```
---
## Inbound Webhooks
Register your platform's public URL as the webhook endpoint for each social
platform. Inbound messages arrive at:
```
POST /webhooks/:type
```
where `:type` is `telegram`, `slack`, or `lark`. The platform:
1. Looks up all channels of that type
2. Calls the adapter's `ParseWebhook` to extract a standardized `InboundMessage`
3. Checks the allowlist (if configured)
4. Forwards the message to the workspace via A2A `message/send`
For Telegram, the platform can also use long-polling instead of webhooks,
started automatically when a Telegram channel is created.
---
## Discover Chats
Auto-detect available chats for a bot token before creating a channel:
```bash
curl -X POST http://localhost:8080/channels/discover \
-H "Content-Type: application/json" \
-d '{"type": "telegram", "bot_token": "123456789:ABCdef..."}'
```
Returns the bot username, discovered chats (with IDs, names, and types), and
whether the bot can read all group messages (Telegram privacy mode).
---
## Allowlists
Each channel row has an `allowed_users` JSONB array. When non-empty, only
messages from users whose IDs appear in the list are forwarded to the workspace.
All others are silently dropped.
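The allowlist rule is a one-liner. A sketch of the decision (empty list means everyone is allowed):

```python
def should_forward(sender_id, allowed_users):
    """Empty allowlist -> forward everything; otherwise only listed users."""
    return not allowed_users or sender_id in allowed_users
```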
---
## Config Encryption
Sensitive config fields (like `bot_token`) are encrypted at rest. The `List`
endpoint decrypts them server-side and masks tokens in the response
(showing only the first 4 and last 4 characters).
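The masking scheme (first 4 and last 4 characters visible) might look like this — an illustrative sketch; the short-token behavior is an assumption, not documented above:

```python
def mask_token(token):
    """Show only the first 4 and last 4 characters; fully mask short tokens."""
    if len(token) <= 8:                # assumption: too short to mask partially
        return "*" * len(token)
    return token[:4] + "*" * (len(token) - 8) + token[-4:]
```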
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/channels/adapters` | List available adapter types |
| POST | `/channels/discover` | Auto-detect chats for a bot token |
| GET | `/workspaces/:id/channels` | List channels for a workspace |
| POST | `/workspaces/:id/channels` | Add a channel |
| PATCH | `/workspaces/:id/channels/:channelId` | Update a channel |
| DELETE | `/workspaces/:id/channels/:channelId` | Remove a channel |
| POST | `/workspaces/:id/channels/:channelId/test` | Test connection |
| POST | `/workspaces/:id/channels/:channelId/send` | Send outbound message |
| POST | `/webhooks/:type` | Incoming social webhook |
---
## Example Configs
### Telegram
```json
{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234",
"chat_id": "-1001234567890"
}
}
```
Multiple chats (comma-separated):
```json
{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234",
"chat_id": "-1001234567890, -1009876543210"
}
}
```
### Slack
```json
{
"type": "slack",
"config": {
"webhook_url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
}
}
```
### Lark / Feishu
```json
{
"type": "lark",
"config": {
"webhook_url": "https://open.larksuite.com/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"verify_token": "your-verification-token"
}
}
```
China endpoint:
```json
{
"type": "lark",
"config": {
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
}
```

---
title: Concepts
description: The core primitives that compose every Molecule AI org — workspaces, plugins, channels, schedules, tokens, external agents, and the canvas.
---
If you understand these core primitives, you understand the whole platform.
## Workspaces
A **workspace** is a real Docker container running a real LLM agent. Each
workspace has:
- A **role** (a one-line job description fed into its system prompt)
- An **initial prompt** (run once at first boot — typically clone repo,
read docs, memorise context)
- A **runtime** (`claude-code`, `langgraph`, `crewai`, `autogen`, `deepagents`,
`openclaw`, `hermes`, `gemini-cli`)
- A **tier** (resource budget — T1 sandboxed, T2 standard, T3 privileged, T4 full-host)
- An optional **parent** (forms the org tree)
- An optional **workspace_dir** (a host path bind-mounted into the
container — gives the agent direct access to your codebase)
Workspaces talk to each other via **A2A** (agent-to-agent) messages, routed
by the platform. Communication rules: same workspace, siblings, and
parent/child are allowed; everything else is denied.
## External agents
An **external agent** is a workspace with `runtime: external` — it runs on
your own infrastructure instead of the platform's Docker network. External
agents:
- Register via `POST /registry/register` and receive a bearer token
- Send heartbeats every 30 seconds to stay online
- Accept A2A messages at their registered URL
- Appear on the canvas with a purple **REMOTE** badge
- Skip Docker health sweep (liveness is heartbeat-only)
See [External Agents](/docs/external-agents) for the full registration guide.
## Plugins
A **plugin** is a bundle of capabilities a workspace can install:

- **Skills** — on-demand capabilities like code review, LLM-as-judge gates
- **Slash commands** — `/triage`, `/retro`, etc.
- **MCP servers** — bring in tools the model can call
- **Builtin tools** — Python/JS extensions exposed to the agent
Plugins have two axes: **source** (where to fetch — `local://`, `github://`)
and **shape** (what's inside — agentskills.io format, MCP server, etc.).
Plugins compose. Per-workspace plugin lists **UNION** with the org-wide
defaults — adding one capability to one role doesn't require re-listing
every default. Use `!plugin-name` to opt a specific default out.
See [Plugins](/docs/plugins) for the full guide.
## Channels
A **channel** wires a workspace to an external messaging platform:
| Adapter | Platform | Config |
|---------|----------|--------|
| `telegram` | Telegram | Bot token + chat_id allowlist |
| `slack` | Slack | Workspace token + channel |
| `lark` | Lark / Feishu | Custom Bot webhook + Event Subscriptions |
Once connected, users can talk to agents from outside the canvas — and
agents can broadcast back. Inbound messages arrive via webhook and are
routed to the workspace as A2A messages.
See [Channels](/docs/channels) for setup instructions.
## Schedules
A **schedule** is a cron-driven recurring prompt. Each tick fires an A2A
message into the workspace, which the agent treats as a new task. Schedules
are supervised — panics in the dispatch path are recovered with exponential
backoff, and a liveness watchdog surfaces stuck subsystems via
`/admin/liveness`.
Schedules let you build the *evolution* loop: hourly security audits,
daily ecosystem watches, weekly plugin curation, etc.
See [Schedules](/docs/schedules) for the full guide.
## Tokens
**Bearer tokens** authenticate agents and API clients. Each token is
scoped to a single workspace — a token from workspace A cannot access
workspace B.
- Issued on first registration (`POST /registry/register`)
- Create/list/revoke via `GET/POST/DELETE /workspaces/:id/tokens`
- 256-bit entropy, sha256-hashed in DB, plaintext shown once
See [Token Management](/docs/tokens) for the full guide.
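The token-minting properties listed above (256-bit entropy, sha256 at rest, plaintext shown once) can be sketched in Python — illustrative, not the platform's actual implementation:

```python
import hashlib
import secrets

def mint_token():
    """Generate a workspace token: 256 bits of entropy, sha256-hashed for
    storage. The plaintext is returned once and never persisted."""
    plaintext = secrets.token_hex(32)                        # 32 bytes = 256 bits
    digest = hashlib.sha256(plaintext.encode()).hexdigest()  # only this is stored
    return plaintext, digest
```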
## The canvas
Every workspace is a node, every A2A message an edge; every memory write, every scheduled fire, every status change pushes a WebSocket event in real time.
The canvas isn't just a viewer — it's the operator surface. Drag nodes
to reorganise teams, click to chat, right-click for actions, watch the
team work in real time.
## How they fit together
A typical org definition:
```yaml
org_name: My Team
defaults:
runtime: claude-code
tier: 2
plugins: [ecc, molecule-dev, superpowers, molecule-careful-bash]
category_routing:
security: [Backend Engineer]
ui: [Frontend Engineer]
workspaces:
- name: PM
canvas: { x: 400, y: 50 }
plugins: [molecule-workflow-triage]
channels:
- type: telegram
config: { bot_token: "${TELEGRAM_BOT_TOKEN}", chat_id: "12345" }
children:
- name: Dev Lead
children:
@ -100,6 +140,13 @@ workspaces:
prompt: "Run npm run typecheck and report any new errors..."
```
That's the mental model. Templates → plugins → channels → schedules →
tokens → canvas. Everything else in the docs is depth on one of these
primitives.
## MCP integration
Any MCP-compatible AI agent can manage Molecule AI workspaces using the
[MCP Server](/docs/mcp-server) — 87 tools covering workspace CRUD,
communication, secrets, memory, files, schedules, channels, plugins,
and more. Install via `npx @molecule-ai/mcp-server`.

multi-agent organisations. You define your team in one YAML file — the roles your agents play, the channels they talk on, the recurring work they schedule — and the platform takes care of the rest.
## Try it now
| | |
|---|---|
| **Dashboard** | [app.moleculesai.app](https://app.moleculesai.app) — create orgs, deploy agents |
| **API** | [api.moleculesai.app](https://api.moleculesai.app) — control plane REST API |
| **Documentation** | [doc.moleculesai.app](https://doc.moleculesai.app) — you are here |
| **Status** | [status.moleculesai.app](https://status.moleculesai.app) — uptime monitoring |
| **Self-host** | [Self-Hosting Guide](/docs/self-hosting) — run on your own infrastructure |
## What you can build
- **Self-running engineering teams** — PM, Dev Lead, frontend / backend / devops
- **Research labs** — teams of agents collaborating through shared memory.
- **Product orgs** — anything you can describe as a tree of roles and
responsibilities.
- **Hybrid teams** — mix cloud-hosted agents with [external agents](/docs/external-agents)
running on your own infrastructure, edge devices, or other clouds.
## How it works
1. **Templates.** Describe your org as a YAML tree of workspaces. Each workspace
is a real container running an LLM agent. Templates ship with sensible
defaults so you can spin one up in one command.
2. **Plugins.** Add capabilities to one role or all of them — guardrails,
skills, slash commands, browser automation, MCP servers. Plugins compose;
per-role overrides UNION with the defaults.
3. **Channels.** Connect any role to [Telegram, Slack, or Lark/Feishu](/docs/channels)
so users can talk to agents directly from their existing tools.
4. **Schedules.** Define [recurring work](/docs/schedules) in cron syntax. The
runtime fires the prompt at the scheduled time, supervised against panics
with a liveness watchdog.
5. **Tokens.** Generate [API tokens](/docs/tokens) per workspace for secure
authentication. Rotate, revoke, and audit from the dashboard or API.
6. **The canvas.** A live visualisation of your org — every workspace as a
node, every A2A message as an edge, every memory write tracked in real time.
## Eight runtime adapters
| Runtime | Description |
|---------|-------------|
| Claude Code | Anthropic Claude with code execution |
| LangGraph | LangChain ReAct agent with tools |
| OpenClaw | Multi-file prompt system with SOUL |
| CrewAI | Role-based agent with task delegation |
| AutoGen | Microsoft conversable agents |
| DeepAgents | Deep research with planning |
| Hermes | NousResearch Hermes-3 multi-provider |
| Gemini CLI | Google Gemini CLI workspace |
## Integrate with everything
- **[MCP Server](/docs/mcp-server)** — 87 tools for managing Molecule AI from any
MCP-compatible AI agent (Claude Code, Cursor, etc.)
- **[Python SDK](https://pypi.org/project/molecule-ai-sdk)** — `pip install molecule-ai-sdk`
- **[External Agents](/docs/external-agents)** — register any HTTP agent as a
first-class workspace
## Where to next
- New here? The [Quickstart](/docs/quickstart) gets you from zero to a running agent in under five minutes.
- Want the architecture tour? Start with [Concepts](/docs/concepts) and
[Architecture](/docs/architecture).
- Ready to build your own org? Jump to [Org Templates](/docs/org-template).
- Want to connect your own agent? See [External Agents](/docs/external-agents).
- Need API access? Check [Token Management](/docs/tokens) and the
[API Reference](/docs/api-reference).

"index",
"quickstart",
"concepts",
"org-template",
"plugins",
"channels",
"schedules",
"external-agents",
"architecture",
"tokens",
"api-reference",
"mcp-server",
"self-hosting",
"observability",
"troubleshooting"

---
title: Observability
description: Monitor agent activity, LLM traces, and platform health.
---
## Overview
Molecule AI provides multiple layers of observability -- from real-time WebSocket events on the canvas to structured activity logs, LLM traces, Prometheus metrics, and admin health endpoints.
## Activity Logs
Every significant action in the platform is recorded in the `activity_logs` table. Query logs for a specific workspace:
```
GET /workspaces/:id/activity
```
Activity types include:
- **A2A communications** -- request/response capture with duration and method
- **Task updates** -- agent-reported task status changes
- **Agent logs** -- structured log entries from workspace runtimes
- **Errors** -- failures with `error_detail` for debugging
Filter by source to separate user-agent chat (`source=canvas`) from agent-to-agent traffic (`source=agent`).
Activity logs are automatically cleaned up based on `ACTIVITY_RETENTION_DAYS` (default 7). The cleanup job runs every `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default 6).
## LLM Traces
Molecule AI integrates with [Langfuse](https://langfuse.com) for LLM observability. Langfuse runs as part of the infrastructure stack on port 3001, backed by ClickHouse for efficient trace storage.
View traces for a specific workspace:
```
GET /workspaces/:id/traces
```
The Langfuse UI at `http://localhost:3001` provides:
- Token usage and cost tracking per workspace
- Latency breakdowns for LLM calls
- Prompt/completion pairs for debugging
- Trace timelines showing multi-step agent reasoning
## Prometheus Metrics
The platform exposes Prometheus-format metrics at:
```
GET /metrics
```
This endpoint requires no authentication and is safe to scrape. Metrics are in Prometheus text format (v0.0.4) and include:
- Request counts by method, path, and status code
- Request latency histograms
- Active WebSocket connections
- Workspace status counts
Configure your Prometheus instance to scrape `http://localhost:8080/metrics` at your preferred interval.
## Admin Liveness
The liveness endpoint reports the health of every supervised subsystem:
```
GET /admin/liveness
```
This endpoint requires `AdminAuth` (bearer token). It returns a `supervised.Snapshot()` for each subsystem with ages -- how long since each subsystem last reported healthy. Use this to debug stuck schedulers, stalled heartbeat goroutines, or unresponsive health sweeps before diving into logs.
## WebSocket Events
The canvas receives real-time updates via WebSocket at `/ws`. Every state change in the platform is broadcast to connected clients:
| Event | Trigger |
|-------|---------|
| `WORKSPACE_ONLINE` | Workspace registers successfully |
| `WORKSPACE_OFFLINE` | Heartbeat TTL expires or health sweep detects dead container |
| `WORKSPACE_DEGRADED` | Error rate exceeds threshold |
| `WORKSPACE_RECOVERED` | Error rate drops back to normal |
| `WORKSPACE_REMOVED` | Workspace deleted |
| `HEARTBEAT` | Periodic heartbeat from workspace |
| `A2A_RESPONSE` | Agent-to-agent message received |
| `AGENT_MESSAGE` | Agent pushes a message to the user |
Events flow through Redis pub/sub to ensure all platform instances broadcast consistently.
## Structure Events
The `structure_events` table is an append-only audit log of every structural change in the platform. Each event is:
1. Inserted into the database via `broadcaster.RecordAndBroadcast()`
2. Published to Redis pub/sub
3. Relayed to WebSocket clients
Query events for a specific workspace or globally:
```
GET /events/:workspaceId # Workspace-specific
GET /events # All events
```
Both endpoints require `AdminAuth`.
## Session Search
Search through chat history for a workspace:
```
GET /workspaces/:id/session-search?q=deployment+error
```
This searches across both user-agent conversations and agent-to-agent A2A traffic stored in the activity logs.
## Current Task Visibility
Each workspace reports its current task via heartbeat. This is visible in two places:
- **Canvas node** -- the workspace card on the canvas shows the current task text
- **Heartbeat data** -- `GET /registry/discover/:id` includes `current_task` in the workspace info
When `active_tasks` drops to zero, the current task field clears and the idle loop (if configured) begins its countdown.
## Schedule Run History
For workspaces with cron schedules, inspect past runs:
```
GET /workspaces/:id/schedules/:scheduleId/history
```
Each history entry includes:
- Execution timestamp
- Status (`success`, `failed`, `skipped`)
- Duration
- `error_detail` when the run failed (populated by `scheduler.fireSchedule`)
A status of `skipped` means the workspace was busy (active tasks > 0) when the schedule fired and the concurrency-aware scheduler chose not to queue the prompt.
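The three run statuses map to a simple decision, sketched here (field names are illustrative):

```python
def record_run(active_tasks, fire_succeeded, error_detail=None):
    """Sketch of the concurrency-aware scheduler's run-status decision."""
    if active_tasks > 0:
        return {"status": "skipped"}   # workspace busy: don't queue the prompt
    if fire_succeeded:
        return {"status": "success"}
    return {"status": "failed", "error_detail": error_detail}
```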

---
title: Org Templates
description: Deploy entire multi-workspace organizations from a single YAML file.
---
## Overview
Org templates let you define an entire agent organization -- hierarchy of workspaces with roles, configurations, and relationships -- in a single YAML file. Import one template and the platform provisions every workspace, wires parent-child relationships, seeds schedules, and installs plugins automatically.
## YAML Structure
A minimal org template looks like this:
```yaml
org_name: molecule-dev
defaults:
runtime: claude-code
tier: 2
plugins:
- molecule-dev
- molecule-careful-bash
workspaces:
pm:
name: Project Manager
role: PM
tier: 3
children:
dev-lead:
name: Dev Lead
children:
backend:
name: Backend Engineer
frontend:
name: Frontend Engineer
marketing:
name: Marketing Specialist
runtime: langgraph
```
The `workspaces` map defines the hierarchy. Each key becomes the workspace's slug. Nesting under `children` sets the parent-child relationship automatically.
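Flattening the nested `workspaces` map into parent-child rows is straightforward recursion. A sketch of what the importer conceptually does (the row shape is illustrative):

```python
def flatten(workspaces, parent=None):
    """Walk the nested workspaces map, emitting one row per workspace with
    its parent slug derived from nesting under `children`."""
    rows = []
    for slug, spec in workspaces.items():
        rows.append({"slug": slug, "parent": parent})
        rows.extend(flatten(spec.get("children", {}), parent=slug))
    return rows
```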
## Workspace Fields
Each workspace entry supports the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Display name shown on the canvas |
| `role` | string | Agent role (e.g. PM, Engineer, Researcher) |
| `runtime` | string | Runtime adapter (`claude-code`, `langgraph`, `crewai`, etc.) |
| `tier` | integer | Resource tier (1 = Sandboxed, 2 = Standard, 3 = Privileged, 4 = Full-host) |
| `workspace_dir` | string | Host path for `/workspace` bind-mount |
| `plugins` | list | Plugins to install on this workspace |
| `initial_prompt` | string | Prompt auto-executed after A2A server is ready |
| `idle_prompt` | string | Prompt fired periodically while workspace is idle |
| `idle_interval_seconds` | integer | Interval for idle prompt (default 600, minimum 60) |
| `channels` | list | Social channel integrations (Telegram, Slack, etc.) |
| `schedules` | list | Cron schedules seeded on import |
| `x` | number | Canvas X coordinate |
| `y` | number | Canvas Y coordinate |
| `children` | map | Nested child workspaces |
## Defaults Layer
The `defaults` block sets baseline values for every workspace in the template. Per-workspace fields override defaults when specified.
**Plugin merging is additive.** Per-workspace `plugins` lists UNION with `defaults.plugins` (deduplicated, defaults first) -- they do not replace them. To opt a specific default plugin out for a given workspace, prefix the plugin name with `!` or `-`:
```yaml
defaults:
plugins:
- molecule-dev
- molecule-careful-bash
- browser-automation
workspaces:
backend:
name: Backend Engineer
plugins:
- molecule-skill-code-review # added
- "!browser-automation" # opted out of default
```
In this example, the backend workspace gets `molecule-dev`, `molecule-careful-bash`, and `molecule-skill-code-review` -- but not `browser-automation`.
## Template Registry
Five org templates live in standalone repos under the `Molecule-AI` GitHub organization:
| Template | Repo |
|----------|------|
| molecule-dev | `Molecule-AI/molecule-ai-org-template-molecule-dev` |
| marketing-team | `Molecule-AI/molecule-ai-org-template-marketing-team` |
| research-lab | `Molecule-AI/molecule-ai-org-template-research-lab` |
| startup-mvp | `Molecule-AI/molecule-ai-org-template-startup-mvp` |
| enterprise-ops | `Molecule-AI/molecule-ai-org-template-enterprise-ops` |
These are cloned into the platform image at Docker build time and registered in the `template_registry` database table.
## Importing an Org Template
### Via API
```bash
curl -X POST http://localhost:8080/org/import \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"dir": "molecule-dev"}'
```
The `POST /org/import` endpoint requires `AdminAuth` (bearer token). The `dir` field references a template directory name from the registry.
### Via Canvas
Open the template browser in the canvas sidebar and select an org template. The UI calls the same API endpoint.
## Initial Prompts
Workspaces can auto-execute a prompt on startup before any user interaction. Set `initial_prompt` as an inline string or point `initial_prompt_file` to a path relative to the config directory.
After the A2A server is ready, the runtime sends the prompt as a `message/send` to itself. A `.initial_prompt_done` marker file prevents re-execution on restart.
**Important:** Initial prompts must NOT send A2A messages (`delegate_task`, `send_message_to_user`) because other agents may not be ready yet. Keep them local: clone a repo, read docs, save to memory, wait for tasks.
Org templates support `initial_prompt` on both `defaults` (all agents) and per-workspace (overrides default).
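For example, a template might set a shared startup prompt while one role overrides it (prompt text and file path here are illustrative):

```yaml
defaults:
  initial_prompt: "Clone the project repo, read CONTRIBUTING.md, and save key conventions to memory."
workspaces:
  researcher:
    name: Research Analyst
    # Per-workspace value overrides the default for this workspace only.
    initial_prompt_file: prompts/researcher-bootstrap.md
```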
## Idle Loop
The idle loop is an opt-in pattern for workspaces that should do periodic background work when they have no active tasks.
When `idle_prompt` is non-empty in the workspace config, the runtime self-sends the prompt every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The fire timeout clamps to `max(60, min(300, idle_interval_seconds))`.
Set per-workspace or as an org template default:
```yaml
defaults:
idle_prompt: "Check for new issues and update your task list."
idle_interval_seconds: 300
```
The idle check is local (no LLM call) and the prompt only fires when there is genuinely nothing to do, so cost collapses to event-driven.
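The clamp works out to the following (illustrative Python; the runtime is not Python):

```python
def idle_fire_timeout(idle_interval_seconds: int) -> int:
    """Per-fire timeout: floored at 60s, capped at 300s,
    otherwise the configured interval itself."""
    return max(60, min(300, idle_interval_seconds))

print(idle_fire_timeout(600))  # 300 — long intervals cap at 5 minutes
print(idle_fire_timeout(30))   # 60  — short intervals get a 1-minute floor
```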
## Canvas Positioning
Use `x` and `y` fields to control where workspaces appear on the drag-and-drop canvas after import:
```yaml
workspaces:
pm:
name: Project Manager
x: 400
y: 100
children:
dev:
name: Developer
x: 200
y: 300
researcher:
name: Researcher
x: 600
y: 300
```
If coordinates are omitted, the canvas lays out new workspaces automatically.

---
title: Plugins
description: Extend workspace capabilities with modular plugins — guardrails, skills, workflows.
---
## Overview
Plugins are installable capability bundles that extend what a workspace can do.
They range from ambient guardrails that enforce rules automatically, to
on-demand skills invoked via the `Skill` tool, to workflow plugins that
compose skills into slash commands.
Plugins follow a **two-axis model**: the *source* (where the plugin comes from)
is orthogonal to the *shape* (what format it takes). This means you can install
a plugin from a local registry or from GitHub, and the workspace runtime
figures out how to load it based on its shape.
---
## Two-Axis Model
### Sources (where)
| Scheme | Description | Example |
|--------|-------------|---------|
| `local://` | Platform's curated plugin registry (auto-discovered from the `plugins/` directory) | `local://molecule-careful-bash` |
| `github://` | Public GitHub repo (shallow clone at install time) | `github://owner/repo` |
| `github://` (pinned) | GitHub repo at a specific ref | `github://owner/repo#v1.2.0` |
Use `GET /plugins/sources` to list all registered install-source schemes at
runtime.
### Shapes (what)
| Shape | Description |
|-------|-------------|
| agentskills.io format | `SKILL.md` + optional scripts, hooks, and `plugin.yaml` manifest |
| MCP server | Model Context Protocol server (coming soon for more runtimes) |
The shape is orthogonal to the source. A `github://` plugin and a `local://`
plugin can both be agentskills.io format. The per-runtime adapter inside the
workspace handles loading at startup.
---
## Installing a Plugin
```bash
curl -X POST http://localhost:8080/workspaces/{id}/plugins \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"source": "local://molecule-careful-bash"}'
```
From GitHub:
```bash
curl -X POST http://localhost:8080/workspaces/{id}/plugins \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"source": "github://Molecule-AI/molecule-plugin-careful-bash"}'
```
The platform resolves the source, stages the plugin files, copies them into the
workspace container at `/configs/plugins/<name>/`, and triggers an automatic
workspace restart so the runtime picks up the new plugin.
---
## Uninstalling a Plugin
```bash
curl -X DELETE http://localhost:8080/workspaces/{id}/plugins/{name} \
-H "Authorization: Bearer {token}"
```
Uninstall removes the plugin directory, cleans up copied skill directories and
rule markers from `CLAUDE.md`, and triggers an automatic workspace restart.
---
## Listing Plugins
### Platform Registry
List all available plugins in the platform registry:
```bash
# All plugins
curl http://localhost:8080/plugins
# Filtered by runtime
curl "http://localhost:8080/plugins?runtime=claude-code"
```
Plugins with no declared `runtimes` field in their manifest are treated as
"unspecified, try it" and included in filtered results.
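A minimal manifest might look like this — the field names shown (`name`, `description`, `runtimes`) are illustrative; consult the plugin registry source for the exact schema:

```yaml
# plugin.yaml (illustrative sketch, not the authoritative schema)
name: molecule-skill-code-review
description: 16-criteria multi-axis code review rubric
runtimes:
  - claude-code   # omit this list entirely to mean "unspecified, try it"
```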
### Available for a Workspace
Returns plugins filtered to those supported by the workspace's current runtime:
```bash
curl http://localhost:8080/workspaces/{id}/plugins/available \
-H "Authorization: Bearer {token}"
```
### Installed on a Workspace
```bash
curl http://localhost:8080/workspaces/{id}/plugins \
-H "Authorization: Bearer {token}"
```
Each installed plugin is annotated with whether it still supports the
workspace's current runtime. This lets the canvas grey out plugins that went
inert after a runtime change.
---
## Runtime Compatibility Check
Before changing a workspace's runtime, check which installed plugins would
become incompatible:
```bash
curl "http://localhost:8080/workspaces/{id}/plugins/compatibility?runtime=langgraph" \
-H "Authorization: Bearer {token}"
```
Response:
```json
{
"target_runtime": "langgraph",
"compatible": [...],
"incompatible": [...],
"all_compatible": false
}
```
The canvas uses this to show a confirmation dialog before applying a runtime
change.
---
## Built-in Plugins
### Hook Plugins (ambient enforcement)
These fire automatically via the harness layer. No explicit invocation needed.
| Plugin | Purpose |
|--------|---------|
| `molecule-careful-bash` | Refuses `git push --force` to main, `rm -rf` at root, `DROP TABLE` against prod schema. Ships the `careful-mode` skill as documentation. |
| `molecule-freeze-scope` | Locks edits to a single path glob via `.claude/freeze`. Useful while debugging. |
| `molecule-audit-trail` | Appends every Edit/Write to `.claude/audit.jsonl` for accountability. |
| `molecule-session-context` | Auto-loads recent cron-learnings and open PR/issue counts at session start. |
| `molecule-prompt-watchdog` | Injects warning context when the prompt mentions destructive keywords. |
### Skill Plugins (on-demand)
Invoked explicitly via the `Skill` tool during a conversation.
| Plugin | Purpose |
|--------|---------|
| `molecule-skill-code-review` | 16-criteria multi-axis code review rubric. |
| `molecule-skill-cross-vendor-review` | Adversarial second-model review for noteworthy PRs. |
| `molecule-skill-llm-judge` | Score whether a deliverable addresses the original request. |
| `molecule-skill-update-docs` | Sync repo docs after merges. |
| `molecule-skill-cron-learnings` | Defines the operational-memory JSONL format. |
### Workflow Plugins (slash commands)
Compose skills into repeatable multi-step workflows.
| Plugin | Command | Purpose |
|--------|---------|---------|
| `molecule-workflow-triage` | `/triage` | Full PR-triage cycle (gates 1-7 + code-review + merge if green). |
| `molecule-workflow-retro` | `/retro` | Weekly retrospective issue. |
### Shared Plugins
Loaded by default from the `plugins/` directory at the repo root.
| Plugin | Purpose |
|--------|---------|
| `molecule-dev` | Codebase conventions (rules injected into CLAUDE.md) + `review-loop` skill. |
| `superpowers` | `verification-before-completion`, `test-driven-development`, `systematic-debugging`, `writing-plans`. |
| `ecc` | General Claude Code guardrails. |
| `browser-automation` | Puppeteer/CDP-based web scraping and live canvas screenshots. Opt-in per workspace. |
---
## Org Template Plugin Resolution
When deploying an org template, per-workspace `plugins:` lists in `org.yaml`
role overrides **UNION** with `defaults.plugins` (deduplicated, defaults first).
They do not replace them.
To opt a specific default out for a given role or workspace, prefix the plugin
name with `!` or `-`:
```yaml
defaults:
plugins:
- molecule-careful-bash
- molecule-audit-trail
- superpowers
workspaces:
researcher:
role: "Research Analyst"
plugins:
- browser-automation # added on top of defaults
- "!superpowers" # opted out of superpowers
```
Result for the `researcher` workspace:
`molecule-careful-bash`, `molecule-audit-trail`, `browser-automation`
---
## Install Safeguards
Environment variables that bound the cost of a single plugin install:
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Max request body size |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch + copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Max staged-tree size |
These prevent a slow or malicious source from tying up a handler goroutine or
exhausting disk space.
---
## Plugin Download (External Workspaces)
External workspaces (those running outside Docker) can pull plugins as gzipped
tarballs:
```bash
curl http://localhost:8080/workspaces/{id}/plugins/{name}/download \
-H "Authorization: Bearer {token}" \
-o plugin.tar.gz
```
An optional `?source=github://owner/repo` query parameter lets external
workspaces pull from upstream repos without the platform pre-staging them.
Defaults to `local://<name>` when omitted.
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/plugins` | List plugin registry (supports `?runtime=` filter) |
| GET | `/plugins/sources` | List registered install-source schemes |
| GET | `/workspaces/:id/plugins` | List installed plugins |
| POST | `/workspaces/:id/plugins` | Install a plugin (`{"source": "scheme://spec"}`) |
| DELETE | `/workspaces/:id/plugins/:name` | Uninstall a plugin |
| GET | `/workspaces/:id/plugins/available` | Available plugins filtered by workspace runtime |
| GET | `/workspaces/:id/plugins/compatibility?runtime=X` | Preflight runtime-change compatibility check |
| GET | `/workspaces/:id/plugins/:name/download` | Download plugin as tarball (external workspaces) |

## Prerequisites
- Docker Desktop (or any Docker daemon) running locally
- Go 1.25+ and Node 20+ if building from source
- An LLM API key (Claude, OpenRouter, or Gemini)
## Option A: One-command start (recommended)
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
```
This starts everything: Postgres, Redis, Platform (Go on `:8080`), and
Canvas (Next.js on `:3000`). Press `Ctrl-C` to stop all services.
## Option B: Docker Compose
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
docker compose up -d
```
This starts the full stack including Langfuse (`:3001`) and Temporal (`:8233`).
## Option C: Manual setup
```bash
# 1. Start infrastructure
./infra/scripts/setup.sh # Postgres, Redis, Langfuse, Temporal
# 2. Start platform
cd platform && go run ./cmd/server # API on :8080
# 3. Start canvas (new terminal)
cd canvas && npm install && npm run dev # UI on :3000
```
## 2. Open the canvas
Navigate to [http://localhost:3000](http://localhost:3000). You should see
the empty state with template cards.
## 3. Deploy from a template
Click any template card to deploy a workspace instantly. Or import a full
org template:
```bash
curl -X POST http://localhost:8080/org/import \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"dir": "molecule-dev"}'
```

This provisions the 12-workspace dev team — PM, Research Lead and 3
researchers, Dev Lead and 5 engineers, plus Security/QA/UIUX auditors —
each as its own Docker container.
## 4. Talk to PM
PM is the entry point. Click the PM node on the canvas, open the Chat tab,
and send a task:
> *"Add a 'Last seen' column to the user list table on the admin page."*
PM will break the request into specific assignments, fan them out to the
right leads in parallel, verify the results, and report back when the
work is shipped.
## 5. Set up secrets
Most agents need an LLM API key. Set it as a global secret so all
workspaces inherit it:
```bash
curl -X PUT http://localhost:8080/settings/secrets \
-H 'Content-Type: application/json' \
-d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'
```
Or use the Settings panel (gear icon) in the canvas to manage secrets
per workspace.
## What just happened
You spun up a self-organising engineering team. They're real agents — they
can read your codebase, run tests, open PRs to GitHub. Their schedules
(security audit, UX audit, template fitness checks) run hourly on their own.
## Using the SaaS instead
Don't want to self-host? Use the cloud platform directly:
1. Go to [app.moleculesai.app](https://app.moleculesai.app)
2. Sign up and create an organization
3. Your tenant is provisioned at `<your-org>.moleculesai.app`
4. Deploy agents from templates — same experience, zero infrastructure
## Next steps
- Customise the [Org Template](/docs/org-template) to match your team.
- Add [Plugins](/docs/plugins) to give roles new capabilities.
- Wire a [Channel](/docs/channels) so you can talk to PM from Telegram.
- Connect your own agents with [External Agents](/docs/external-agents).
- Generate [API Tokens](/docs/tokens) for programmatic access.
- Read about the [Architecture](/docs/architecture) under the hood.

---
title: Schedules
description: Run recurring prompts on cron schedules — automated audits, reports, and maintenance.
---
## Overview
Schedules let you run recurring prompts against a workspace on a cron schedule.
Each tick fires an A2A `message/send` into the workspace, so the agent
processes the prompt as if it received a normal message. This enables automated
audits, daily reports, weekly retrospectives, and any other recurring task.
The scheduler polls the `workspace_schedules` table every 30 seconds. When a
schedule's `next_run_at` has passed, the scheduler fires the prompt and
computes the next run time.
```
Scheduler (30s poll) ──> workspace_schedules table
                              next_run_at <= now?
                         ┌──────────┴──────────┐
                         │   A2A message/send  │ ──> Workspace Agent
                         │ (callerID =         │
                         │  system:scheduler)  │
                         └─────────────────────┘
```
---
## Creating a Schedule
```bash
curl -X POST http://localhost:8080/workspaces/{id}/schedules \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{
"name": "Daily Security Audit",
"cron_expr": "0 9 * * *",
"timezone": "America/New_York",
"prompt": "Run a security audit of all open PRs. Check for leaked secrets, SQL injection, and auth bypass.",
"enabled": true
}'
```
**Required fields:**
| Field | Type | Description |
|-------|------|-------------|
| `cron_expr` | string | Standard cron expression (5-field: minute, hour, day-of-month, month, day-of-week) |
| `prompt` | string | The text sent to the workspace as an A2A message each tick |
**Optional fields:**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | string | `""` | Human-readable label |
| `timezone` | string | `"UTC"` | IANA timezone for cron evaluation (e.g. `America/New_York`, `Asia/Tokyo`) |
| `enabled` | bool | `true` | Whether the schedule fires |
The timezone is validated against Go's `time.LoadLocation` on create and update.
The cron expression is validated and the next run time is computed immediately.
---
## CRUD Operations
| Method | Path | Description |
|--------|------|-------------|
| GET | `/workspaces/:id/schedules` | List all schedules for a workspace |
| POST | `/workspaces/:id/schedules` | Create a new schedule |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update a schedule (partial update via COALESCE) |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete a schedule |
### Update
PATCH accepts any subset of fields. Only provided fields are changed — the
handler uses `COALESCE` in SQL so omitted fields retain their current values.
If `cron_expr` or `timezone` changes, the next run time is recomputed.
```bash
curl -X PATCH http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"enabled": false}'
```
### Delete
```bash
curl -X DELETE http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
-H "Authorization: Bearer {token}"
```
All schedule operations are scoped to the owning workspace ID to prevent IDOR.
---
## Manual Trigger
Fire a schedule immediately, outside its cron cadence:
```bash
curl -X POST http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/run \
-H "Authorization: Bearer {token}"
```
Returns the schedule's prompt so the frontend can POST it to
`/workspaces/:id/a2a`. This keeps the handler stateless.
---
## Run History
View the last 20 runs for a schedule, including error details for failed runs:
```bash
curl http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/history \
-H "Authorization: Bearer {token}"
```
Response:
```json
[
{
"timestamp": "2026-04-16T09:00:02Z",
"duration_ms": 4523,
"status": "success",
"error_detail": "",
"request": {"schedule_id": "...", "prompt": "..."}
},
{
"timestamp": "2026-04-15T09:00:01Z",
"duration_ms": null,
"status": "error",
"error_detail": "A2A proxy returned 503: workspace container not running",
"request": {"schedule_id": "...", "prompt": "..."}
}
]
```
History is pulled from the `activity_logs` table filtered by
`activity_type = 'cron_run'` and the schedule ID in the request body.
---
## Source Field
Each schedule has a `source` field that tracks how it was created:
| Value | Meaning |
|-------|---------|
| `template` | Seeded by an org template import or bundle import. On re-import, only `template`-source rows are refreshed — `runtime` rows survive. |
| `runtime` | Created via the Canvas UI or API. These are user-owned and never overwritten by re-imports. |
---
## Status Values
The `last_status` field on a schedule tracks the outcome of the most recent
run:
| Status | Meaning |
|--------|---------|
| `success` | The A2A message was delivered and the workspace acknowledged it. |
| `error` | The A2A proxy returned a non-2xx status. `last_error` contains details. |
| `skipped` | The workspace was busy (concurrency-aware skip). The scheduler detected `active_tasks > 0` and deferred the run to avoid overloading the agent. |
---
## Schedule Health Endpoint
Peer workspaces can monitor each other's schedule health without admin auth:
```bash
curl http://localhost:8080/workspaces/{id}/schedules/health \
-H "X-Workspace-ID: {callerWorkspaceId}" \
-H "Authorization: Bearer {callerToken}"
```
This endpoint returns execution-state fields only (`last_run_at`,
`last_status`, `run_count`, `next_run_at`, `last_error`). It deliberately
omits `prompt` and `cron_expr` so sensitive task content is never exposed to
peer workspaces.
**Auth rules** (mirrors the A2A proxy pattern):
- `X-Workspace-ID` header required to identify the caller
- Caller's own bearer token validated (legacy workspaces grandfathered)
- `registry.CanCommunicate(callerID, workspaceID)` must return true
- System callers (`system:*`, `webhook:*`, `test:*`) bypass checks
- Self-calls always allowed
---
## Scheduler Internals
### Poll Loop
The scheduler runs a 30-second poll loop. Each tick:
1. Queries up to 50 due schedules (`next_run_at <= now AND enabled = true`)
2. Fires up to 10 concurrently via a semaphore
3. Each fire sends an A2A `message/send` with a 5-minute timeout
4. Updates `last_run_at`, `run_count`, `last_status`, and `next_run_at`
5. Logs the run to `activity_logs` with `activity_type = 'cron_run'`
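Steps 1–2 reduce to something like the following (illustrative Python; the actual scheduler is Go):

```python
def select_due(schedules: list[dict], now: float, limit: int = 50) -> list[dict]:
    """Step 1: up to `limit` enabled schedules whose next_run_at has passed."""
    due = [s for s in schedules if s["enabled"] and s["next_run_at"] <= now]
    return due[:limit]

# Step 2 then bounds concurrency at 10 in-flight fires — in Go that is a
# buffered channel used as a semaphore; each fire gets a 5-minute timeout.
```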
### Panic Recovery
The scheduler recovers from panics inside the tick function. A single bad row,
malformed cron expression, or database blip cannot permanently kill the
scheduler. Without this recovery, the goroutine dies silently and the only
signal is "no crons firing."
### Liveness Watchdog
The scheduler reports heartbeats to the `supervised` subsystem. The
`/admin/liveness` endpoint exposes per-subsystem ages, so operators can detect
a stuck scheduler before it causes a missed-cron outage.
`Scheduler.Healthy()` returns true if the scheduler has completed a tick within
the last 60 seconds (2x the poll interval). Returns false before the first tick
or if the scheduler is stalled.
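The health predicate amounts to this (illustrative sketch):

```python
import time

POLL_INTERVAL_S = 30

def scheduler_healthy(last_tick_at, now=None):
    """True iff a tick completed within 2x the poll interval.
    False before the first tick or when the scheduler is stalled."""
    if last_tick_at is None:
        return False
    now = time.time() if now is None else now
    return (now - last_tick_at) <= 2 * POLL_INTERVAL_S
```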
---
## Examples
### Hourly Security Audit
```json
{
"name": "Hourly Security Scan",
"cron_expr": "0 * * * *",
"timezone": "UTC",
"prompt": "Scan all open PRs for leaked secrets, SQL injection patterns, and auth bypass vulnerabilities. Report findings as a summary."
}
```
### Daily Standup Report
```json
{
"name": "Daily Standup",
"cron_expr": "0 9 * * 1-5",
"timezone": "America/Los_Angeles",
"prompt": "Generate a standup report: what was completed yesterday, what is planned today, and any blockers. Post to the team channel."
}
```
### Weekly Retrospective
```json
{
"name": "Weekly Retro",
"cron_expr": "0 17 * * 5",
"timezone": "America/New_York",
"prompt": "Write a weekly retrospective covering PRs merged, issues closed, cron failures, and code review findings. Post as a GitHub issue."
}
```
### Nightly Cleanup
```json
{
"name": "Nightly Cleanup",
"cron_expr": "0 2 * * *",
"timezone": "UTC",
"prompt": "Archive stale branches older than 30 days. Close issues that have been inactive for 60 days with a comment explaining the auto-close policy.",
"enabled": true
}
```
---
## Timezone Handling
All cron expressions are evaluated in the specified timezone. If no timezone is
provided, `UTC` is used. The timezone must be a valid IANA timezone string
(e.g. `America/New_York`, `Europe/London`, `Asia/Tokyo`).
When a schedule's `cron_expr` or `timezone` is updated, the `next_run_at` is
immediately recomputed using the new values. This prevents schedules from
firing at unexpected times after a timezone change.
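For the simple daily case (`0 9 * * *`), the timezone-aware next-run computation looks like this (stdlib-only Python sketch; the platform's cron parser handles the general five-field case):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_daily_run(hour: int, minute: int, tz: str, now: datetime) -> datetime:
    """Next occurrence of hour:minute in the given IANA timezone.
    ZoneInfo raises for invalid names, like Go's time.LoadLocation."""
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:  # already passed today — roll to tomorrow
        candidate += timedelta(days=1)
    return candidate

now = datetime(2026, 4, 16, 14, 0, tzinfo=ZoneInfo("UTC"))  # 10:00 in New York (EDT)
print(next_daily_run(9, 0, "America/New_York", now))  # 2026-04-17 09:00 local
```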
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/workspaces/:id/schedules` | List schedules |
| POST | `/workspaces/:id/schedules` | Create schedule |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update schedule |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete schedule |
| POST | `/workspaces/:id/schedules/:scheduleId/run` | Manual trigger |
| GET | `/workspaces/:id/schedules/:scheduleId/history` | Run history (last 20) |
| GET | `/workspaces/:id/schedules/health` | Health view (open to peers) |

---
title: Self-Hosting
description: Run the full Molecule AI stack on your own infrastructure.
---
## Prerequisites
| Requirement | Minimum Version |
|-------------|----------------|
| Docker Desktop | Latest stable |
| Go | 1.25+ |
| Node.js | 20+ |
| Git | 2.x |
## Quick Start
The fastest way to get Molecule AI running locally:
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
# Canvas: http://localhost:3000
# Platform: http://localhost:8080
```
This script starts all infrastructure services, builds the platform, and launches the canvas dev server.
## Infrastructure Setup
Molecule AI depends on four infrastructure services, all managed via `docker-compose.infra.yml` and attached to the shared `molecule-monorepo-net` Docker network:
| Service | Port | Purpose |
|---------|------|---------|
| Postgres | 5432 | Primary datastore (also backs Langfuse and Temporal) |
| Redis | 6379 | Pub/sub, heartbeat TTLs |
| Langfuse | 3001 | LLM trace viewer (backed by ClickHouse) |
| Temporal | 7233 (gRPC), 8233 (Web UI) | Durable workflow engine |
Start infrastructure only:
```bash
./infra/scripts/setup.sh
```
Tear everything down (removes volumes):
```bash
./infra/scripts/nuke.sh
```
## Manual Setup
If you prefer to start each component individually:
### Platform (Go)
```bash
cd platform
go build ./cmd/server
go run ./cmd/server
# Requires Postgres + Redis running
```
The platform must be run from the `platform/` directory, not the repo root.
### Canvas (Next.js)
```bash
cd canvas
npm install
npm run dev
# Dev server on http://localhost:3000
```
### Docker Compose
For infrastructure only:
```bash
docker compose -f docker-compose.infra.yml up -d
```
For the full stack (infrastructure + platform + canvas):
```bash
docker compose up
```
## Environment Variables
### Platform
| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | -- | Postgres connection string (required) |
| `REDIS_URL` | -- | Redis connection string (required) |
| `PORT` | `8080` | Platform HTTP port |
| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers to reach the platform |
| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Comma-separated allowed origins |
| `SECRETS_ENCRYPTION_KEY` | -- | AES-256 key (32 bytes) for encrypting workspace secrets |
| `WORKSPACE_DIR` | -- | Global fallback host path for `/workspace` bind-mount |
| `MOLECULE_ENV` | -- | Set to `production` to hide E2E helper endpoints |
| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are retained |
| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | How often the cleanup job runs |
| `RATE_LIMIT` | `600` | Requests per minute per client |
### Tier Resource Limits
Override per-tier memory and CPU caps for workspace containers. `CPU_SHARES` follows Docker's convention, where 1024 shares equal one CPU.
| Variable | Default | Description |
|----------|---------|-------------|
| `TIER2_MEMORY_MB` | `512` | Standard tier memory limit |
| `TIER2_CPU_SHARES` | `1024` | Standard tier CPU shares |
| `TIER3_MEMORY_MB` | `2048` | Privileged tier memory limit |
| `TIER3_CPU_SHARES` | `2048` | Privileged tier CPU shares |
| `TIER4_MEMORY_MB` | `4096` | Full-host tier memory limit |
| `TIER4_CPU_SHARES` | `4096` | Full-host tier CPU shares |
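For example, to give the privileged tier more headroom via a compose override (the `platform` service name is an assumption — match it to your compose file):

```yaml
# docker-compose.override.yml (illustrative)
services:
  platform:
    environment:
      TIER3_MEMORY_MB: "4096"
      TIER3_CPU_SHARES: "4096"   # Docker convention: 1024 shares ≈ 1 CPU
```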
### Plugin Install Safeguards
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` | Max request body size (64 KiB) |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch and copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` | Max staged-tree size (100 MiB) |
### Canvas
| Variable | Default | Description |
|----------|---------|-------------|
| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API URL |
| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint |
### Tenant Mode
| Variable | Default | Description |
|----------|---------|-------------|
| `CANVAS_PROXY_URL` | -- | When set, the Go server proxies canvas requests to this URL |
| `MOLECULE_ORG_ID` | -- | UUID for multi-tenant isolation; leave unset for self-hosted |
## Production Deployment
For production, use `platform/Dockerfile.tenant` which builds a combined Go + Canvas image:
```bash
docker build -f platform/Dockerfile.tenant -t molecule-platform .
```
This image serves both the API and the canvas frontend from a single container.
## Security Configuration
### Secrets Encryption
Set `SECRETS_ENCRYPTION_KEY` to a 32-byte AES-256 key to encrypt workspace secrets at rest. Without this variable, secrets are stored in plaintext.
```bash
# Generate a key
openssl rand -hex 32
```
**Warning:** `SECRETS_ENCRYPTION_KEY` cannot be rotated without a data migration. Choose carefully before deploying to production.
### Rate Limiting
The `RATE_LIMIT` variable (default 600 requests/min) applies per client. Adjust based on your expected traffic.
### CORS
Set `CORS_ORIGINS` to a comma-separated list of allowed origins. In production, restrict this to your actual domain.
## Pre-commit Hook
Install the project's pre-commit hooks to enforce code quality:
```bash
git config core.hooksPath .githooks
```
The hook enforces:
- `'use client'` directive on hook-using `.tsx` files
- Dark theme only (no `white` or `light` CSS classes)
- No SQL injection patterns (`fmt.Sprintf` with SQL)
- No leaked secrets (`sk-ant-`, `ghp_`, `AKIA`)
Commits are rejected until all violations are fixed.
## Building Workspace Images
Build the base workspace image for local development:
```bash
bash workspace-template/build-all.sh
```
Adapter-specific images are built from standalone template repos. Each repo's `Dockerfile` installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies.

---
title: Troubleshooting
description: Common issues and how to fix them.
---
## Workspace Stuck in "Provisioning"
A workspace that stays in `provisioning` for more than 30 seconds usually indicates a container startup failure.
**Steps to diagnose:**
1. Check Docker logs for the workspace container:
```bash
docker logs <container-id>
```
2. Verify the workspace image exists locally:
```bash
docker images | grep workspace-template
```
3. Check tier resource limits -- the container may be OOM-killed on start. Review `TIER2_MEMORY_MB` / `TIER3_MEMORY_MB` / `TIER4_MEMORY_MB` values.
4. Ensure the platform can reach the Docker daemon (Docker Desktop must be running).
## 401 Unauthorized on API Calls
Bearer tokens can expire or be revoked. Workspace tokens are also auto-revoked when a workspace is deleted.
**Resolution:**
- For workspace-scoped endpoints, mint a new token:
```bash
# Development/staging only (hidden when MOLECULE_ENV=production)
curl http://localhost:8080/admin/workspaces/:id/test-token
```
- For admin endpoints, verify your token is still valid against a known-good endpoint like `GET /health`.
- Legacy workspaces (created before Phase 30.1) are grandfathered and do not require tokens on heartbeat/update-card routes.
## WebSocket Shows "Reconnecting"
The canvas WebSocket connection (`/ws`) drops and retries.
**Common causes:**
- `CORS_ORIGINS` does not include your domain -- the WebSocket upgrade is rejected. Add your origin to the comma-separated list.
- A reverse proxy or firewall is terminating the long-lived connection. Ensure WebSocket upgrade headers are forwarded.
- The platform process crashed or restarted. Check platform logs.
**Verify connectivity:**
```bash
# Quick check that the WS endpoint is reachable
curl -i -N \
-H "Connection: Upgrade" \
-H "Upgrade: websocket" \
-H "Sec-WebSocket-Version: 13" \
-H "Sec-WebSocket-Key: dGVzdA==" \
http://localhost:8080/ws
```
## Agent Not Responding to A2A
When one agent cannot reach another via the A2A proxy (`POST /workspaces/:id/a2a`), check communication rules.
**The `CanCommunicate` access check allows:**
- Same workspace (self-call)
- Siblings (same parent)
- Root-level siblings (both have no parent)
- Parent to child or child to parent
**Everything else is denied.** If two agents need to communicate, they must be in the same subtree.
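The allow rules above can be modeled as a small predicate. This is an illustrative sketch of the rule set, not the platform's actual Go implementation; it takes each workspace's ID and parent ID, with an empty string meaning root-level:

```shell
# Illustrative model of the CanCommunicate rules (not the real implementation).
# Arguments: caller ID, caller parent ID, target ID, target parent ID.
# Pass "" as the parent of a root-level workspace.
can_communicate() {
  local caller="$1" caller_parent="$2" target="$3" target_parent="$4"
  if [ "$caller" = "$target" ]; then echo allow; return; fi              # self-call
  if [ -n "$caller_parent" ] && [ "$caller_parent" = "$target_parent" ]; then
    echo allow; return                                                   # siblings (same parent)
  fi
  if [ -z "$caller_parent" ] && [ -z "$target_parent" ]; then
    echo allow; return                                                   # root-level siblings
  fi
  if [ "$caller_parent" = "$target" ] || [ "$target_parent" = "$caller" ]; then
    echo allow; return                                                   # parent <-> child
  fi
  echo deny
}

can_communicate ws-a root ws-b root    # same parent -> prints "allow"
can_communicate ws-a p1 ws-b p2        # different subtrees -> prints "deny"
```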
**Also verify:**
- The target workspace is `online` (not `paused`, `offline`, or `provisioning`)
- The target's heartbeat is fresh (Redis TTL has not expired)
- The caller includes `X-Workspace-ID` and `Authorization: Bearer <token>` headers
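A well-formed proxy call then looks roughly like this -- the workspace IDs and body are placeholders, and the message schema is whatever your agents expect:

```shell
# Placeholder workspace IDs and payload -- substitute your own. The proxy
# runs the CanCommunicate check before forwarding to the target workspace.
curl -s -X POST "http://localhost:8080/workspaces/ws-target/a2a" \
  -H "Authorization: Bearer ${CALLER_WORKSPACE_TOKEN:-}" \
  -H "X-Workspace-ID: ws-caller" \
  -H "Content-Type: application/json" \
  -d '{"message": "status check"}' \
  || echo "a2a request failed -- is the platform running?"
```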
## Schedule Not Firing
Cron schedules are managed by the platform scheduler subsystem.
**Checklist:**
- Verify the cron expression is valid (standard 5-field cron syntax)
- Confirm the workspace is `online` -- paused workspaces skip all schedules
- Check if the schedule was `skipped` due to concurrency: the scheduler skips when `active_tasks > 0`. Review schedule history:
```
GET /workspaces/:id/schedules/:scheduleId/history
```
- Inspect `GET /admin/liveness` to ensure the scheduler subsystem is alive (age should be under 60 seconds)
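For the first item, a quick shape check catches the most common mistake (a six-field expression with a seconds column). This only counts fields; it does not validate each field's syntax:

```shell
# Shape check only: a valid schedule expression must have exactly 5
# whitespace-separated fields (minute hour day-of-month month day-of-week).
is_five_field_cron() {
  [ "$(printf '%s\n' "$1" | awk '{print NF}')" -eq 5 ]
}

is_five_field_cron "*/5 * * * *" && echo "valid shape"
is_five_field_cron "0 0 * * * *" || echo "6 fields -- seconds column not supported"
```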
## Channel Test Fails
Social channel integrations (Telegram, Slack, etc.) can fail for several reasons.
**Diagnose:**
- Verify the bot token is correct and has not been revoked by the platform provider
- Check the allowlist config in the channel's JSONB settings -- messages from non-allowlisted chats are silently dropped
- Ensure the webhook URL is registered with the external platform:
```
POST /webhooks/:type
```
This is the endpoint the external platform (Telegram, Slack) should send events to.
- Test the connection explicitly:
```
POST /workspaces/:id/channels/:channelId/test
```
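You can also hand-deliver a minimal event to the webhook route. The JSON below mimics a bare Telegram update for illustration only; the platform's accepted payload shape may differ:

```shell
# Simulate a minimal Telegram-style update to the webhook route. The chat ID
# must be on the channel's allowlist or the message is silently dropped.
curl -s -X POST "http://localhost:8080/webhooks/telegram" \
  -H "Content-Type: application/json" \
  -d '{"update_id": 1, "message": {"chat": {"id": 12345}, "text": "ping"}}' \
  || echo "webhook delivery failed -- is the platform running?"
```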
## Migration Crash on Boot
The platform runs all `*.up.sql` migrations on every startup (there is no `schema_migrations` tracking table yet).
**Common issues:**
- Migrations must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... IF NOT EXISTS`). If a migration lacks this guard, the second boot fails.
- Before PR #212, the migration runner did not filter `.down.sql` files, causing tables to be dropped on every boot. Ensure you are running a platform version that includes this fix.
- If you see errors about duplicate columns or tables, the migration is not idempotent. Patch the `.up.sql` file to add `IF NOT EXISTS` guards.
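An idempotent `.up.sql` then looks like this -- PostgreSQL syntax, with table and column names invented for illustration:

```sql
-- Safe to re-run on every boot: guards on both the table and the column.
CREATE TABLE IF NOT EXISTS example_events (
    id         BIGSERIAL PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

ALTER TABLE example_events
    ADD COLUMN IF NOT EXISTS payload JSONB;
```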
## Canvas Blank or 502 on Tenant Deploy
In tenant mode (`platform/Dockerfile.tenant`), the Go server proxies canvas requests.
**Verify:**
- `CANVAS_PROXY_URL` is set and points to the running Next.js process inside the container
- Both the Go server and the Node.js process are running (check container logs for both)
- The Next.js build completed successfully during `docker build`
## Plugin Install Timeout
Large plugins or slow network connections can exceed the default fetch deadline.
**Adjust limits:**
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Increase for large or remote plugins |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Increase if the plugin tree exceeds 100 MiB |
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Increase if the install request body is large |
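The byte-valued limits take raw integers, so compute them explicitly rather than guessing. The values below are illustrative:

```shell
# Raise the plugin directory cap to 200 MiB (the variable takes raw bytes).
PLUGIN_INSTALL_MAX_DIR_BYTES=$((200 * 1024 * 1024))
echo "$PLUGIN_INSTALL_MAX_DIR_BYTES"   # 209715200
export PLUGIN_INSTALL_MAX_DIR_BYTES

# Durations use Go syntax, matching the 5m default.
export PLUGIN_INSTALL_FETCH_TIMEOUT=15m
```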
## Memory or Disk Usage Growing
Activity logs and structure events accumulate over time.
**Tune retention:**
- `ACTIVITY_RETENTION_DAYS` (default `7`) -- reduce to 3 or even 1 for high-traffic deployments
- `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default `6`) -- reduce to run cleanup more frequently
- Monitor the `activity_logs` and `structure_events` tables directly if disk usage is a concern:
```sql
SELECT pg_size_pretty(pg_total_relation_size('activity_logs'));
SELECT pg_size_pretty(pg_total_relation_size('structure_events'));
```
## Container Health Detection
If workspaces go offline unexpectedly (e.g., Docker Desktop crash), three layers detect the failure:
1. **Passive (Redis TTL):** 60-second heartbeat key expires, liveness monitor triggers auto-restart
2. **Proactive (Health Sweep):** Docker API polled every 15 seconds, catches dead containers faster than TTL expiry
3. **Reactive (A2A Proxy):** On connection error to a workspace, checks `provisioner.IsRunning()` and triggers immediate offline + restart
If none of these are catching a dead container, check `GET /admin/liveness` to verify the health sweep and liveness monitor subsystems are running.
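To check freshness by hand, compare the last heartbeat timestamp against the 60-second window. This sketch assumes you can read the last beat as a Unix timestamp (the liveness payload's exact field names may differ):

```shell
# Flag a stale heartbeat: the passive layer expires the Redis key at 60s,
# and /admin/liveness subsystem ages should also stay under 60s.
heartbeat_status() {
  local last_heartbeat="$1"                      # Unix timestamp of the last beat
  local age=$(( $(date +%s) - last_heartbeat ))
  if [ "$age" -lt 60 ]; then echo "fresh (${age}s)"; else echo "stale (${age}s)"; fi
}

heartbeat_status "$(( $(date +%s) - 45 ))"       # well within the window -> fresh
```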