Merge pull request #3 from Molecule-AI/feat/external-agents-tokens-mcp

docs: comprehensive content for all 15 pages
This commit is contained in:
Hongming Wang 2026-04-16 10:11:54 -07:00 committed by GitHub
commit e2a772d561
13 changed files with 2414 additions and 135 deletions

---
title: API Reference
description: Complete reference for all Molecule AI Platform HTTP and WebSocket endpoints.
---

# API Reference
The Molecule AI Platform exposes a REST API (default port 8080) for workspace management, agent registry, communication, and administration. All endpoints return JSON unless otherwise noted.
**Base URL:** `http://localhost:8080` (self-hosted) or `https://api.moleculesai.app` (SaaS)
---
## Authentication Model
The platform uses three authentication middleware variants depending on the sensitivity of the route.
### AdminAuth
Strict bearer-token authentication. Required for any route where a forged request could leak prompts/memory, create/mutate workspaces, or leak operational data.
```
Authorization: Bearer <token>
```
**Fail-open behavior:** When no live tokens exist globally (fresh install), AdminAuth passes all requests through. Once the first token is created, all AdminAuth routes require a valid bearer.
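A minimal Python sketch of building an AdminAuth request; the token value is illustrative, and on a fresh install (no live tokens) the header may be omitted because AdminAuth fails open:

```python
import urllib.request
from typing import Optional

def admin_request(path: str, token: Optional[str],
                  base: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request for an AdminAuth route. Once any live token exists,
    requests without a valid bearer are rejected."""
    req = urllib.request.Request(base + path, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

req = admin_request("/workspaces", token="my-admin-token")
```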
### WorkspaceAuth
Per-workspace bearer token binding. Workspace A's token cannot access workspace B's sub-routes. Used for the entire `/workspaces/:id/*` group (except the A2A proxy, which uses `CanCommunicate`).
```
Authorization: Bearer <workspace-token>
```
### CanvasOrBearer
Accepts either a valid bearer token OR a request whose `Origin` header matches `CORS_ORIGINS`. Used only for cosmetic-only routes where a forged request has zero data/security impact.
Currently applies only to `PUT /canvas/viewport`. Do not extend to data-sensitive routes.
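The accept logic can be restated in Python as a sketch (the real middleware is Go, and `is_valid_token` here is a placeholder for the platform's token lookup):

```python
CORS_ORIGINS = "http://localhost:3000,http://localhost:3001"  # env var in the real platform
allowed_origins = {o.strip() for o in CORS_ORIGINS.split(",")}

def is_valid_token(token: str) -> bool:
    # placeholder: the platform checks live tokens in Postgres
    return token == "valid-token"

def canvas_or_bearer_ok(headers: dict) -> bool:
    """Accept a valid bearer OR a matching Origin (cosmetic routes only)."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer ") and is_valid_token(auth[len("Bearer "):]):
        return True
    return headers.get("Origin") in allowed_origins
```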
---
## Health and Monitoring
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/health` | None | Returns `200 OK` if the platform is running. Use for load balancer health checks. |
| GET | `/metrics` | None | Prometheus text format (v0.0.4) metrics. Scrape-safe, no auth required. |
| GET | `/admin/liveness` | AdminAuth | Per-subsystem `supervised.Snapshot()` ages. Check before debugging stuck scheduler/heartbeat goroutines. |
---
## Workspaces
Core workspace CRUD and lifecycle operations.
### CRUD
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces` | AdminAuth | Create a new workspace. Accepts `name`, `runtime`, `template`, `parent_id`, `tier`, `workspace_dir`, and other fields. Runtime is auto-detected from template config if omitted (defaults to `langgraph`). |
| GET | `/workspaces` | AdminAuth | List all workspaces with status, runtime, agent card, position, and hierarchy info. |
| GET | `/workspaces/:id` | WorkspaceAuth | Get a single workspace by ID. |
| PATCH | `/workspaces/:id` | WorkspaceAuth | Update workspace fields. **Field-level authz:** cosmetic fields (name, role, x, y, canvas) pass through; sensitive fields (tier, parent_id, runtime, workspace_dir) require a valid bearer token when any live token exists. |
| DELETE | `/workspaces/:id` | AdminAuth | Delete a workspace. Stops the container, revokes all auth tokens, and removes all associated data. |
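The field-level authorization on `PATCH /workspaces/:id` can be sketched as follows (field sets taken from the table above; the helper name is illustrative):

```python
COSMETIC_FIELDS = {"name", "role", "x", "y", "canvas"}
SENSITIVE_FIELDS = {"tier", "parent_id", "runtime", "workspace_dir"}

def patch_requires_bearer(patch: dict, any_live_token: bool) -> bool:
    """Cosmetic-only patches pass through; patches touching sensitive
    fields require a valid bearer once any live token exists."""
    touches_sensitive = bool(SENSITIVE_FIELDS & patch.keys())
    return touches_sensitive and any_live_token
```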
### Lifecycle
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/restart` | WorkspaceAuth | Restart the workspace container. Sends a `restart_context` A2A message after successful re-registration. |
| POST | `/workspaces/:id/pause` | WorkspaceAuth | Stop the container and set status to `paused`. Paused workspaces skip health sweep, liveness monitor, and auto-restart. |
| POST | `/workspaces/:id/resume` | WorkspaceAuth | Re-provision a paused workspace. Status transitions to `provisioning`. |
---
## Registry
Workspace registration and heartbeat endpoints. Called by workspace runtimes, not by end users.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/registry/register` | None | Register a workspace with the platform. Sets status to `online`. Body includes agent URL, agent card, capabilities. |
| POST | `/registry/heartbeat` | Bearer (if token exists) | Send a heartbeat. Updates Redis TTL key (60s expiry). Body can include `active_tasks`, `current_task`, `error_rate`. Triggers `degraded` status if `error_rate > 0.5`. |
| POST | `/registry/update-card` | Bearer (if token exists) | Update the workspace's agent card (name, description, skills, etc.). |
---
## Discovery
Peer discovery and access control verification.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/registry/discover/:id` | Bearer + `X-Workspace-ID` | Discover a workspace's agent card and URL. Requires caller identification. Fails open on DB hiccup since hierarchy check is primary. |
| GET | `/registry/:id/peers` | Bearer + `X-Workspace-ID` | List all peers (siblings, parent, children) that the caller can communicate with. |
| POST | `/registry/check-access` | None | Check whether two workspaces can communicate. Body: `{ "caller_id": "...", "target_id": "..." }`. Returns `{ "allowed": true/false }`. |
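A sketch of building the `check-access` request body and interpreting the response, using the documented `caller_id`/`target_id`/`allowed` fields:

```python
import json

def build_check_access_body(caller_id: str, target_id: str) -> bytes:
    """JSON body for POST /registry/check-access."""
    return json.dumps({"caller_id": caller_id, "target_id": target_id}).encode()

def parse_check_access(raw: bytes) -> bool:
    """Read the { "allowed": true/false } response."""
    return bool(json.loads(raw).get("allowed", False))

body = build_check_access_body("ws-a", "ws-b")
allowed = parse_check_access(b'{"allowed": true}')
```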
---
## Communication
### A2A Proxy
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/a2a` | CanCommunicate | Proxy an A2A JSON-RPC message to the target workspace. Caller identified via `X-Workspace-ID` header. Canvas requests (no header) bypass access check. On connection error, checks if container is dead and triggers auto-restart. |
### Delegation
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/delegate` | WorkspaceAuth | Async fire-and-forget delegation. Supports idempotency keys. Body includes target workspace, prompt, and metadata. |
| GET | `/workspaces/:id/delegations` | WorkspaceAuth | List delegation status for a workspace. Returns delegation rows with status, result, timestamps. |
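A hedged sketch of a delegation body with an idempotency key; the exact field names are illustrative (check the platform source for the real schema), but the point stands: retrying with the same key should not create a duplicate delegation.

```python
import uuid
from typing import Optional

def build_delegation(target_id: str, prompt: str,
                     idempotency_key: Optional[str] = None) -> dict:
    """Body for POST /workspaces/:id/delegate (field names illustrative)."""
    return {
        "target_workspace_id": target_id,
        "prompt": prompt,
        "idempotency_key": idempotency_key or str(uuid.uuid4()),
    }

d1 = build_delegation("ws-research", "Summarize today's PRs", idempotency_key="daily-pr-summary")
d2 = build_delegation("ws-research", "Summarize today's PRs", idempotency_key="daily-pr-summary")
```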
---
## Configuration
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/config` | WorkspaceAuth | Get the workspace's `config.yaml` contents. |
| PATCH | `/workspaces/:id/config` | WorkspaceAuth | Update the workspace config. "Save & Restart" writes config and auto-restarts; "Save" writes only and shows a restart banner in the Canvas. |
---
## Secrets
### Per-Workspace Secrets
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/secrets` | WorkspaceAuth | List secret keys for a workspace (keys only, values masked). |
| POST | `/workspaces/:id/secrets` | WorkspaceAuth | Set a secret `{ "key": "...", "value": "..." }`. Auto-restarts the workspace. |
| PUT | `/workspaces/:id/secrets` | WorkspaceAuth | Alias for POST (upsert semantics). Auto-restarts the workspace. |
| DELETE | `/workspaces/:id/secrets/:key` | WorkspaceAuth | Delete a secret by key. Auto-restarts the workspace. |
| GET | `/workspaces/:id/model` | WorkspaceAuth | Return the model configuration derived from available API keys (which provider keys are set). |
### Global Secrets
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/settings/secrets` | AdminAuth | List global secrets (keys only, values masked). |
| PUT | `/settings/secrets` | AdminAuth | Set a global secret `{ "key": "...", "value": "..." }`. Auto-restarts every non-paused/non-removed workspace that does not shadow the key with a workspace-level override. |
| POST | `/settings/secrets` | AdminAuth | Alias for PUT. |
| DELETE | `/settings/secrets/:key` | AdminAuth | Delete a global secret. Same auto-restart fan-out as PUT. |
Legacy aliases `GET/POST/DELETE /admin/secrets[/:key]` also exist and behave identically.
---
## Memory
### Key-Value Memory
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/memory` | WorkspaceAuth | List all key-value memory entries for a workspace. |
| POST | `/workspaces/:id/memory` | WorkspaceAuth | Set a memory entry `{ "key": "...", "value": "..." }`. |
| DELETE | `/workspaces/:id/memory/:key` | WorkspaceAuth | Delete a memory entry by key. |
### Agent Memories (HMA-scoped)
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/memories` | WorkspaceAuth | List agent memories for a workspace. |
| POST | `/workspaces/:id/memories` | WorkspaceAuth | Create an agent memory entry. |
| DELETE | `/workspaces/:id/memories/:id` | WorkspaceAuth | Delete an agent memory by ID. |
---
## Files
Workspace file management. Files are stored in the workspace's config directory.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/files` | WorkspaceAuth | List files in the workspace config directory. |
| GET | `/workspaces/:id/files/*path` | WorkspaceAuth | Read a specific file. |
| PUT | `/workspaces/:id/files/*path` | WorkspaceAuth | Write a file. Creates parent directories as needed. |
| DELETE | `/workspaces/:id/files/*path` | WorkspaceAuth | Delete a file. |
| GET | `/workspaces/:id/shared-context` | WorkspaceAuth | Get the shared context files for a workspace (aggregated from parent hierarchy). |
---
## Activity
Activity logging and search for A2A communications, task updates, and agent logs.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/activity` | WorkspaceAuth | List activity logs for a workspace. Supports `?source=canvas` or `?source=agent` filter. |
| POST | `/workspaces/:id/activity` | WorkspaceAuth | Log an activity entry (used by workspace runtimes to self-report). |
| POST | `/workspaces/:id/notify` | WorkspaceAuth | Agent-to-user push message via WebSocket. Delivers a notification to connected Canvas clients. |
### Session Search
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/session-search` | WorkspaceAuth | Search activity logs with filters for type, date range, and text content. Returns paginated results. |
---
## Schedules
Cron-based scheduled tasks per workspace.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/schedules` | WorkspaceAuth | List all schedules for a workspace. |
| POST | `/workspaces/:id/schedules` | WorkspaceAuth | Create a schedule. Body: `{ "expression": "0 */6 * * *", "timezone": "UTC", "prompt": "...", "enabled": true }`. |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Update a schedule (expression, timezone, prompt, enabled). |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Delete a schedule. |
| POST | `/workspaces/:id/schedules/:scheduleId/run` | WorkspaceAuth | Manually trigger a schedule immediately. |
| GET | `/workspaces/:id/schedules/:scheduleId/history` | WorkspaceAuth | List past runs for a schedule. Includes status (`success`, `error`, `skipped`) and `error_detail`. |
The schedule `source` field is `template` for org/import-seeded schedules and `runtime` for Canvas/API-created ones. `last_status` can be `skipped`: the scheduler is concurrency-aware and skips a run when the workspace is still busy.
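A minimal sketch of constructing the schedule-creation body from the fields above; the 5-field shape check is only a sanity guard, the platform's scheduler does the real cron parsing:

```python
def build_schedule(expression: str, prompt: str,
                   timezone: str = "UTC", enabled: bool = True) -> dict:
    """Body for POST /workspaces/:id/schedules."""
    if len(expression.split()) != 5:
        raise ValueError("expected a 5-field cron expression, e.g. '0 */6 * * *'")
    return {"expression": expression, "timezone": timezone,
            "prompt": prompt, "enabled": enabled}

every_six_hours = build_schedule("0 */6 * * *", "Check for new support tickets")
```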
---
## Channels
Social channel integrations (Telegram, Slack, etc.) for workspace agents.
### Per-Workspace Channels
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/channels` | WorkspaceAuth | List channels for a workspace. |
| POST | `/workspaces/:id/channels` | WorkspaceAuth | Create a channel. Body includes platform type, JSONB config, and allowlist. |
| PATCH | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Update a channel's config or allowlist. |
| DELETE | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Delete a channel. |
| POST | `/workspaces/:id/channels/:channelId/send` | WorkspaceAuth | Send an outbound message through the channel. |
| POST | `/workspaces/:id/channels/:channelId/test` | WorkspaceAuth | Test the channel connection (send a test message). |
### Global Channel Endpoints
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/channels/adapters` | None | List available social platform adapters (Telegram, Slack, etc.). |
| POST | `/channels/discover` | AdminAuth | Auto-detect available chats/groups for a bot token. |
| POST | `/webhooks/:type` | None | Incoming webhook endpoint for social platforms. The `:type` parameter identifies the platform (e.g., `telegram`, `slack`). |
---
## Plugins
Plugin registry and per-workspace plugin management.
### Global Plugin Registry
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/plugins` | None | List all plugins in the registry. Supports `?runtime=` filter to show only compatible plugins. |
| GET | `/plugins/sources` | None | List registered install-source schemes (e.g., `github://`, `local://`). |
### Per-Workspace Plugins
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/plugins` | WorkspaceAuth | List installed plugins for a workspace. |
| POST | `/workspaces/:id/plugins` | WorkspaceAuth | Install a plugin. Body: `{ "source": "github://org/repo" }`. Safeguards: 64 KiB body limit, 5 min fetch timeout, 100 MiB max staged-tree. |
| DELETE | `/workspaces/:id/plugins/:name` | WorkspaceAuth | Uninstall a plugin by name. |
| GET | `/workspaces/:id/plugins/available` | WorkspaceAuth | List plugins available for this workspace (filtered by workspace runtime). |
| GET | `/workspaces/:id/plugins/compatibility` | WorkspaceAuth | Preflight runtime-change check. Query: `?runtime=X`. Returns which currently-installed plugins would be incompatible with the target runtime. |
---
## Auth Tokens
Bearer token management for workspaces.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/tokens` | WorkspaceAuth | List active tokens for a workspace (token values are masked). |
| POST | `/workspaces/:id/tokens` | WorkspaceAuth | Create a new bearer token for the workspace. |
| DELETE | `/workspaces/:id/tokens/:tokenId` | WorkspaceAuth | Revoke a specific token. |
### Test Token (Development Only)
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/admin/workspaces/:id/test-token` | None | Mint a fresh bearer token for E2E scripts. Returns 404 unless `MOLECULE_ENV != production` or `MOLECULE_ENABLE_TEST_TOKENS=1`. |
---
## Teams
Expand and collapse team views in the Canvas hierarchy.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/expand` | WorkspaceAuth | Expand a team workspace to show its children on the canvas. |
| POST | `/workspaces/:id/collapse` | WorkspaceAuth | Collapse a team workspace to hide its children. |
---
## Templates and Bundles
### Templates
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/templates` | None | List available workspace templates with their runtime, description, and config schema. |
| POST | `/templates/import` | AdminAuth | Import a workspace template from a `github://` source URL. |
### Org Templates
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/org/templates` | None | List available organization templates. |
| POST | `/org/import` | AdminAuth | Import an org template. Applies `resolveInsideRoot` path sanitization. Creates the full workspace hierarchy defined in `org.yaml`. |
### Bundles
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/bundles/export/:id` | AdminAuth | Export a workspace (or workspace tree) as a portable bundle. Includes config, secrets (keys only), memory, schedules, and hierarchy. |
| POST | `/bundles/import` | AdminAuth | Import a previously-exported bundle. Recreates the workspace tree with all associated data. |
---
## Approvals
Human-in-the-loop approval system for agent actions.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/workspaces/:id/approvals` | WorkspaceAuth | Create an approval request. Body includes the action description, metadata, and options. |
| GET | `/workspaces/:id/approvals` | WorkspaceAuth | List approval requests for a workspace. |
| POST | `/workspaces/:id/approvals/:id/decide` | WorkspaceAuth | Approve or reject an approval request. Body: `{ "decision": "approve" }` or `{ "decision": "reject" }`. |
| GET | `/approvals/pending` | AdminAuth | List all pending approval requests across all workspaces. |
---
## Canvas
Canvas viewport persistence (cosmetic only).
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/canvas/viewport` | None | Get the saved canvas viewport (zoom, pan position). Open endpoint for bootstrap-friendliness. |
| PUT | `/canvas/viewport` | CanvasOrBearer | Save the canvas viewport. Accepts bearer OR matching `Origin` header. Worst case on forgery: viewport corruption, recovered by page refresh. |
---
## Traces
LLM trace retrieval from Langfuse.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/workspaces/:id/traces` | WorkspaceAuth | List LLM traces for a workspace from Langfuse. |
---
## Events
Append-only event log for structure changes.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/events` | AdminAuth | List all structure events across all workspaces. |
| GET | `/events/:workspaceId` | AdminAuth | List structure events for a specific workspace. |
---
## Terminal
WebSocket-based terminal access to workspace containers.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| WS | `/workspaces/:id/terminal` | WorkspaceAuth | Open a WebSocket terminal session to the workspace container. Provides interactive shell access. |
---
## WebSocket
Real-time event streaming for Canvas clients.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| WS | `/ws` | None | Connect to the WebSocket hub. Receives all structure events (`WORKSPACE_ONLINE`, `WORKSPACE_OFFLINE`, `HEARTBEAT`, `CONFIG_UPDATED`, `A2A_RESPONSE`, `AGENT_MESSAGE`, etc.). Canvas clients connect here for real-time updates. |
---
## Error Responses
All endpoints return standard HTTP status codes:
| Status | Meaning |
|--------|---------|
| 200 | Success |
| 201 | Created |
| 400 | Bad request (malformed body, missing required fields) |
| 401 | Unauthorized (missing or invalid bearer token) |
| 403 | Forbidden (valid token but insufficient access) |
| 404 | Not found (workspace, schedule, channel, etc. does not exist) |
| 409 | Conflict (idempotency key collision on delegation) |
| 429 | Rate limited (exceeds `RATE_LIMIT` requests/min) |
| 500 | Internal server error |
Error response body format:
```json
{
  "error": "human-readable error message"
}
```
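Since every endpoint uses the same error envelope, a client can map it onto an exception in one place. A minimal sketch (helper name is illustrative):

```python
import json

def raise_for_platform_error(status: int, body: bytes) -> dict:
    """Return the parsed JSON body on 2xx; otherwise raise using the
    platform's uniform { "error": "..." } envelope."""
    payload = json.loads(body) if body else {}
    if 200 <= status < 300:
        return payload
    raise RuntimeError(f"{status}: {payload.get('error', 'unknown error')}")

ok = raise_for_platform_error(200, b'{"id": "ws-1"}')
```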
---
## Rate Limiting
All endpoints are subject to a global rate limit of `RATE_LIMIT` requests per minute (default: 600). When exceeded, the platform returns `429 Too Many Requests` with a `Retry-After` header.
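Clients should honor the `Retry-After` header on 429 responses. A small sketch of the retry decision, assuming the header carries a delay in seconds:

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Seconds to sleep before retrying a 429, or None if no retry is needed."""
    if status != 429:
        return None
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default
```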
---
## CORS
The platform sets CORS headers based on the `CORS_ORIGINS` environment variable (comma-separated list, default: `http://localhost:3000,http://localhost:3001`). Preflight (`OPTIONS`) requests are handled automatically by the Gin CORS middleware.

---
title: Architecture
description: System architecture, components, infrastructure, and communication model for the Molecule AI platform.
---

# Architecture
Molecule AI is a platform for orchestrating AI agent workspaces that form an organizational hierarchy. Workspaces register with a central platform, communicate via A2A (Agent-to-Agent) protocol, and are visualized on a drag-and-drop canvas.
## System Overview
```
Canvas (Next.js :3000) <--WebSocket--> Platform (Go :8080) <--HTTP--> Postgres + Redis
                                            |
                      Workspace A <----A2A----> Workspace B
                      (Python agents)
                          |  register/heartbeat  |
                          +------- Platform -----+
```
The Canvas provides the visual interface, the Platform acts as the control plane, and Workspaces are isolated containers running AI agent runtimes. All inter-agent communication is mediated by the Platform via the A2A proxy, which enforces hierarchical access control.
---
## Four Main Components
### Canvas
**Stack:** Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind CSS
The Canvas is the browser-based visual workspace graph. It provides:
- **Drag-and-drop layout** with persistent node positions (saved via `PATCH /workspaces/:id`)
- **Team nesting** using recursive `TeamMemberChip` components (up to 3 levels deep)
- **Real-time status** via WebSocket connection to the Platform
- **Chat interface** with two sub-tabs: "My Chat" (user-to-agent) and "Agent Comms" (agent-to-agent A2A traffic)
- **Config editor** with "Save & Restart" and "Save" (deferred restart) modes
- **Secrets management** with auto-restart on POST/DELETE
**State management:**
| Concern | Mechanism |
|---------|-----------|
| Initial load | HTTP fetch `GET /workspaces` into Zustand |
| Real-time updates | WebSocket events via `applyEvent()` |
| Position persistence | `onNodeDragStop` sends `PATCH /workspaces/:id` with `{x, y}` |
| Node nesting | `nestNode` sets `hidden: !!targetId`; children render inside parent |
**Environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API base URL |
| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint |
### Platform
**Stack:** Go / Gin
The Platform is the central control plane responsible for:
- **Workspace CRUD** -- create, read, update, delete workspaces
- **Registry** -- workspace registration, heartbeat tracking, agent card management
- **Discovery** -- peer lookup, access control checks
- **WebSocket hub** -- real-time event broadcasting to Canvas clients
- **Liveness monitoring** -- three-layer container health detection
- **A2A proxy** -- routes inter-agent messages with hierarchical access control
- **Docker provisioner** -- container lifecycle management with tier-based resource limits
- **Scheduler** -- cron-based scheduled tasks per workspace
- **Channel adapters** -- social integrations (Telegram, Slack, etc.)
**Key environment variables:**
| Variable | Default | Purpose |
|----------|---------|---------|
| `DATABASE_URL` | (required) | Postgres connection string |
| `REDIS_URL` | (required) | Redis connection string |
| `PORT` | `8080` | Server listen port |
| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers |
| `SECRETS_ENCRYPTION_KEY` | (optional) | AES-256 key, 32 bytes |
| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Allowed CORS origins |
| `RATE_LIMIT` | `600` | Requests per minute |
| `MOLECULE_ENV` | (optional) | Set `production` to hide test endpoints |
| `MOLECULE_ORG_ID` | (optional) | SaaS tenant org gating |
| `WORKSPACE_DIR` | (optional) | Global fallback host path for `/workspace` bind-mount |
| `AWARENESS_URL` | (optional) | Injected into workspace containers for cross-session memory |
| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are kept |
| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | Cleanup sweep interval |
**Workspace tier resource limits:**
| Tier | Env (Memory) | Env (CPU) | Defaults |
|------|-------------|-----------|----------|
| Standard (Tier 2) | `TIER2_MEMORY_MB` | `TIER2_CPU_SHARES` | 512 MB / 1 CPU |
| Privileged (Tier 3) | `TIER3_MEMORY_MB` | `TIER3_CPU_SHARES` | 2048 MB / 2 CPU |
| Full-host (Tier 4) | `TIER4_MEMORY_MB` | `TIER4_CPU_SHARES` | 4096 MB / 4 CPU |
### Workspace Runtime
**Published as:** [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/) on PyPI
The shared runtime provides the base agent infrastructure: A2A server, heartbeat loop, config loading, platform auth, plugin system, and built-in tools. Each AI framework adapter lives in its own standalone repository.
| Runtime | Standalone Repo | Key Dependencies |
|---------|-----------------|------------------|
| LangGraph | `molecule-ai-workspace-template-langgraph` | langchain-anthropic, langgraph |
| Claude Code | `molecule-ai-workspace-template-claude-code` | claude-agent-sdk, @anthropic-ai/claude-code |
| OpenClaw | `molecule-ai-workspace-template-openclaw` | openclaw (npm) |
| CrewAI | `molecule-ai-workspace-template-crewai` | crewai |
| AutoGen | `molecule-ai-workspace-template-autogen` | autogen |
| DeepAgents | `molecule-ai-workspace-template-deepagents` | deepagents |
| Hermes | `molecule-ai-workspace-template-hermes` | openai, anthropic, google-genai |
| Gemini CLI | `molecule-ai-workspace-template-gemini-cli` | @google/gemini-cli (npm) |
Each adapter repo has its own `Dockerfile` that installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies. Templates are cloned at Docker build time into the platform image via `manifest.json`.
### molecli
**Stack:** Go / Bubbletea + Lipgloss
A terminal UI dashboard for real-time workspace monitoring, event log streaming, health overview, and delete/filter operations. Reads `MOLECLI_URL` (default `http://localhost:8080`) to locate the platform. Now published as a standalone repo at `github.com/Molecule-AI/molecule-cli`.
---
## Infrastructure Services
All services run via `docker-compose.infra.yml`, attached to the shared `molecule-monorepo-net` network. Start them with:
```bash
./infra/scripts/setup.sh # Start Postgres, Redis, Langfuse, Temporal; run migrations
```
### Postgres (port 5432)
Primary datastore for workspaces, events, activity logs, secrets, schedules, channels, and more. Also backs Langfuse and Temporal via separate databases.
Key tables:
| Table | Purpose |
|-------|---------|
| `workspaces` | Core entity -- status, runtime, agent_card, heartbeat, current_task |
| `canvas_layouts` | Persisted x/y positions |
| `structure_events` | Append-only event log |
| `activity_logs` | A2A communications, task updates, agent logs, errors |
| `workspace_schedules` | Cron tasks with expression, timezone, prompt, run history |
| `workspace_channels` | Social channel integrations with JSONB config |
| `workspace_secrets` / `global_secrets` | Encrypted secrets storage |
| `workspace_auth_tokens` | Bearer tokens (auto-revoked on workspace delete) |
| `agent_memories` | HMA-scoped agent memory |
| `approvals` | Human-in-the-loop approval requests |
**Migration runner:** On startup, the platform globs `*.sql` in the migrations directory, filters out `.down.sql` files, sorts alphabetically, and executes each. All `.up.sql` files must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... IF NOT EXISTS`).
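The runner's ordering logic (which is Go in the platform) can be restated in Python for illustration:

```python
import tempfile
from pathlib import Path

def migration_order(migrations_dir: str) -> list:
    """Glob *.sql, drop *.down.sql files, sort alphabetically --
    mirrors the platform's startup migration runner."""
    files = Path(migrations_dir).glob("*.sql")
    ups = [f.name for f in files if not f.name.endswith(".down.sql")]
    return sorted(ups)

with tempfile.TemporaryDirectory() as d:
    for name in ["002_events.sql", "001_workspaces.sql", "001_workspaces.down.sql"]:
        (Path(d) / name).write_text("-- sql")
    order = migration_order(d)
```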
**JSONB gotcha:** When inserting Go `[]byte` (from `json.Marshal`) into Postgres JSONB columns, you must convert to `string()` first and use `::jsonb` cast in SQL. The `lib/pq` driver treats `[]byte` as `bytea`, not JSONB.
### Redis (port 6379)
Used for pub/sub event broadcasting and heartbeat TTL tracking. Workspace heartbeat keys expire after 60 seconds -- expiry triggers the liveness monitor.
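The liveness decision reduces to a TTL window check. A pure-logic sketch of what key expiry implies (the real signal is Redis expiring the key, not a timestamp comparison):

```python
HEARTBEAT_TTL_SECONDS = 60

def is_alive(last_heartbeat_at: float, now: float,
             ttl: int = HEARTBEAT_TTL_SECONDS) -> bool:
    """Alive while the last heartbeat is within the TTL window;
    expiry is what trips the liveness monitor."""
    return (now - last_heartbeat_at) < ttl
```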
### Langfuse (port 3001)
LLM trace viewer backed by ClickHouse. Provides observability into agent LLM calls, token usage, and latency.
### Temporal (port 7233 gRPC, port 8233 Web UI)
Durable workflow engine for `workspace-template/builtin_tools/temporal_workflow.py`. Dev-only posture: the auto-setup image runs with no auth on `0.0.0.0:7233`. Production deployments must gate access via mTLS or an API key / reverse proxy.
---
## Communication Model
### WebSocket Events Flow
```
1. Action occurs (register, heartbeat, config change, etc.)
2. broadcaster.RecordAndBroadcast()
     -> inserts into structure_events table
     -> publishes to Redis pub/sub
3. Redis subscriber relays to WebSocket hub
4. Hub broadcasts to:
     - Canvas clients (all events)
     - Workspace clients (filtered by CanCommunicate)
```
### A2A Proxy
The A2A proxy (`POST /workspaces/:id/a2a`) routes agent-to-agent messages. The caller identifies itself via the `X-Workspace-ID` header and authenticates with `Authorization: Bearer <token>`.
### Access Control Rules
Determined by `CanCommunicate(callerID, targetID)` in `registry/access.go`:
| Relationship | Allowed |
|-------------|---------|
| Same workspace (self-call) | Yes |
| Siblings (same `parent_id`) | Yes |
| Root-level siblings (both `parent_id` IS NULL) | Yes |
| Parent to child / child to parent | Yes |
| System callers (`webhook:*`, `system:*`, `test:*`) | Yes (bypass) |
| Canvas requests (no `X-Workspace-ID`) | Yes (bypass) |
| Everything else | **Denied** |
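The rules table can be restated as a pure-Python sketch (the authoritative check is `CanCommunicate` in `registry/access.go`; `parent_of` maps workspace ID to parent ID, `None` for roots):

```python
from typing import Optional

def can_communicate(caller_id: Optional[str], target_id: str, parent_of: dict) -> bool:
    if caller_id is None:                                          # Canvas request, no X-Workspace-ID
        return True
    if caller_id.split(":")[0] in {"webhook", "system", "test"}:   # system callers bypass
        return True
    if caller_id == target_id:                                     # self-call
        return True
    cp, tp = parent_of.get(caller_id), parent_of.get(target_id)
    if cp == tp:                                                   # siblings, incl. both roots (None)
        return True
    if cp == target_id or tp == caller_id:                         # parent <-> child
        return True
    return False
```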
### Import Cycle Prevention
The platform uses function injection to avoid Go import cycles between `ws`, `registry`, and `events` packages:
- `ws.NewHub(canCommunicate AccessChecker)` -- Hub accepts `registry.CanCommunicate` as a function
- `registry.StartLivenessMonitor(ctx, onOffline OfflineHandler)` -- Liveness accepts broadcaster callback
- `registry.StartHealthSweep(ctx, checker ContainerChecker, interval, onOffline)` -- Health sweep accepts Docker checker interface
- Wiring happens in `platform/cmd/server/main.go` -- init order: `wh -> onWorkspaceOffline -> liveness/healthSweep -> router`
---
## Container Health Detection
Three independent layers detect dead containers (e.g., Docker Desktop crash):
### Layer 1: Passive (Redis TTL)
Each workspace sends heartbeats that set a Redis key with a 60-second TTL. When the key expires, the liveness monitor detects the workspace as offline and triggers an auto-restart.
### Layer 2: Proactive (Health Sweep)
`registry.StartHealthSweep` polls the Docker API every 15 seconds. Catches dead containers faster than waiting for Redis TTL expiry.
### Layer 3: Reactive (A2A Proxy)
When the A2A proxy encounters a connection error to a workspace, it immediately checks `provisioner.IsRunning()`. If the container is dead, it marks the workspace offline and triggers a restart.
All three layers call `onWorkspaceOffline`, which broadcasts `WORKSPACE_OFFLINE` and initiates `wh.RestartByID()`. Redis cleanup uses the shared `db.ClearWorkspaceKeys()` function.
---
## Workspace Lifecycle
```
provisioning --> online (on register)
     ^              |
     |              v
     |          degraded (error_rate > 0.5)
     |              |
     |              v
     |          online (recovered)
     |              |
     |              v
     |          offline (Redis TTL expired / health sweep)
     |              |
     +--- auto-restart

any state --> removed (deleted)
any state --> paused (user pauses) --> provisioning (user resumes)
```
Paused workspaces skip health sweep, liveness monitor, and auto-restart.
**Restart context:** After any restart and successful re-registration, the platform sends a synthetic A2A `message/send` with `metadata.kind=restart_context` containing the restart timestamp, previous session info, and available env-var keys (keys only, never values). The sender uses the `system:restart-context` caller prefix to bypass `CanCommunicate`. If the workspace does not re-register within 30 seconds, the message is dropped.
**Initial prompt:** Agents can auto-execute a prompt on startup before any user interaction. Configure via `initial_prompt` (inline string) or `initial_prompt_file` (path relative to config dir) in `config.yaml`. A `.initial_prompt_done` marker file prevents re-execution on restart.
**Idle loop:** When `idle_prompt` is non-empty in `config.yaml`, the workspace self-sends it every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The idle check is local (no LLM call) and the prompt only fires when the agent is genuinely idle.
---
## Deployment Modes
### Self-Hosted
Run the full stack on your own infrastructure using Docker Compose:
```bash
# Infrastructure only (Postgres, Redis, Langfuse, Temporal)
docker compose -f docker-compose.infra.yml up -d
# Full stack
docker compose up
```
### SaaS
Hosted at `moleculesai.app` with per-tenant isolation. Each tenant gets a dedicated Fly Machine running the tenant image. The `MOLECULE_ORG_ID` env var gates API access -- every non-allowlisted request must carry a matching `X-Molecule-Org-Id` header or gets a 404. When unset, the guard is a passthrough so self-hosted and dev environments are unaffected.
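The guard's decision table reduces to a few lines. This sketch returns the HTTP status the guard would produce (the allowlist check is simplified to a boolean):

```python
def org_guard(org_id_env, headers, allowlisted=False):
    """Sketch of the MOLECULE_ORG_ID gate. Returns the resulting HTTP status."""
    if not org_id_env:       # unset -> passthrough (self-hosted / dev)
        return 200
    if allowlisted:          # allowlisted routes skip the check
        return 200
    return 200 if headers.get("X-Molecule-Org-Id") == org_id_env else 404
```

Note the guard returns 404 rather than 401/403, so probing requests cannot distinguish "wrong org" from "no such route".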
### Tenant Image
`platform/Dockerfile.tenant` bundles the Go platform + Canvas frontend + templates into a single container image, published to `ghcr.io/molecule-ai/platform:latest` and `:sha-<short>`.
---
## Subdomain Architecture
| Subdomain | Service | Purpose |
|-----------|---------|---------|
| `moleculesai.app` | Landing page | Marketing site |
| `app.moleculesai.app` | SaaS dashboard | Tenant management UI |
| `api.moleculesai.app` | Control plane API | Platform REST + WebSocket |
| `doc.moleculesai.app` | Documentation | This documentation site |
| `status.moleculesai.app` | Status page | Uptime and incident tracking |
| `*.moleculesai.app` | Tenant instances | Per-org isolated platform instances |
---
## Plugin System
Plugins extend workspace capabilities. Two categories exist:
**Shared plugins** (auto-loaded by every workspace):
- **molecule-dev** -- codebase conventions + review-loop skill
- **superpowers** -- verification, TDD, systematic debugging, writing plans
- **ecc** -- general Claude Code guardrails
- **browser-automation** -- Puppeteer/CDP web scraping and live canvas screenshots
**Modular guardrails** (opt-in per workspace):
- **Hook plugins** (ambient enforcement): `molecule-careful-bash`, `molecule-freeze-scope`, `molecule-audit-trail`, `molecule-session-context`, `molecule-prompt-watchdog`
- **Skill plugins** (on-demand): `molecule-skill-code-review`, `molecule-skill-cross-vendor-review`, `molecule-skill-llm-judge`, `molecule-skill-update-docs`, `molecule-skill-cron-learnings`
- **Workflow plugins** (slash commands): `molecule-workflow-triage`, `molecule-workflow-retro`
**Org-template plugin resolution:** Per-workspace `plugins:` lists in `org.yaml` role overrides UNION with `defaults.plugins` (deduplicated, defaults first). To opt a specific role out of a default, prefix the plugin name with `!` or `-` (e.g. `!browser-automation`).
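The UNION-with-opt-out rule can be modeled in a few lines of Python. This is a behavioral sketch of the resolution order described above, not the platform's implementation:

```python
def resolve_plugins(defaults, overrides):
    """Merge defaults with a role's plugin list: defaults first, deduplicated,
    honoring '!'/'-' opt-out prefixes in the role's list."""
    opt_out = {p.lstrip("!-") for p in overrides if p.startswith(("!", "-"))}
    additions = [p for p in overrides if not p.startswith(("!", "-"))]
    merged, seen = [], set()
    for p in defaults + additions:
        if p not in seen and p not in opt_out:
            seen.add(p)
            merged.append(p)
    return merged
```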
Plugin install safeguards:
| Parameter | Default | Purpose |
|-----------|---------|---------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | 65536 (64 KiB) | Max request body size |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | 5m | Whole fetch+copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | 104857600 (100 MiB) | Max staged-tree size |
---
## CI Pipeline
GitHub Actions runs on push to main and on pull requests:
| Job | What it does |
|-----|-------------|
| `platform-build` | Go build, vet, `go test -race` with 25% coverage threshold |
| `canvas-build` | npm build, vitest run (tests must exist and pass) |
| `python-lint` | pytest with coverage for workspace-template |
| `e2e-api` | Spins up Postgres + Redis, runs 62 API tests against locally-built binary |
| `shellcheck` | Lints all E2E shell scripts |
| `publish-platform-image` | Builds and pushes to `ghcr.io/molecule-ai/platform` (main only) |
Standalone repos (plugins + templates) use reusable workflows from `Molecule-AI/molecule-ci` for schema validation, secrets scanning, and Docker build smoke tests.

---
title: Channels
description: Connect workspaces to Telegram, Slack, and Lark/Feishu for social integrations.
---
## Overview
Channels let workspaces send and receive messages on social platforms. Each
workspace can have multiple channel integrations — a Telegram bot, a Slack
webhook, a Lark/Feishu Custom Bot — configured independently with per-channel
allowlists and JSONB config.
Outbound messages flow from the workspace through the platform adapter to the
social platform. Inbound messages arrive via webhooks (`POST /webhooks/:type`),
are parsed by the adapter, and forwarded to the workspace as A2A
`message/send` requests.
```
User (Telegram/Slack/Lark) ──webhook──> Platform ──A2A──> Workspace Agent
<──adapter── (response)
User <──bot message──────────────────────────────────────/
```
---
## Adapters
Three adapters are registered out of the box. Use `GET /channels/adapters` to
list them at runtime.
### Telegram
Uses the Telegram Bot API. Supports both long-polling (for inbound) and direct
API calls (for outbound). The adapter caches `BotAPI` instances to avoid
repeated `getMe` calls.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `bot_token` | string | Telegram bot token (`123456789:ABCdef...`). Validated against a strict regex. |
| `chat_id` | string | Comma-separated chat IDs to listen on and send to. |
**Features:**
- Long-polling with 30s timeout and 2s retry interval
- Auto-reply to `/start` with the chat ID (useful for setup)
- Bot commands: `/start`, `/help`, `/reset` (clear history), `/cancel` (best-effort)
- Long messages automatically split at paragraph/line/word boundaries (4096 char limit)
- Typing indicator sent while the agent processes
- Rate-limit handling with `retry_after` backoff
- Auto-discovers chats via `getUpdates` (including `my_chat_member` events for group adds)
- Auto-disables the channel when the bot is kicked from a chat
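The message-splitting behavior — prefer paragraph breaks, then line breaks, then word boundaries — can be sketched as follows. This is an illustrative algorithm consistent with the description above, not the adapter's actual code:

```python
def split_message(text, limit=4096):
    """Split text into chunks <= limit, cutting at paragraph, line, or word
    boundaries when possible, hard-splitting only as a last resort."""
    parts = []
    while len(text) > limit:
        window = text[:limit]
        for sep in ("\n\n", "\n", " "):   # paragraph > line > word preference
            cut = window.rfind(sep)
            if cut > 0:
                break
        else:
            cut = limit                    # no boundary found: hard split
        parts.append(text[:cut])
        text = text[cut:].lstrip()
    parts.append(text)
    return parts
```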
### Slack
Uses Slack Incoming Webhooks for outbound and the Slack Events API for inbound.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `webhook_url` | string | Slack Incoming Webhook URL (must start with `https://hooks.slack.com/`). |
**Features:**
- Outbound via Incoming Webhook (no OAuth required)
- Inbound via Events API JSON payload or slash command (URL-encoded form)
- `url_verification` challenge handshake supported
- Slash commands prepend the command name so the agent sees the full invocation
### Lark / Feishu
Outbound via Custom Bot webhooks, inbound via Event Subscriptions.
**Required config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `webhook_url` | string | Custom Bot webhook URL. Must start with `https://open.feishu.cn/open-apis/bot/v2/hook/` or `https://open.larksuite.com/open-apis/bot/v2/hook/`. |
**Optional config fields:**
| Field | Type | Description |
|-------|------|-------------|
| `verify_token` | string | Verification Token from the app's Event Subscriptions page. When set, inbound events with a mismatching token are rejected. |
**Features:**
- Both China (`open.feishu.cn`) and international (`open.larksuite.com`) endpoints supported
- `url_verification` handshake with constant-time `verify_token` comparison
- v2 event payload parsing (`im.message.receive_v1`)
- Token verification on both `url_verification` and `event_callback` payloads
- Application-level error codes checked (Lark returns HTTP 200 even for app errors)
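A minimal sketch of the constant-time token check, using Python's `hmac.compare_digest` in place of whatever the adapter actually uses (the payload shape is simplified):

```python
import hmac

def lark_event_accepted(configured_token, payload):
    """Accept the event only if the payload token matches, compared in
    constant time. No configured token means the check is disabled."""
    if not configured_token:
        return True
    received = str(payload.get("token", ""))
    return hmac.compare_digest(configured_token.encode(), received.encode())
```

A constant-time comparison matters here because a naive `==` can leak how many leading characters matched via timing differences.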
---
## Setup Flow
### 1. Create a Channel
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz",
"chat_id": "-1001234567890"
}
}'
```
### 2. Test the Connection
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/test \
-H "Authorization: Bearer {token}"
```
### 3. Send a Message
```bash
curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/send \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"text": "Hello from the agent!"}'
```
---
## Inbound Webhooks
Register your platform's public URL as the webhook endpoint for each social
platform. Inbound messages arrive at:
```
POST /webhooks/:type
```
where `:type` is `telegram`, `slack`, or `lark`. The platform:
1. Looks up all channels of that type
2. Calls the adapter's `ParseWebhook` to extract a standardized `InboundMessage`
3. Checks the allowlist (if configured)
4. Forwards the message to the workspace via A2A `message/send`
For Telegram, the platform can also use long-polling instead of webhooks,
started automatically when a Telegram channel is created.
---
## Discover Chats
Auto-detect available chats for a bot token before creating a channel:
```bash
curl -X POST http://localhost:8080/channels/discover \
-H "Content-Type: application/json" \
-d '{"type": "telegram", "bot_token": "123456789:ABCdef..."}'
```
Returns the bot username, discovered chats (with IDs, names, and types), and
whether the bot can read all group messages (Telegram privacy mode).
---
## Allowlists
Each channel row has an `allowed_users` JSONB array. When non-empty, only
messages from users whose IDs appear in the list are forwarded to the workspace.
All others are silently dropped.
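The allowlist rule is a one-liner. A sketch of the decision (empty list means everyone is allowed):

```python
def should_forward(sender_id, allowed_users):
    """Empty allowlist -> forward everything; otherwise only listed users."""
    return not allowed_users or sender_id in allowed_users
```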
---
## Config Encryption
Sensitive config fields (like `bot_token`) are encrypted at rest. The `List`
endpoint decrypts them server-side and masks tokens in the response
(showing only the first 4 and last 4 characters).
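The masking scheme (first 4 and last 4 characters visible) might look like this — an illustrative sketch; the short-token behavior is an assumption, not documented above:

```python
def mask_token(token):
    """Show only the first 4 and last 4 characters; fully mask short tokens."""
    if len(token) <= 8:                # assumption: too short to mask partially
        return "*" * len(token)
    return token[:4] + "*" * (len(token) - 8) + token[-4:]
```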
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/channels/adapters` | List available adapter types |
| POST | `/channels/discover` | Auto-detect chats for a bot token |
| GET | `/workspaces/:id/channels` | List channels for a workspace |
| POST | `/workspaces/:id/channels` | Add a channel |
| PATCH | `/workspaces/:id/channels/:channelId` | Update a channel |
| DELETE | `/workspaces/:id/channels/:channelId` | Remove a channel |
| POST | `/workspaces/:id/channels/:channelId/test` | Test connection |
| POST | `/workspaces/:id/channels/:channelId/send` | Send outbound message |
| POST | `/webhooks/:type` | Incoming social webhook |
---
## Example Configs
### Telegram
```json
{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234",
"chat_id": "-1001234567890"
}
}
```
Multiple chats (comma-separated):
```json
{
"type": "telegram",
"config": {
"bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234",
"chat_id": "-1001234567890, -1009876543210"
}
}
```
### Slack
```json
{
"type": "slack",
"config": {
"webhook_url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
}
}
```
### Lark / Feishu
```json
{
"type": "lark",
"config": {
"webhook_url": "https://open.larksuite.com/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"verify_token": "your-verification-token"
}
}
```
China endpoint:
```json
{
"type": "lark",
"config": {
"webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
}
```

---
title: Concepts
description: The core primitives that compose every Molecule AI org — workspaces, plugins, channels, schedules, tokens, external agents, and the canvas.
---
If you understand these core primitives, you understand the whole platform.
## Workspaces
A **workspace** is a real Docker container running a real LLM agent. Each
workspace has:
- A **role** (a one-line job description fed into its system prompt)
- An **initial prompt** (run once at first boot — typically clone repo,
read docs, memorise context)
- A **runtime** (`claude-code`, `langgraph`, `crewai`, `autogen`, `deepagents`,
`openclaw`, `hermes`, `gemini-cli`)
- A **tier** (resource budget — T1 sandboxed, T2 standard, T3 privileged, T4 full-host)
- An optional **parent** (forms the org tree)
- An optional **workspace_dir** (a host path bind-mounted into the
container — gives the agent direct access to your codebase)
Workspaces talk to each other via **A2A** (agent-to-agent) messages, routed
by the platform. Communication rules: same workspace, siblings, and
parent/child are allowed; everything else is denied.
## External agents
An **external agent** is a workspace with `runtime: external` — it runs on
your own infrastructure instead of the platform's Docker network. External
agents:
- Register via `POST /registry/register` and receive a bearer token
- Send heartbeats every 30 seconds to stay online
- Accept A2A messages at their registered URL
- Appear on the canvas with a purple **REMOTE** badge
- Skip Docker health sweep (liveness is heartbeat-only)
See [External Agents](/docs/external-agents) for the full registration guide.
## Plugins
A **plugin** is a bundle of capabilities a workspace can install:

- **Skills** — on-demand capabilities like code review, LLM-as-judge gates
- **Slash commands** — `/triage`, `/retro`, etc.
- **MCP servers** — bring in tools the model can call
- **Builtin tools** — Python/JS extensions exposed to the agent
Plugins have two axes: **source** (where to fetch — `local://`, `github://`)
and **shape** (what's inside — agentskills.io format, MCP server, etc.).
Plugins compose. Per-workspace plugin lists **UNION** with the org-wide
defaults — adding one capability to one role doesn't require re-listing
every default. Use `!plugin-name` to opt a specific default out.
See [Plugins](/docs/plugins) for the full guide.
## Channels
A **channel** wires a workspace to an external messaging platform:
| Adapter | Platform | Config |
|---------|----------|--------|
| `telegram` | Telegram | Bot token + chat_id allowlist |
| `slack` | Slack | Workspace token + channel |
| `lark` | Lark / Feishu | Custom Bot webhook + Event Subscriptions |
Once connected, users can talk to agents from outside the canvas — and
agents can broadcast back. Inbound messages arrive via webhook and are
routed to the workspace as A2A messages.
See [Channels](/docs/channels) for setup instructions.
## Schedules
A **schedule** is a cron-driven recurring prompt. Each tick fires an A2A
message into the workspace, which the agent treats as a new task. Schedules
are supervised — panics in the dispatch path are recovered with exponential
backoff, and a liveness watchdog surfaces stuck subsystems via
`/admin/liveness`.
Schedules let you build the *evolution* loop: hourly security audits,
daily ecosystem watches, weekly plugin curation, etc.
See [Schedules](/docs/schedules) for the full guide.
## Tokens
**Bearer tokens** authenticate agents and API clients. Each token is
scoped to a single workspace — a token from workspace A cannot access
workspace B.
- Issued on first registration (`POST /registry/register`)
- Create/list/revoke via `GET/POST/DELETE /workspaces/:id/tokens`
- 256-bit entropy, sha256-hashed in DB, plaintext shown once
See [Token Management](/docs/tokens) for the full guide.
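The token-minting properties listed above (256-bit entropy, sha256 at rest, plaintext shown once) can be sketched in Python — illustrative, not the platform's actual implementation:

```python
import hashlib
import secrets

def mint_token():
    """Generate a workspace token: 256 bits of entropy, sha256-hashed for
    storage. The plaintext is returned once and never persisted."""
    plaintext = secrets.token_hex(32)                        # 32 bytes = 256 bits
    digest = hashlib.sha256(plaintext.encode()).hexdigest()  # only this is stored
    return plaintext, digest
```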
## The canvas
Every workspace is a node, every A2A message an edge; every memory write, every scheduled fire, every status change pushes a WebSocket event in real time.
The canvas isn't just a viewer — it's the operator surface. Drag nodes
to reorganise teams, click to chat, right-click for actions, watch the
team work in real time.
## How they fit together
A typical org definition:
```yaml
org_name: My Team
defaults:
runtime: claude-code
tier: 2
plugins: [ecc, molecule-dev, superpowers, molecule-careful-bash]
category_routing:
security: [Backend Engineer]
ui: [Frontend Engineer]
workspaces:
- name: PM
canvas: { x: 400, y: 50 }
plugins: [molecule-workflow-triage]
channels:
- type: telegram
config: { bot_token: "${TELEGRAM_BOT_TOKEN}", chat_id: "12345" }
children:
- name: Dev Lead
children:
@ -100,6 +140,13 @@ workspaces:
prompt: "Run npm run typecheck and report any new errors..."
```
That's the mental model. Templates → plugins → channels → schedules →
tokens → canvas. Everything else in the docs is depth on one of these
primitives.
## MCP integration
Any MCP-compatible AI agent can manage Molecule AI workspaces using the
[MCP Server](/docs/mcp-server) — 87 tools covering workspace CRUD,
communication, secrets, memory, files, schedules, channels, plugins,
and more. Install via `npx @molecule-ai/mcp-server`.

multi-agent organisations. You define your team in one YAML file — the roles your agents play, the channels they talk on, the recurring work they schedule — and the platform takes care of the rest.
## Try it now
| | |
|---|---|
| **Dashboard** | [app.moleculesai.app](https://app.moleculesai.app) — create orgs, deploy agents |
| **API** | [api.moleculesai.app](https://api.moleculesai.app) — control plane REST API |
| **Documentation** | [doc.moleculesai.app](https://doc.moleculesai.app) — you are here |
| **Status** | [status.moleculesai.app](https://status.moleculesai.app) — uptime monitoring |
| **Self-host** | [Self-Hosting Guide](/docs/self-hosting) — run on your own infrastructure |
## What you can build
- **Self-running engineering teams** — PM, Dev Lead, frontend / backend / devops
- **Research labs** — teams of agents collaborating through shared memory.
- **Product orgs** — anything you can describe as a tree of roles and
responsibilities.
- **Hybrid teams** — mix cloud-hosted agents with [external agents](/docs/external-agents)
running on your own infrastructure, edge devices, or other clouds.
## How it works
1. **Templates.** Describe your org as a YAML tree of workspaces. Each workspace
is a real container running an LLM agent. Templates ship with sensible
defaults so you can spin one up in one command.
2. **Plugins.** Add capabilities to one role or all of them — guardrails,
skills, slash commands, browser automation, MCP servers. Plugins compose;
per-role overrides UNION with the defaults.
3. **Channels.** Connect any role to [Telegram, Slack, or Lark/Feishu](/docs/channels)
so users can talk to agents directly from their existing tools.
4. **Schedules.** Define [recurring work](/docs/schedules) in cron syntax. The
runtime fires the prompt at the scheduled time, supervised against panics
with a liveness watchdog.
5. **Tokens.** Generate [API tokens](/docs/tokens) per workspace for secure
authentication. Rotate, revoke, and audit from the dashboard or API.
6. **The canvas.** A live visualisation of your org — every workspace as a
node, every A2A message as an edge, every memory write tracked in real time.
## Eight runtime adapters
| Runtime | Description |
|---------|-------------|
| Claude Code | Anthropic Claude with code execution |
| LangGraph | LangChain ReAct agent with tools |
| OpenClaw | Multi-file prompt system with SOUL |
| CrewAI | Role-based agent with task delegation |
| AutoGen | Microsoft conversable agents |
| DeepAgents | Deep research with planning |
| Hermes | NousResearch Hermes-3 multi-provider |
| Gemini CLI | Google Gemini CLI workspace |
## Integrate with everything
- **[MCP Server](/docs/mcp-server)** — 87 tools for managing Molecule AI from any
MCP-compatible AI agent (Claude Code, Cursor, etc.)
- **[Python SDK](https://pypi.org/project/molecule-ai-sdk)** — `pip install molecule-ai-sdk`
- **[External Agents](/docs/external-agents)** — register any HTTP agent as a
first-class workspace
## Where to next
- New here? The [Quickstart](/docs/quickstart) gets you from zero to a running agent in under five minutes.
- Want the architecture tour? Start with [Concepts](/docs/concepts) and
[Architecture](/docs/architecture).
- Ready to build your own org? Jump to [Org Templates](/docs/org-template).
- Want to connect your own agent? See [External Agents](/docs/external-agents).
- Need API access? Check [Token Management](/docs/tokens) and the
[API Reference](/docs/api-reference).

"index",
"quickstart",
"concepts",
"org-template",
"plugins",
"channels",
"schedules",
"external-agents",
"architecture",
"tokens",
"api-reference",
"mcp-server",
"self-hosting",
"observability",
"troubleshooting"

---
title: Observability
description: Monitor agent activity, LLM traces, and platform health.
---
## Overview
Molecule AI provides multiple layers of observability -- from real-time WebSocket events on the canvas to structured activity logs, LLM traces, Prometheus metrics, and admin health endpoints.
## Activity Logs
Every significant action in the platform is recorded in the `activity_logs` table. Query logs for a specific workspace:
```
GET /workspaces/:id/activity
```
Activity types include:
- **A2A communications** -- request/response capture with duration and method
- **Task updates** -- agent-reported task status changes
- **Agent logs** -- structured log entries from workspace runtimes
- **Errors** -- failures with `error_detail` for debugging
Filter by source to separate user-agent chat (`source=canvas`) from agent-to-agent traffic (`source=agent`).
Activity logs are automatically cleaned up based on `ACTIVITY_RETENTION_DAYS` (default 7). The cleanup job runs every `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default 6).
## LLM Traces
Molecule AI integrates with [Langfuse](https://langfuse.com) for LLM observability. Langfuse runs as part of the infrastructure stack on port 3001, backed by ClickHouse for efficient trace storage.
View traces for a specific workspace:
```
GET /workspaces/:id/traces
```
The Langfuse UI at `http://localhost:3001` provides:
- Token usage and cost tracking per workspace
- Latency breakdowns for LLM calls
- Prompt/completion pairs for debugging
- Trace timelines showing multi-step agent reasoning
## Prometheus Metrics
The platform exposes Prometheus-format metrics at:
```
GET /metrics
```
This endpoint requires no authentication and is safe to scrape. Metrics are in Prometheus text format (v0.0.4) and include:
- Request counts by method, path, and status code
- Request latency histograms
- Active WebSocket connections
- Workspace status counts
Configure your Prometheus instance to scrape `http://localhost:8080/metrics` at your preferred interval.
## Admin Liveness
The liveness endpoint reports the health of every supervised subsystem:
```
GET /admin/liveness
```
This endpoint requires `AdminAuth` (bearer token). It returns a `supervised.Snapshot()` for each subsystem with ages -- how long since each subsystem last reported healthy. Use this to debug stuck schedulers, stalled heartbeat goroutines, or unresponsive health sweeps before diving into logs.
## WebSocket Events
The canvas receives real-time updates via WebSocket at `/ws`. Every state change in the platform is broadcast to connected clients:
| Event | Trigger |
|-------|---------|
| `WORKSPACE_ONLINE` | Workspace registers successfully |
| `WORKSPACE_OFFLINE` | Heartbeat TTL expires or health sweep detects dead container |
| `WORKSPACE_DEGRADED` | Error rate exceeds threshold |
| `WORKSPACE_RECOVERED` | Error rate drops back to normal |
| `WORKSPACE_REMOVED` | Workspace deleted |
| `HEARTBEAT` | Periodic heartbeat from workspace |
| `A2A_RESPONSE` | Agent-to-agent message received |
| `AGENT_MESSAGE` | Agent pushes a message to the user |
Events flow through Redis pub/sub to ensure all platform instances broadcast consistently.
## Structure Events
The `structure_events` table is an append-only audit log of every structural change in the platform. Each event is:
1. Inserted into the database via `broadcaster.RecordAndBroadcast()`
2. Published to Redis pub/sub
3. Relayed to WebSocket clients
Query events for a specific workspace or globally:
```
GET /events/:workspaceId # Workspace-specific
GET /events # All events
```
Both endpoints require `AdminAuth`.
## Session Search
Search through chat history for a workspace:
```
GET /workspaces/:id/session-search?q=deployment+error
```
This searches across both user-agent conversations and agent-to-agent A2A traffic stored in the activity logs.
## Current Task Visibility
Each workspace reports its current task via heartbeat. This is visible in two places:
- **Canvas node** -- the workspace card on the canvas shows the current task text
- **Heartbeat data** -- `GET /registry/discover/:id` includes `current_task` in the workspace info
When `active_tasks` drops to zero, the current task field clears and the idle loop (if configured) begins its countdown.
## Schedule Run History
For workspaces with cron schedules, inspect past runs:
```
GET /workspaces/:id/schedules/:scheduleId/history
```
Each history entry includes:
- Execution timestamp
- Status (`success`, `failed`, `skipped`)
- Duration
- `error_detail` when the run failed (populated by `scheduler.fireSchedule`)
A status of `skipped` means the workspace was busy (active tasks > 0) when the schedule fired and the concurrency-aware scheduler chose not to queue the prompt.
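The three run statuses map to a simple decision, sketched here (field names are illustrative):

```python
def record_run(active_tasks, fire_succeeded, error_detail=None):
    """Sketch of the concurrency-aware scheduler's run-status decision."""
    if active_tasks > 0:
        return {"status": "skipped"}   # workspace busy: don't queue the prompt
    if fire_succeeded:
        return {"status": "success"}
    return {"status": "failed", "error_detail": error_detail}
```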

---
title: Org Templates
description: Deploy entire multi-workspace organizations from a single YAML file.
---
## Overview
Org templates let you define an entire agent organization -- hierarchy of workspaces with roles, configurations, and relationships -- in a single YAML file. Import one template and the platform provisions every workspace, wires parent-child relationships, seeds schedules, and installs plugins automatically.
## YAML Structure
A minimal org template looks like this:
```yaml
org_name: molecule-dev
defaults:
runtime: claude-code
tier: 2
plugins:
- molecule-dev
- molecule-careful-bash
workspaces:
pm:
name: Project Manager
role: PM
tier: 3
children:
dev-lead:
name: Dev Lead
children:
backend:
name: Backend Engineer
frontend:
name: Frontend Engineer
marketing:
name: Marketing Specialist
runtime: langgraph
```
The `workspaces` map defines the hierarchy. Each key becomes the workspace's slug. Nesting under `children` sets the parent-child relationship automatically.
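Flattening the nested `workspaces` map into parent-child rows is straightforward recursion. A sketch of what the importer conceptually does (the row shape is illustrative):

```python
def flatten(workspaces, parent=None):
    """Walk the nested workspaces map, emitting one row per workspace with
    its parent slug derived from nesting under `children`."""
    rows = []
    for slug, spec in workspaces.items():
        rows.append({"slug": slug, "parent": parent})
        rows.extend(flatten(spec.get("children", {}), parent=slug))
    return rows
```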
## Workspace Fields
Each workspace entry supports the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Display name shown on the canvas |
| `role` | string | Agent role (e.g. PM, Engineer, Researcher) |
| `runtime` | string | Runtime adapter (`claude-code`, `langgraph`, `crewai`, etc.) |
| `tier` | integer | Resource tier (1 = Sandboxed, 2 = Standard, 3 = Privileged, 4 = Full-host) |
| `workspace_dir` | string | Host path for `/workspace` bind-mount |
| `plugins` | list | Plugins to install on this workspace |
| `initial_prompt` | string | Prompt auto-executed after A2A server is ready |
| `idle_prompt` | string | Prompt fired periodically while workspace is idle |
| `idle_interval_seconds` | integer | Interval for idle prompt (default 600, minimum 60) |
| `channels` | list | Social channel integrations (Telegram, Slack, etc.) |
| `schedules` | list | Cron schedules seeded on import |
| `x` | number | Canvas X coordinate |
| `y` | number | Canvas Y coordinate |
| `children` | map | Nested child workspaces |
## Defaults Layer
The `defaults` block sets baseline values for every workspace in the template. Per-workspace fields override defaults when specified.
**Plugin merging is additive.** Per-workspace `plugins` lists UNION with `defaults.plugins` (deduplicated, defaults first) -- they do not replace them. To opt a specific default plugin out for a given workspace, prefix the plugin name with `!` or `-`:
```yaml
defaults:
plugins:
- molecule-dev
- molecule-careful-bash
- browser-automation
workspaces:
backend:
name: Backend Engineer
plugins:
- molecule-skill-code-review # added
- "!browser-automation" # opted out of default
```
In this example, the backend workspace gets `molecule-dev`, `molecule-careful-bash`, and `molecule-skill-code-review` -- but not `browser-automation`.
## Template Registry
Five org templates live in standalone repos under the `Molecule-AI` GitHub organization:
| Template | Repo |
|----------|------|
| molecule-dev | `Molecule-AI/molecule-ai-org-template-molecule-dev` |
| marketing-team | `Molecule-AI/molecule-ai-org-template-marketing-team` |
| research-lab | `Molecule-AI/molecule-ai-org-template-research-lab` |
| startup-mvp | `Molecule-AI/molecule-ai-org-template-startup-mvp` |
| enterprise-ops | `Molecule-AI/molecule-ai-org-template-enterprise-ops` |
These are cloned into the platform image at Docker build time and registered in the `template_registry` database table.
## Importing an Org Template
### Via API
```bash
curl -X POST http://localhost:8080/org/import \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"dir": "molecule-dev"}'
```
The `POST /org/import` endpoint requires `AdminAuth` (bearer token). The `dir` field references a template directory name from the registry.
### Via Canvas
Open the template browser in the canvas sidebar and select an org template. The UI calls the same API endpoint.
## Initial Prompts
Workspaces can auto-execute a prompt on startup before any user interaction. Set `initial_prompt` as an inline string or point `initial_prompt_file` to a path relative to the config directory.
After the A2A server is ready, the runtime sends the prompt as a `message/send` to itself. A `.initial_prompt_done` marker file prevents re-execution on restart.
**Important:** Initial prompts must NOT send A2A messages (`delegate_task`, `send_message_to_user`) because other agents may not be ready yet. Keep them local: clone a repo, read docs, save to memory, wait for tasks.
Org templates support `initial_prompt` on both `defaults` (all agents) and per-workspace (overrides default).
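For example, a template might set a shared startup prompt while one role overrides it (prompt text and file path here are illustrative):

```yaml
defaults:
  initial_prompt: "Clone the project repo, read CONTRIBUTING.md, and save key conventions to memory."
workspaces:
  researcher:
    name: Research Analyst
    # Per-workspace value overrides the default for this workspace only.
    initial_prompt_file: prompts/researcher-bootstrap.md
```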
## Idle Loop
The idle loop is an opt-in pattern for workspaces that should do periodic background work when they have no active tasks.
When `idle_prompt` is non-empty in the workspace config, the runtime self-sends the prompt every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The fire timeout clamps to `max(60, min(300, idle_interval_seconds))`.
Set per-workspace or as an org template default:
```yaml
defaults:
idle_prompt: "Check for new issues and update your task list."
idle_interval_seconds: 300
```
The idle check is local (no LLM call) and the prompt only fires when there is genuinely nothing to do, so cost collapses to event-driven.
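The clamp works out to the following (illustrative Python; the runtime is not Python):

```python
def idle_fire_timeout(idle_interval_seconds: int) -> int:
    """Per-fire timeout: floored at 60s, capped at 300s,
    otherwise the configured interval itself."""
    return max(60, min(300, idle_interval_seconds))

print(idle_fire_timeout(600))  # 300 — long intervals cap at 5 minutes
print(idle_fire_timeout(30))   # 60  — short intervals get a 1-minute floor
```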
## Canvas Positioning
Use `x` and `y` fields to control where workspaces appear on the drag-and-drop canvas after import:
```yaml
workspaces:
pm:
name: Project Manager
x: 400
y: 100
children:
dev:
name: Developer
x: 200
y: 300
researcher:
name: Researcher
x: 600
y: 300
```
If coordinates are omitted, the canvas lays out new workspaces automatically.

---
title: Plugins
description: Extend workspace capabilities with modular plugins — guardrails, skills, workflows.
---
## Overview
Plugins are installable capability bundles that extend what a workspace can do.
They range from ambient guardrails that enforce rules automatically, to
on-demand skills invoked via the `Skill` tool, to workflow plugins that
compose skills into slash commands.
Plugins follow a **two-axis model**: the *source* (where the plugin comes from)
is orthogonal to the *shape* (what format it takes). This means you can install
a plugin from a local registry or from GitHub, and the workspace runtime
figures out how to load it based on its shape.
---
## Two-Axis Model
### Sources (where)
| Scheme | Description | Example |
|--------|-------------|---------|
| `local://` | Platform's curated plugin registry (auto-discovered from the `plugins/` directory) | `local://molecule-careful-bash` |
| `github://` | Public GitHub repo (shallow clone at install time) | `github://owner/repo` |
| `github://` (pinned) | GitHub repo at a specific ref | `github://owner/repo#v1.2.0` |
Use `GET /plugins/sources` to list all registered install-source schemes at
runtime.
### Shapes (what)
| Shape | Description |
|-------|-------------|
| agentskills.io format | `SKILL.md` + optional scripts, hooks, and `plugin.yaml` manifest |
| MCP server | Model Context Protocol server (coming soon for more runtimes) |
The shape is orthogonal to the source. A `github://` plugin and a `local://`
plugin can both be agentskills.io format. The per-runtime adapter inside the
workspace handles loading at startup.
---
## Installing a Plugin
```bash
curl -X POST http://localhost:8080/workspaces/{id}/plugins \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"source": "local://molecule-careful-bash"}'
```
From GitHub:
```bash
curl -X POST http://localhost:8080/workspaces/{id}/plugins \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"source": "github://Molecule-AI/molecule-plugin-careful-bash"}'
```
The platform resolves the source, stages the plugin files, copies them into the
workspace container at `/configs/plugins/<name>/`, and triggers an automatic
workspace restart so the runtime picks up the new plugin.
---
## Uninstalling a Plugin
```bash
curl -X DELETE http://localhost:8080/workspaces/{id}/plugins/{name} \
-H "Authorization: Bearer {token}"
```
Uninstall removes the plugin directory, cleans up copied skill directories and
rule markers from `CLAUDE.md`, and triggers an automatic workspace restart.
---
## Listing Plugins
### Platform Registry
List all available plugins in the platform registry:
```bash
# All plugins
curl http://localhost:8080/plugins
# Filtered by runtime
curl "http://localhost:8080/plugins?runtime=claude-code"
```
Plugins with no declared `runtimes` field in their manifest are treated as
"unspecified, try it" and included in filtered results.
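A minimal manifest might look like this — the field names shown (`name`, `description`, `runtimes`) are illustrative; consult the plugin registry source for the exact schema:

```yaml
# plugin.yaml (illustrative sketch, not the authoritative schema)
name: molecule-skill-code-review
description: 16-criteria multi-axis code review rubric
runtimes:
  - claude-code   # omit this list entirely to mean "unspecified, try it"
```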
### Available for a Workspace
Returns plugins filtered to those supported by the workspace's current runtime:
```bash
curl http://localhost:8080/workspaces/{id}/plugins/available \
-H "Authorization: Bearer {token}"
```
### Installed on a Workspace
```bash
curl http://localhost:8080/workspaces/{id}/plugins \
-H "Authorization: Bearer {token}"
```
Each installed plugin is annotated with whether it still supports the
workspace's current runtime. This lets the canvas grey out plugins that went
inert after a runtime change.
---
## Runtime Compatibility Check
Before changing a workspace's runtime, check which installed plugins would
become incompatible:
```bash
curl "http://localhost:8080/workspaces/{id}/plugins/compatibility?runtime=langgraph" \
-H "Authorization: Bearer {token}"
```
Response:
```json
{
"target_runtime": "langgraph",
"compatible": [...],
"incompatible": [...],
"all_compatible": false
}
```
The canvas uses this to show a confirmation dialog before applying a runtime
change.
---
## Built-in Plugins
### Hook Plugins (ambient enforcement)
These fire automatically via the harness layer. No explicit invocation needed.
| Plugin | Purpose |
|--------|---------|
| `molecule-careful-bash` | Refuses `git push --force` to main, `rm -rf` at root, `DROP TABLE` against prod schema. Ships the `careful-mode` skill as documentation. |
| `molecule-freeze-scope` | Locks edits to a single path glob via `.claude/freeze`. Useful while debugging. |
| `molecule-audit-trail` | Appends every Edit/Write to `.claude/audit.jsonl` for accountability. |
| `molecule-session-context` | Auto-loads recent cron-learnings and open PR/issue counts at session start. |
| `molecule-prompt-watchdog` | Injects warning context when the prompt mentions destructive keywords. |
### Skill Plugins (on-demand)
Invoked explicitly via the `Skill` tool during a conversation.
| Plugin | Purpose |
|--------|---------|
| `molecule-skill-code-review` | 16-criteria multi-axis code review rubric. |
| `molecule-skill-cross-vendor-review` | Adversarial second-model review for noteworthy PRs. |
| `molecule-skill-llm-judge` | Score whether a deliverable addresses the original request. |
| `molecule-skill-update-docs` | Sync repo docs after merges. |
| `molecule-skill-cron-learnings` | Defines the operational-memory JSONL format. |
### Workflow Plugins (slash commands)
Compose skills into repeatable multi-step workflows.
| Plugin | Command | Purpose |
|--------|---------|---------|
| `molecule-workflow-triage` | `/triage` | Full PR-triage cycle (gates 1-7 + code-review + merge if green). |
| `molecule-workflow-retro` | `/retro` | Weekly retrospective issue. |
### Shared Plugins
Loaded by default from the `plugins/` directory at the repo root.
| Plugin | Purpose |
|--------|---------|
| `molecule-dev` | Codebase conventions (rules injected into CLAUDE.md) + `review-loop` skill. |
| `superpowers` | `verification-before-completion`, `test-driven-development`, `systematic-debugging`, `writing-plans`. |
| `ecc` | General Claude Code guardrails. |
| `browser-automation` | Puppeteer/CDP-based web scraping and live canvas screenshots. Opt-in per workspace. |
---
## Org Template Plugin Resolution
When deploying an org template, per-workspace `plugins:` lists in `org.yaml`
role overrides **UNION** with `defaults.plugins` (deduplicated, defaults first).
They do not replace them.
To opt a specific default out for a given role or workspace, prefix the plugin
name with `!` or `-`:
```yaml
defaults:
plugins:
- molecule-careful-bash
- molecule-audit-trail
- superpowers
workspaces:
researcher:
role: "Research Analyst"
plugins:
- browser-automation # added on top of defaults
- "!superpowers" # opted out of superpowers
```
Result for the `researcher` workspace:
`molecule-careful-bash`, `molecule-audit-trail`, `browser-automation`
---
## Install Safeguards
Environment variables that bound the cost of a single plugin install:
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Max request body size |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch + copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Max staged-tree size |
These prevent a slow or malicious source from tying up a handler goroutine or
exhausting disk space.
---
## Plugin Download (External Workspaces)
External workspaces (those running outside Docker) can pull plugins as gzipped
tarballs:
```bash
curl http://localhost:8080/workspaces/{id}/plugins/{name}/download \
-H "Authorization: Bearer {token}" \
-o plugin.tar.gz
```
An optional `?source=github://owner/repo` query parameter lets external
workspaces pull from upstream repos without the platform pre-staging them.
Defaults to `local://<name>` when omitted.
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/plugins` | List plugin registry (supports `?runtime=` filter) |
| GET | `/plugins/sources` | List registered install-source schemes |
| GET | `/workspaces/:id/plugins` | List installed plugins |
| POST | `/workspaces/:id/plugins` | Install a plugin (`{"source": "scheme://spec"}`) |
| DELETE | `/workspaces/:id/plugins/:name` | Uninstall a plugin |
| GET | `/workspaces/:id/plugins/available` | Available plugins filtered by workspace runtime |
| GET | `/workspaces/:id/plugins/compatibility?runtime=X` | Preflight runtime-change compatibility check |
| GET | `/workspaces/:id/plugins/:name/download` | Download plugin as tarball (external workspaces) |

## Prerequisites
- Docker Desktop (or any Docker daemon) running locally
- Go 1.25+ and Node 20+ if building from source
- An LLM API key (Claude, OpenRouter, or Gemini)
## Option A: One-command start (recommended)
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
```
This starts everything: Postgres, Redis, Platform (Go on `:8080`), and
Canvas (Next.js on `:3000`). Press `Ctrl-C` to stop all services.
## Option B: Docker Compose
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
docker compose up -d
```
This starts the full stack including Langfuse (`:3001`) and Temporal (`:8233`).
## Option C: Manual setup
```bash
# 1. Start infrastructure
./infra/scripts/setup.sh # Postgres, Redis, Langfuse, Temporal
# 2. Start platform
cd platform && go run ./cmd/server # API on :8080
# 3. Start canvas (new terminal)
cd canvas && npm install && npm run dev # UI on :3000
```
## 2. Open the canvas
Navigate to [http://localhost:3000](http://localhost:3000). You should see
the empty state with template cards.
## 3. Deploy from a template
Click any template card to deploy a workspace instantly. Or import a full
org template:
```bash
curl -X POST http://localhost:8080/org/import \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"dir": "molecule-dev"}'
```

This provisions the 12-workspace dev team — PM, Research Lead and 3
researchers, Dev Lead and 5 engineers, plus Security/QA/UIUX auditors —
each as its own Docker container.
## 4. Talk to PM
PM is the entry point. Click the PM node on the canvas, open the Chat tab,
and send a task:
> *"Add a 'Last seen' column to the user list table on the admin page."*
PM will break the request into specific assignments, fan them out to the
right leads in parallel, verify the results, and report back when the
work is shipped.
## 5. Set up secrets
Most agents need an LLM API key. Set it as a global secret so all
workspaces inherit it:
```bash
curl -X PUT http://localhost:8080/settings/secrets \
-H 'Content-Type: application/json' \
-d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'
```
Or use the Settings panel (gear icon) in the canvas to manage secrets
per workspace.
## What just happened
You spun up a self-organising engineering team. They're real agents — they
can read your codebase, run tests, open PRs to GitHub. Their schedules
(security audit, UX audit, template fitness checks) run hourly on their own.
## Using the SaaS instead
Don't want to self-host? Use the cloud platform directly:
1. Go to [app.moleculesai.app](https://app.moleculesai.app)
2. Sign up and create an organization
3. Your tenant is provisioned at `<your-org>.moleculesai.app`
4. Deploy agents from templates — same experience, zero infrastructure
## Next steps
- Customise the [Org Template](/docs/org-template) to match your team.
- Add [Plugins](/docs/plugins) to give roles new capabilities.
- Wire a [Channel](/docs/channels) so you can talk to PM from Telegram.
- Connect your own agents with [External Agents](/docs/external-agents).
- Generate [API Tokens](/docs/tokens) for programmatic access.
- Read about the [Architecture](/docs/architecture) under the hood.

---
title: Schedules
description: Run recurring prompts on cron schedules — automated audits, reports, and maintenance.
---
## Overview
Schedules let you run recurring prompts against a workspace on a cron schedule.
Each tick fires an A2A `message/send` into the workspace, so the agent
processes the prompt as if it received a normal message. This enables automated
audits, daily reports, weekly retrospectives, and any other recurring task.
The scheduler polls the `workspace_schedules` table every 30 seconds. When a
schedule's `next_run_at` has passed, the scheduler fires the prompt and
computes the next run time.
```
Scheduler (30s poll) ──> workspace_schedules table
                              next_run_at <= now?
                         ┌──────────┴──────────┐
                         │   A2A message/send  │ ──> Workspace Agent
                         │ (callerID =         │
                         │  system:scheduler)  │
                         └─────────────────────┘
```
---
## Creating a Schedule
```bash
curl -X POST http://localhost:8080/workspaces/{id}/schedules \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{
"name": "Daily Security Audit",
"cron_expr": "0 9 * * *",
"timezone": "America/New_York",
"prompt": "Run a security audit of all open PRs. Check for leaked secrets, SQL injection, and auth bypass.",
"enabled": true
}'
```
**Required fields:**
| Field | Type | Description |
|-------|------|-------------|
| `cron_expr` | string | Standard cron expression (5-field: minute, hour, day-of-month, month, day-of-week) |
| `prompt` | string | The text sent to the workspace as an A2A message each tick |
**Optional fields:**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | string | `""` | Human-readable label |
| `timezone` | string | `"UTC"` | IANA timezone for cron evaluation (e.g. `America/New_York`, `Asia/Tokyo`) |
| `enabled` | bool | `true` | Whether the schedule fires |
The timezone is validated against Go's `time.LoadLocation` on create and update.
The cron expression is validated and the next run time is computed immediately.
---
## CRUD Operations
| Method | Path | Description |
|--------|------|-------------|
| GET | `/workspaces/:id/schedules` | List all schedules for a workspace |
| POST | `/workspaces/:id/schedules` | Create a new schedule |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update a schedule (partial update via COALESCE) |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete a schedule |
### Update
PATCH accepts any subset of fields. Only provided fields are changed — the
handler uses `COALESCE` in SQL so omitted fields retain their current values.
If `cron_expr` or `timezone` changes, the next run time is recomputed.
```bash
curl -X PATCH http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {token}" \
-d '{"enabled": false}'
```
### Delete
```bash
curl -X DELETE http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
-H "Authorization: Bearer {token}"
```
All schedule operations are scoped to the owning workspace ID to prevent IDOR.
---
## Manual Trigger
Fire a schedule immediately, outside its cron cadence:
```bash
curl -X POST http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/run \
-H "Authorization: Bearer {token}"
```
Returns the schedule's prompt so the frontend can POST it to
`/workspaces/:id/a2a`. This keeps the handler stateless.
---
## Run History
View the last 20 runs for a schedule, including error details for failed runs:
```bash
curl http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/history \
-H "Authorization: Bearer {token}"
```
Response:
```json
[
{
"timestamp": "2026-04-16T09:00:02Z",
"duration_ms": 4523,
"status": "success",
"error_detail": "",
"request": {"schedule_id": "...", "prompt": "..."}
},
{
"timestamp": "2026-04-15T09:00:01Z",
"duration_ms": null,
"status": "error",
"error_detail": "A2A proxy returned 503: workspace container not running",
"request": {"schedule_id": "...", "prompt": "..."}
}
]
```
History is pulled from the `activity_logs` table filtered by
`activity_type = 'cron_run'` and the schedule ID in the request body.
---
## Source Field
Each schedule has a `source` field that tracks how it was created:
| Value | Meaning |
|-------|---------|
| `template` | Seeded by an org template import or bundle import. On re-import, only `template`-source rows are refreshed — `runtime` rows survive. |
| `runtime` | Created via the Canvas UI or API. These are user-owned and never overwritten by re-imports. |
---
## Status Values
The `last_status` field on a schedule tracks the outcome of the most recent
run:
| Status | Meaning |
|--------|---------|
| `success` | The A2A message was delivered and the workspace acknowledged it. |
| `error` | The A2A proxy returned a non-2xx status. `last_error` contains details. |
| `skipped` | The workspace was busy (concurrency-aware skip). The scheduler detected `active_tasks > 0` and deferred the run to avoid overloading the agent. |
---
## Schedule Health Endpoint
Peer workspaces can monitor each other's schedule health without admin auth:
```bash
curl http://localhost:8080/workspaces/{id}/schedules/health \
-H "X-Workspace-ID: {callerWorkspaceId}" \
-H "Authorization: Bearer {callerToken}"
```
This endpoint returns execution-state fields only (`last_run_at`,
`last_status`, `run_count`, `next_run_at`, `last_error`). It deliberately
omits `prompt` and `cron_expr` so sensitive task content is never exposed to
peer workspaces.
**Auth rules** (mirrors the A2A proxy pattern):
- `X-Workspace-ID` header required to identify the caller
- Caller's own bearer token validated (legacy workspaces grandfathered)
- `registry.CanCommunicate(callerID, workspaceID)` must return true
- System callers (`system:*`, `webhook:*`, `test:*`) bypass checks
- Self-calls always allowed
---
## Scheduler Internals
### Poll Loop
The scheduler runs a 30-second poll loop. Each tick:
1. Queries up to 50 due schedules (`next_run_at <= now AND enabled = true`)
2. Fires up to 10 concurrently via a semaphore
3. Each fire sends an A2A `message/send` with a 5-minute timeout
4. Updates `last_run_at`, `run_count`, `last_status`, and `next_run_at`
5. Logs the run to `activity_logs` with `activity_type = 'cron_run'`
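Steps 1–2 reduce to something like the following (illustrative Python; the actual scheduler is Go):

```python
def select_due(schedules: list[dict], now: float, limit: int = 50) -> list[dict]:
    """Step 1: up to `limit` enabled schedules whose next_run_at has passed."""
    due = [s for s in schedules if s["enabled"] and s["next_run_at"] <= now]
    return due[:limit]

# Step 2 then bounds concurrency at 10 in-flight fires — in Go that is a
# buffered channel used as a semaphore; each fire gets a 5-minute timeout.
```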
### Panic Recovery
The scheduler recovers from panics inside the tick function. A single bad row,
malformed cron expression, or database blip cannot permanently kill the
scheduler. Without this recovery, the goroutine dies silently and the only
signal is "no crons firing."
### Liveness Watchdog
The scheduler reports heartbeats to the `supervised` subsystem. The
`/admin/liveness` endpoint exposes per-subsystem ages, so operators can detect
a stuck scheduler before it causes a missed-cron outage.
`Scheduler.Healthy()` returns true if the scheduler has completed a tick within
the last 60 seconds (2x the poll interval). Returns false before the first tick
or if the scheduler is stalled.
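The health predicate amounts to this (illustrative sketch):

```python
import time

POLL_INTERVAL_S = 30

def scheduler_healthy(last_tick_at, now=None):
    """True iff a tick completed within 2x the poll interval.
    False before the first tick or when the scheduler is stalled."""
    if last_tick_at is None:
        return False
    now = time.time() if now is None else now
    return (now - last_tick_at) <= 2 * POLL_INTERVAL_S
```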
---
## Examples
### Hourly Security Audit
```json
{
"name": "Hourly Security Scan",
"cron_expr": "0 * * * *",
"timezone": "UTC",
"prompt": "Scan all open PRs for leaked secrets, SQL injection patterns, and auth bypass vulnerabilities. Report findings as a summary."
}
```
### Daily Standup Report
```json
{
"name": "Daily Standup",
"cron_expr": "0 9 * * 1-5",
"timezone": "America/Los_Angeles",
"prompt": "Generate a standup report: what was completed yesterday, what is planned today, and any blockers. Post to the team channel."
}
```
### Weekly Retrospective
```json
{
"name": "Weekly Retro",
"cron_expr": "0 17 * * 5",
"timezone": "America/New_York",
"prompt": "Write a weekly retrospective covering PRs merged, issues closed, cron failures, and code review findings. Post as a GitHub issue."
}
```
### Nightly Cleanup
```json
{
"name": "Nightly Cleanup",
"cron_expr": "0 2 * * *",
"timezone": "UTC",
"prompt": "Archive stale branches older than 30 days. Close issues that have been inactive for 60 days with a comment explaining the auto-close policy.",
"enabled": true
}
```
---
## Timezone Handling
All cron expressions are evaluated in the specified timezone. If no timezone is
provided, `UTC` is used. The timezone must be a valid IANA timezone string
(e.g. `America/New_York`, `Europe/London`, `Asia/Tokyo`).
When a schedule's `cron_expr` or `timezone` is updated, the `next_run_at` is
immediately recomputed using the new values. This prevents schedules from
firing at unexpected times after a timezone change.
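For the simple daily case (`0 9 * * *`), the timezone-aware next-run computation looks like this (stdlib-only Python sketch; the platform's cron parser handles the general five-field case):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_daily_run(hour: int, minute: int, tz: str, now: datetime) -> datetime:
    """Next occurrence of hour:minute in the given IANA timezone.
    ZoneInfo raises for invalid names, like Go's time.LoadLocation."""
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:  # already passed today — roll to tomorrow
        candidate += timedelta(days=1)
    return candidate

now = datetime(2026, 4, 16, 14, 0, tzinfo=ZoneInfo("UTC"))  # 10:00 in New York (EDT)
print(next_daily_run(9, 0, "America/New_York", now))  # 2026-04-17 09:00 local
```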
---
## API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/workspaces/:id/schedules` | List schedules |
| POST | `/workspaces/:id/schedules` | Create schedule |
| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update schedule |
| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete schedule |
| POST | `/workspaces/:id/schedules/:scheduleId/run` | Manual trigger |
| GET | `/workspaces/:id/schedules/:scheduleId/history` | Run history (last 20) |
| GET | `/workspaces/:id/schedules/health` | Health view (open to peers) |

---
title: Self-Hosting
description: Run the full Molecule AI stack on your own infrastructure.
---
## Prerequisites
| Requirement | Minimum Version |
|-------------|----------------|
| Docker Desktop | Latest stable |
| Go | 1.25+ |
| Node.js | 20+ |
| Git | 2.x |
## Quick Start
The fastest way to get Molecule AI running locally:
```bash
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
# Canvas: http://localhost:3000
# Platform: http://localhost:8080
```
This script starts all infrastructure services, builds the platform, and launches the canvas dev server.
## Infrastructure Setup
Molecule AI depends on four infrastructure services, all managed via `docker-compose.infra.yml` and attached to the shared `molecule-monorepo-net` Docker network:
| Service | Port | Purpose |
|---------|------|---------|
| Postgres | 5432 | Primary datastore (also backs Langfuse and Temporal) |
| Redis | 6379 | Pub/sub, heartbeat TTLs |
| Langfuse | 3001 | LLM trace viewer (backed by ClickHouse) |
| Temporal | 7233 (gRPC), 8233 (Web UI) | Durable workflow engine |
Start infrastructure only:
```bash
./infra/scripts/setup.sh
```
Tear everything down (removes volumes):
```bash
./infra/scripts/nuke.sh
```
## Manual Setup
If you prefer to start each component individually:
### Platform (Go)
```bash
cd platform
go build ./cmd/server
go run ./cmd/server
# Requires Postgres + Redis running
```
The platform must be run from the `platform/` directory, not the repo root.
### Canvas (Next.js)
```bash
cd canvas
npm install
npm run dev
# Dev server on http://localhost:3000
```
### Docker Compose
For infrastructure only:
```bash
docker compose -f docker-compose.infra.yml up -d
```
For the full stack (infrastructure + platform + canvas):
```bash
docker compose up
```
## Environment Variables
### Platform
| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | -- | Postgres connection string (required) |
| `REDIS_URL` | -- | Redis connection string (required) |
| `PORT` | `8080` | Platform HTTP port |
| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers to reach the platform |
| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Comma-separated allowed origins |
| `SECRETS_ENCRYPTION_KEY` | -- | AES-256 key (32 bytes) for encrypting workspace secrets |
| `WORKSPACE_DIR` | -- | Global fallback host path for `/workspace` bind-mount |
| `MOLECULE_ENV` | -- | Set to `production` to hide E2E helper endpoints |
| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are retained |
| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | How often the cleanup job runs |
| `RATE_LIMIT` | `600` | Requests per minute per client |
### Tier Resource Limits
Override per-tier memory and CPU caps for workspace containers. `CPU_SHARES` follows Docker's convention, where 1024 shares equal one CPU.
| Variable | Default | Description |
|----------|---------|-------------|
| `TIER2_MEMORY_MB` | `512` | Standard tier memory limit |
| `TIER2_CPU_SHARES` | `1024` | Standard tier CPU shares |
| `TIER3_MEMORY_MB` | `2048` | Privileged tier memory limit |
| `TIER3_CPU_SHARES` | `2048` | Privileged tier CPU shares |
| `TIER4_MEMORY_MB` | `4096` | Full-host tier memory limit |
| `TIER4_CPU_SHARES` | `4096` | Full-host tier CPU shares |
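For example, to give the privileged tier more headroom via a compose override (the `platform` service name is an assumption — match it to your compose file):

```yaml
# docker-compose.override.yml (illustrative)
services:
  platform:
    environment:
      TIER3_MEMORY_MB: "4096"
      TIER3_CPU_SHARES: "4096"   # Docker convention: 1024 shares ≈ 1 CPU
```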
### Plugin Install Safeguards
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` | Max request body size (64 KiB) |
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch and copy deadline |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` | Max staged-tree size (100 MiB) |
### Canvas
| Variable | Default | Description |
|----------|---------|-------------|
| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API URL |
| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint |
### Tenant Mode
| Variable | Default | Description |
|----------|---------|-------------|
| `CANVAS_PROXY_URL` | -- | When set, the Go server proxies canvas requests to this URL |
| `MOLECULE_ORG_ID` | -- | UUID for multi-tenant isolation; leave unset for self-hosted |
## Production Deployment
For production, use `platform/Dockerfile.tenant` which builds a combined Go + Canvas image:
```bash
docker build -f platform/Dockerfile.tenant -t molecule-platform .
```
This image serves both the API and the canvas frontend from a single container.
## Security Configuration
### Secrets Encryption
Set `SECRETS_ENCRYPTION_KEY` to a 32-byte AES-256 key to encrypt workspace secrets at rest. Without this variable, secrets are stored in plaintext.
```bash
# Generate a key
openssl rand -hex 32
```
**Warning:** `SECRETS_ENCRYPTION_KEY` cannot be rotated without a data migration. Choose carefully before deploying to production.
### Rate Limiting
The `RATE_LIMIT` variable (default 600 requests/min) applies per client. Adjust based on your expected traffic.
### CORS
Set `CORS_ORIGINS` to a comma-separated list of allowed origins. In production, restrict this to your actual domain.
## Pre-commit Hook
Install the project's pre-commit hooks to enforce code quality:
```bash
git config core.hooksPath .githooks
```
The hook enforces:
- `'use client'` directive on hook-using `.tsx` files
- Dark theme only (no `white` or `light` CSS classes)
- No SQL injection patterns (`fmt.Sprintf` with SQL)
- No leaked secrets (`sk-ant-`, `ghp_`, `AKIA`)
Commits are rejected until all violations are fixed.
## Building Workspace Images
Build the base workspace image for local development:
```bash
bash workspace-template/build-all.sh
```
Adapter-specific images are built from standalone template repos. Each repo's `Dockerfile` installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies.

---
title: Troubleshooting
description: Common issues and how to fix them.
---
## Workspace Stuck in "Provisioning"
A workspace that stays in `provisioning` for more than 30 seconds usually indicates a container startup failure.
**Steps to diagnose:**
1. Check Docker logs for the workspace container:
```bash
docker logs <container-id>
```
2. Verify the workspace image exists locally:
```bash
docker images | grep workspace-template
```
3. Check tier resource limits -- the container may be OOM-killed on start. Review `TIER2_MEMORY_MB` / `TIER3_MEMORY_MB` / `TIER4_MEMORY_MB` values.
4. Ensure the platform can reach the Docker daemon (Docker Desktop must be running).
## 401 Unauthorized on API Calls
Bearer tokens can expire or be revoked. Workspace tokens are also auto-revoked when a workspace is deleted.
**Resolution:**
- For workspace-scoped endpoints, mint a new token:
```bash
# Development/staging only (hidden when MOLECULE_ENV=production)
curl http://localhost:8080/admin/workspaces/:id/test-token
```
- For admin endpoints, verify your token is still valid against a known-good endpoint like `GET /health`.
- Legacy workspaces (created before Phase 30.1) are grandfathered and do not require tokens on heartbeat/update-card routes.
## WebSocket Shows "Reconnecting"
The canvas WebSocket connection (`/ws`) drops and retries.
**Common causes:**
- `CORS_ORIGINS` does not include your domain -- the WebSocket upgrade is rejected. Add your origin to the comma-separated list.
- A reverse proxy or firewall is terminating the long-lived connection. Ensure WebSocket upgrade headers are forwarded.
- The platform process crashed or restarted. Check platform logs.
**Verify connectivity:**
```bash
# Quick check that the WS endpoint is reachable
curl -i -N \
-H "Connection: Upgrade" \
-H "Upgrade: websocket" \
-H "Sec-WebSocket-Version: 13" \
-H "Sec-WebSocket-Key: dGVzdA==" \
http://localhost:8080/ws
```
## Agent Not Responding to A2A
When one agent cannot reach another via the A2A proxy (`POST /workspaces/:id/a2a`), check communication rules.
**The `CanCommunicate` access check allows:**
- Same workspace (self-call)
- Siblings (same parent)
- Root-level siblings (both have no parent)
- Parent to child or child to parent
**Everything else is denied.** If two agents need to communicate, they must be in the same subtree.
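The allow rules above can be modeled as a small predicate. This is an illustrative sketch of the rule set, not the platform's actual Go implementation; it takes each workspace's ID and parent ID, with an empty string meaning root-level:

```shell
# Illustrative model of the CanCommunicate rules (not the real implementation).
# Arguments: caller ID, caller parent ID, target ID, target parent ID.
# Pass "" as the parent of a root-level workspace.
can_communicate() {
  local caller="$1" caller_parent="$2" target="$3" target_parent="$4"
  if [ "$caller" = "$target" ]; then echo allow; return; fi              # self-call
  if [ -n "$caller_parent" ] && [ "$caller_parent" = "$target_parent" ]; then
    echo allow; return                                                   # siblings (same parent)
  fi
  if [ -z "$caller_parent" ] && [ -z "$target_parent" ]; then
    echo allow; return                                                   # root-level siblings
  fi
  if [ "$caller_parent" = "$target" ] || [ "$target_parent" = "$caller" ]; then
    echo allow; return                                                   # parent <-> child
  fi
  echo deny
}

can_communicate ws-a root ws-b root    # same parent -> prints "allow"
can_communicate ws-a p1 ws-b p2        # different subtrees -> prints "deny"
```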
**Also verify:**
- The target workspace is `online` (not `paused`, `offline`, or `provisioning`)
- The target's heartbeat is fresh (Redis TTL has not expired)
- The caller includes `X-Workspace-ID` and `Authorization: Bearer <token>` headers
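A well-formed proxy call then looks roughly like this -- the workspace IDs and body are placeholders, and the message schema is whatever your agents expect:

```shell
# Placeholder workspace IDs and payload -- substitute your own. The proxy
# runs the CanCommunicate check before forwarding to the target workspace.
curl -s -X POST "http://localhost:8080/workspaces/ws-target/a2a" \
  -H "Authorization: Bearer ${CALLER_WORKSPACE_TOKEN:-}" \
  -H "X-Workspace-ID: ws-caller" \
  -H "Content-Type: application/json" \
  -d '{"message": "status check"}' \
  || echo "a2a request failed -- is the platform running?"
```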
## Schedule Not Firing
Cron schedules are managed by the platform scheduler subsystem.
**Checklist:**
- Verify the cron expression is valid (standard 5-field cron syntax)
- Confirm the workspace is `online` -- paused workspaces skip all schedules
- Check if the schedule was `skipped` due to concurrency: the scheduler skips when `active_tasks > 0`. Review schedule history:
```
GET /workspaces/:id/schedules/:scheduleId/history
```
- Inspect `GET /admin/liveness` to ensure the scheduler subsystem is alive (age should be under 60 seconds)
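For the first item, a quick shape check catches the most common mistake (a six-field expression with a seconds column). This only counts fields; it does not validate each field's syntax:

```shell
# Shape check only: a valid schedule expression must have exactly 5
# whitespace-separated fields (minute hour day-of-month month day-of-week).
is_five_field_cron() {
  [ "$(printf '%s\n' "$1" | awk '{print NF}')" -eq 5 ]
}

is_five_field_cron "*/5 * * * *" && echo "valid shape"
is_five_field_cron "0 0 * * * *" || echo "6 fields -- seconds column not supported"
```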
## Channel Test Fails
Social channel integrations (Telegram, Slack, etc.) can fail for several reasons.
**Diagnose:**
- Verify the bot token is correct and has not been revoked by the platform provider
- Check the allowlist config in the channel's JSONB settings -- messages from non-allowlisted chats are silently dropped
- Ensure the webhook URL is registered with the external platform:
```
POST /webhooks/:type
```
This is the endpoint the external platform (Telegram, Slack) should send events to.
- Test the connection explicitly:
```
POST /workspaces/:id/channels/:channelId/test
```
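You can also hand-deliver a minimal event to the webhook route. The JSON below mimics a bare Telegram update for illustration only; the platform's accepted payload shape may differ:

```shell
# Simulate a minimal Telegram-style update to the webhook route. The chat ID
# must be on the channel's allowlist or the message is silently dropped.
curl -s -X POST "http://localhost:8080/webhooks/telegram" \
  -H "Content-Type: application/json" \
  -d '{"update_id": 1, "message": {"chat": {"id": 12345}, "text": "ping"}}' \
  || echo "webhook delivery failed -- is the platform running?"
```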
## Migration Crash on Boot
The platform runs all `*.up.sql` migrations on every startup (there is no `schema_migrations` tracking table yet).
**Common issues:**
- Migrations must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... IF NOT EXISTS`). If a migration lacks this guard, the second boot fails.
- Before PR #212, the migration runner did not filter `.down.sql` files, causing tables to be dropped on every boot. Ensure you are running a platform version that includes this fix.
- If you see errors about duplicate columns or tables, the migration is not idempotent. Patch the `.up.sql` file to add `IF NOT EXISTS` guards.
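An idempotent `.up.sql` then looks like this -- PostgreSQL syntax, with table and column names invented for illustration:

```sql
-- Safe to re-run on every boot: guards on both the table and the column.
CREATE TABLE IF NOT EXISTS example_events (
    id         BIGSERIAL PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

ALTER TABLE example_events
    ADD COLUMN IF NOT EXISTS payload JSONB;
```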
## Canvas Blank or 502 on Tenant Deploy
In tenant mode (`platform/Dockerfile.tenant`), the Go server proxies canvas requests.
**Verify:**
- `CANVAS_PROXY_URL` is set and points to the running Next.js process inside the container
- Both the Go server and the Node.js process are running (check container logs for both)
- The Next.js build completed successfully during `docker build`
## Plugin Install Timeout
Large plugins or slow network connections can exceed the default fetch deadline.
**Adjust limits:**
| Variable | Default | Description |
|----------|---------|-------------|
| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Increase for large or remote plugins |
| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Increase if the plugin tree exceeds 100 MiB |
| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Increase if the install request body is large |
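The byte-valued limits take raw integers, so compute them explicitly rather than guessing. The values below are illustrative:

```shell
# Raise the plugin directory cap to 200 MiB (the variable takes raw bytes).
PLUGIN_INSTALL_MAX_DIR_BYTES=$((200 * 1024 * 1024))
echo "$PLUGIN_INSTALL_MAX_DIR_BYTES"   # 209715200
export PLUGIN_INSTALL_MAX_DIR_BYTES

# Durations use Go syntax, matching the 5m default.
export PLUGIN_INSTALL_FETCH_TIMEOUT=15m
```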
## Memory or Disk Usage Growing
Activity logs and structure events accumulate over time.
**Tune retention:**
- `ACTIVITY_RETENTION_DAYS` (default `7`) -- reduce to 3 or even 1 for high-traffic deployments
- `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default `6`) -- reduce to run cleanup more frequently
- Monitor the `activity_logs` and `structure_events` tables directly if disk usage is a concern:
```sql
SELECT pg_size_pretty(pg_total_relation_size('activity_logs'));
SELECT pg_size_pretty(pg_total_relation_size('structure_events'));
```
## Container Health Detection
If workspaces go offline unexpectedly (e.g., Docker Desktop crash), three layers detect the failure:
1. **Passive (Redis TTL):** 60-second heartbeat key expires, liveness monitor triggers auto-restart
2. **Proactive (Health Sweep):** Docker API polled every 15 seconds, catches dead containers faster than TTL expiry
3. **Reactive (A2A Proxy):** On connection error to a workspace, checks `provisioner.IsRunning()` and triggers immediate offline + restart
If none of these are catching a dead container, check `GET /admin/liveness` to verify the health sweep and liveness monitor subsystems are running.
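To check freshness by hand, compare the last heartbeat timestamp against the 60-second window. This sketch assumes you can read the last beat as a Unix timestamp (the liveness payload's exact field names may differ):

```shell
# Flag a stale heartbeat: the passive layer expires the Redis key at 60s,
# and /admin/liveness subsystem ages should also stay under 60s.
heartbeat_status() {
  local last_heartbeat="$1"                      # Unix timestamp of the last beat
  local age=$(( $(date +%s) - last_heartbeat ))
  if [ "$age" -lt 60 ]; then echo "fresh (${age}s)"; else echo "stale (${age}s)"; fi
}

heartbeat_status "$(( $(date +%s) - 45 ))"       # well within the window -> fresh
```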