diff --git a/content/docs/api-reference.mdx b/content/docs/api-reference.mdx
index 956e8c9..b0f0909 100644
--- a/content/docs/api-reference.mdx
+++ b/content/docs/api-reference.mdx
@@ -1,11 +1,423 @@
 ---
-title: Api Reference
-description: Stub page — content coming soon.
+title: API Reference
+description: Complete reference for all Molecule AI Platform HTTP and WebSocket endpoints.
 ---
 
-> 🚧 **Coming soon.** The Documentation Specialist agent will populate this
-> page on its next maintenance cycle.
+# API Reference
 
-If you need this content urgently, open an issue on the
-[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent
-will prioritise it on its next cron tick.
+The Molecule AI Platform exposes a REST API (default port 8080) for workspace management, agent registry, communication, and administration. All endpoints return JSON unless otherwise noted.
+
+**Base URL:** `http://localhost:8080` (self-hosted) or `https://api.moleculesai.app` (SaaS)
+
+---
+
+## Authentication Model
+
+The platform uses three authentication middleware variants depending on the sensitivity of the route.
+
+### AdminAuth
+
+Strict bearer-token authentication. Required for any route where a forged request could leak prompts/memory or operational data, or create/mutate workspaces.
+
+```
+Authorization: Bearer <token>
+```
+
+**Fail-open behavior:** When no live tokens exist globally (fresh install), AdminAuth passes all requests through. Once the first token is created, all AdminAuth routes require a valid bearer.
+
+### WorkspaceAuth
+
+Per-workspace bearer token binding. Workspace A's token cannot access workspace B's sub-routes. Used for the entire `/workspaces/:id/*` group (except the A2A proxy, which uses `CanCommunicate`).
+
+```
+Authorization: Bearer <workspace-token>
+```
+
+### CanvasOrBearer
+
+Accepts either a valid bearer token OR a request whose `Origin` header matches `CORS_ORIGINS`.
Used only for cosmetic-only routes where a forged request has zero data/security impact. + +Currently applies only to `PUT /canvas/viewport`. Do not extend to data-sensitive routes. + +--- + +## Health and Monitoring + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/health` | None | Returns `200 OK` if the platform is running. Use for load balancer health checks. | +| GET | `/metrics` | None | Prometheus text format (v0.0.4) metrics. Scrape-safe, no auth required. | +| GET | `/admin/liveness` | AdminAuth | Per-subsystem `supervised.Snapshot()` ages. Check before debugging stuck scheduler/heartbeat goroutines. | + +--- + +## Workspaces + +Core workspace CRUD and lifecycle operations. + +### CRUD + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces` | AdminAuth | Create a new workspace. Accepts `name`, `runtime`, `template`, `parent_id`, `tier`, `workspace_dir`, and other fields. Runtime is auto-detected from template config if omitted (defaults to `langgraph`). | +| GET | `/workspaces` | AdminAuth | List all workspaces with status, runtime, agent card, position, and hierarchy info. | +| GET | `/workspaces/:id` | WorkspaceAuth | Get a single workspace by ID. | +| PATCH | `/workspaces/:id` | WorkspaceAuth | Update workspace fields. **Field-level authz:** cosmetic fields (name, role, x, y, canvas) pass through; sensitive fields (tier, parent_id, runtime, workspace_dir) require a valid bearer token when any live token exists. | +| DELETE | `/workspaces/:id` | AdminAuth | Delete a workspace. Stops the container, revokes all auth tokens, and removes all associated data. | + +### Lifecycle + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces/:id/restart` | WorkspaceAuth | Restart the workspace container. Sends a `restart_context` A2A message after successful re-registration. 
| +| POST | `/workspaces/:id/pause` | WorkspaceAuth | Stop the container and set status to `paused`. Paused workspaces skip health sweep, liveness monitor, and auto-restart. | +| POST | `/workspaces/:id/resume` | WorkspaceAuth | Re-provision a paused workspace. Status transitions to `provisioning`. | + +--- + +## Registry + +Workspace registration and heartbeat endpoints. Called by workspace runtimes, not by end users. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/registry/register` | None | Register a workspace with the platform. Sets status to `online`. Body includes agent URL, agent card, capabilities. | +| POST | `/registry/heartbeat` | Bearer (if token exists) | Send a heartbeat. Updates Redis TTL key (60s expiry). Body can include `active_tasks`, `current_task`, `error_rate`. Triggers `degraded` status if `error_rate > 0.5`. | +| POST | `/registry/update-card` | Bearer (if token exists) | Update the workspace's agent card (name, description, skills, etc.). | + +--- + +## Discovery + +Peer discovery and access control verification. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/registry/discover/:id` | Bearer + `X-Workspace-ID` | Discover a workspace's agent card and URL. Requires caller identification. Fails open on DB hiccup since hierarchy check is primary. | +| GET | `/registry/:id/peers` | Bearer + `X-Workspace-ID` | List all peers (siblings, parent, children) that the caller can communicate with. | +| POST | `/registry/check-access` | None | Check whether two workspaces can communicate. Body: `{ "caller_id": "...", "target_id": "..." }`. Returns `{ "allowed": true/false }`. | + +--- + +## Communication + +### A2A Proxy + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces/:id/a2a` | CanCommunicate | Proxy an A2A JSON-RPC message to the target workspace. Caller identified via `X-Workspace-ID` header. 
Canvas requests (no header) bypass access check. On connection error, checks if container is dead and triggers auto-restart. | + +### Delegation + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces/:id/delegate` | WorkspaceAuth | Async fire-and-forget delegation. Supports idempotency keys. Body includes target workspace, prompt, and metadata. | +| GET | `/workspaces/:id/delegations` | WorkspaceAuth | List delegation status for a workspace. Returns delegation rows with status, result, timestamps. | + +--- + +## Configuration + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/config` | WorkspaceAuth | Get the workspace's `config.yaml` contents. | +| PATCH | `/workspaces/:id/config` | WorkspaceAuth | Update the workspace config. "Save & Restart" writes config and auto-restarts; "Save" writes only and shows a restart banner in the Canvas. | + +--- + +## Secrets + +### Per-Workspace Secrets + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/secrets` | WorkspaceAuth | List secret keys for a workspace (keys only, values masked). | +| POST | `/workspaces/:id/secrets` | WorkspaceAuth | Set a secret `{ "key": "...", "value": "..." }`. Auto-restarts the workspace. | +| PUT | `/workspaces/:id/secrets` | WorkspaceAuth | Alias for POST (upsert semantics). Auto-restarts the workspace. | +| DELETE | `/workspaces/:id/secrets/:key` | WorkspaceAuth | Delete a secret by key. Auto-restarts the workspace. | +| GET | `/workspaces/:id/model` | WorkspaceAuth | Return the model configuration derived from available API keys (which provider keys are set). | + +### Global Secrets + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/settings/secrets` | AdminAuth | List global secrets (keys only, values masked). 
| +| PUT | `/settings/secrets` | AdminAuth | Set a global secret `{ "key": "...", "value": "..." }`. Auto-restarts every non-paused/non-removed workspace that does not shadow the key with a workspace-level override. | +| POST | `/settings/secrets` | AdminAuth | Alias for PUT. | +| DELETE | `/settings/secrets/:key` | AdminAuth | Delete a global secret. Same auto-restart fan-out as PUT. | + +Legacy aliases `GET/POST/DELETE /admin/secrets[/:key]` also exist and behave identically. + +--- + +## Memory + +### Key-Value Memory + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/memory` | WorkspaceAuth | List all key-value memory entries for a workspace. | +| POST | `/workspaces/:id/memory` | WorkspaceAuth | Set a memory entry `{ "key": "...", "value": "..." }`. | +| DELETE | `/workspaces/:id/memory/:key` | WorkspaceAuth | Delete a memory entry by key. | + +### Agent Memories (HMA-scoped) + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/memories` | WorkspaceAuth | List agent memories for a workspace. | +| POST | `/workspaces/:id/memories` | WorkspaceAuth | Create an agent memory entry. | +| DELETE | `/workspaces/:id/memories/:id` | WorkspaceAuth | Delete an agent memory by ID. | + +--- + +## Files + +Workspace file management. Files are stored in the workspace's config directory. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/files` | WorkspaceAuth | List files in the workspace config directory. | +| GET | `/workspaces/:id/files/*path` | WorkspaceAuth | Read a specific file. | +| PUT | `/workspaces/:id/files/*path` | WorkspaceAuth | Write a file. Creates parent directories as needed. | +| DELETE | `/workspaces/:id/files/*path` | WorkspaceAuth | Delete a file. | +| GET | `/workspaces/:id/shared-context` | WorkspaceAuth | Get the shared context files for a workspace (aggregated from parent hierarchy). 
| + +--- + +## Activity + +Activity logging and search for A2A communications, task updates, and agent logs. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/activity` | WorkspaceAuth | List activity logs for a workspace. Supports `?source=canvas` or `?source=agent` filter. | +| POST | `/workspaces/:id/activity` | WorkspaceAuth | Log an activity entry (used by workspace runtimes to self-report). | +| POST | `/workspaces/:id/notify` | WorkspaceAuth | Agent-to-user push message via WebSocket. Delivers a notification to connected Canvas clients. | + +### Session Search + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/session-search` | WorkspaceAuth | Search activity logs with filters for type, date range, and text content. Returns paginated results. | + +--- + +## Schedules + +Cron-based scheduled tasks per workspace. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/schedules` | WorkspaceAuth | List all schedules for a workspace. | +| POST | `/workspaces/:id/schedules` | WorkspaceAuth | Create a schedule. Body: `{ "expression": "0 */6 * * *", "timezone": "UTC", "prompt": "...", "enabled": true }`. | +| PATCH | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Update a schedule (expression, timezone, prompt, enabled). | +| DELETE | `/workspaces/:id/schedules/:scheduleId` | WorkspaceAuth | Delete a schedule. | +| POST | `/workspaces/:id/schedules/:scheduleId/run` | WorkspaceAuth | Manually trigger a schedule immediately. | +| GET | `/workspaces/:id/schedules/:scheduleId/history` | WorkspaceAuth | List past runs for a schedule. Includes status (`success`, `error`, `skipped`) and `error_detail`. | + +Schedule `source` field: `template` for org/import-seeded schedules, `runtime` for Canvas/API-created. 
The `last_status` value includes `skipped` when the scheduler's concurrency guard skips a run because the workspace is still busy.
+
+---
+
+## Channels
+
+Social channel integrations (Telegram, Slack, etc.) for workspace agents.
+
+### Per-Workspace Channels
+
+| Method | Path | Auth | Description |
+|--------|------|------|-------------|
+| GET | `/workspaces/:id/channels` | WorkspaceAuth | List channels for a workspace. |
+| POST | `/workspaces/:id/channels` | WorkspaceAuth | Create a channel. Body includes platform type, JSONB config, and allowlist. |
+| PATCH | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Update a channel's config or allowlist. |
+| DELETE | `/workspaces/:id/channels/:channelId` | WorkspaceAuth | Delete a channel. |
+| POST | `/workspaces/:id/channels/:channelId/send` | WorkspaceAuth | Send an outbound message through the channel. |
+| POST | `/workspaces/:id/channels/:channelId/test` | WorkspaceAuth | Test the channel connection (send a test message). |
+
+### Global Channel Endpoints
+
+| Method | Path | Auth | Description |
+|--------|------|------|-------------|
+| GET | `/channels/adapters` | None | List available social platform adapters (Telegram, Slack, etc.). |
+| POST | `/channels/discover` | AdminAuth | Auto-detect available chats/groups for a bot token. |
+| POST | `/webhooks/:type` | None | Incoming webhook endpoint for social platforms. The `:type` parameter identifies the platform (e.g., `telegram`, `slack`). |
+
+---
+
+## Plugins
+
+Plugin registry and per-workspace plugin management.
+
+### Global Plugin Registry
+
+| Method | Path | Auth | Description |
+|--------|------|------|-------------|
+| GET | `/plugins` | None | List all plugins in the registry. Supports `?runtime=` filter to show only compatible plugins. |
+| GET | `/plugins/sources` | None | List registered install-source schemes (e.g., `github://`, `local://`).
| + +### Per-Workspace Plugins + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/plugins` | WorkspaceAuth | List installed plugins for a workspace. | +| POST | `/workspaces/:id/plugins` | WorkspaceAuth | Install a plugin. Body: `{ "source": "github://org/repo" }`. Safeguards: 64 KiB body limit, 5 min fetch timeout, 100 MiB max staged-tree. | +| DELETE | `/workspaces/:id/plugins/:name` | WorkspaceAuth | Uninstall a plugin by name. | +| GET | `/workspaces/:id/plugins/available` | WorkspaceAuth | List plugins available for this workspace (filtered by workspace runtime). | +| GET | `/workspaces/:id/plugins/compatibility` | WorkspaceAuth | Preflight runtime-change check. Query: `?runtime=X`. Returns which currently-installed plugins would be incompatible with the target runtime. | + +--- + +## Auth Tokens + +Bearer token management for workspaces. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/tokens` | WorkspaceAuth | List active tokens for a workspace (token values are masked). | +| POST | `/workspaces/:id/tokens` | WorkspaceAuth | Create a new bearer token for the workspace. | +| DELETE | `/workspaces/:id/tokens/:tokenId` | WorkspaceAuth | Revoke a specific token. | + +### Test Token (Development Only) + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/admin/workspaces/:id/test-token` | None | Mint a fresh bearer token for E2E scripts. Returns 404 unless `MOLECULE_ENV != production` or `MOLECULE_ENABLE_TEST_TOKENS=1`. | + +--- + +## Teams + +Expand and collapse team views in the Canvas hierarchy. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces/:id/expand` | WorkspaceAuth | Expand a team workspace to show its children on the canvas. | +| POST | `/workspaces/:id/collapse` | WorkspaceAuth | Collapse a team workspace to hide its children. 
| + +--- + +## Templates and Bundles + +### Templates + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/templates` | None | List available workspace templates with their runtime, description, and config schema. | +| POST | `/templates/import` | AdminAuth | Import a workspace template from a `github://` source URL. | + +### Org Templates + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/org/templates` | None | List available organization templates. | +| POST | `/org/import` | AdminAuth | Import an org template. Applies `resolveInsideRoot` path sanitization. Creates the full workspace hierarchy defined in `org.yaml`. | + +### Bundles + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/bundles/export/:id` | AdminAuth | Export a workspace (or workspace tree) as a portable bundle. Includes config, secrets (keys only), memory, schedules, and hierarchy. | +| POST | `/bundles/import` | AdminAuth | Import a previously-exported bundle. Recreates the workspace tree with all associated data. | + +--- + +## Approvals + +Human-in-the-loop approval system for agent actions. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| POST | `/workspaces/:id/approvals` | WorkspaceAuth | Create an approval request. Body includes the action description, metadata, and options. | +| GET | `/workspaces/:id/approvals` | WorkspaceAuth | List approval requests for a workspace. | +| POST | `/workspaces/:id/approvals/:id/decide` | WorkspaceAuth | Approve or reject an approval request. Body: `{ "decision": "approve" }` or `{ "decision": "reject" }`. | +| GET | `/approvals/pending` | AdminAuth | List all pending approval requests across all workspaces. | + +--- + +## Canvas + +Canvas viewport persistence (cosmetic only). 
+ +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/canvas/viewport` | None | Get the saved canvas viewport (zoom, pan position). Open endpoint for bootstrap-friendliness. | +| PUT | `/canvas/viewport` | CanvasOrBearer | Save the canvas viewport. Accepts bearer OR matching `Origin` header. Worst case on forgery: viewport corruption, recovered by page refresh. | + +--- + +## Traces + +LLM trace retrieval from Langfuse. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/workspaces/:id/traces` | WorkspaceAuth | List LLM traces for a workspace from Langfuse. | + +--- + +## Events + +Append-only event log for structure changes. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| GET | `/events` | AdminAuth | List all structure events across all workspaces. | +| GET | `/events/:workspaceId` | AdminAuth | List structure events for a specific workspace. | + +--- + +## Terminal + +WebSocket-based terminal access to workspace containers. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| WS | `/workspaces/:id/terminal` | WorkspaceAuth | Open a WebSocket terminal session to the workspace container. Provides interactive shell access. | + +--- + +## WebSocket + +Real-time event streaming for Canvas clients. + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| WS | `/ws` | None | Connect to the WebSocket hub. Receives all structure events (`WORKSPACE_ONLINE`, `WORKSPACE_OFFLINE`, `HEARTBEAT`, `CONFIG_UPDATED`, `A2A_RESPONSE`, `AGENT_MESSAGE`, etc.). Canvas clients connect here for real-time updates. 
| + +--- + +## Error Responses + +All endpoints return standard HTTP status codes: + +| Status | Meaning | +|--------|---------| +| 200 | Success | +| 201 | Created | +| 400 | Bad request (malformed body, missing required fields) | +| 401 | Unauthorized (missing or invalid bearer token) | +| 403 | Forbidden (valid token but insufficient access) | +| 404 | Not found (workspace, schedule, channel, etc. does not exist) | +| 409 | Conflict (idempotency key collision on delegation) | +| 429 | Rate limited (exceeds `RATE_LIMIT` requests/min) | +| 500 | Internal server error | + +Error response body format: + +```json +{ + "error": "human-readable error message" +} +``` + +--- + +## Rate Limiting + +All endpoints are subject to a global rate limit of `RATE_LIMIT` requests per minute (default: 600). When exceeded, the platform returns `429 Too Many Requests` with a `Retry-After` header. + +--- + +## CORS + +The platform sets CORS headers based on the `CORS_ORIGINS` environment variable (comma-separated list, default: `http://localhost:3000,http://localhost:3001`). Preflight (`OPTIONS`) requests are handled automatically by the Gin CORS middleware. diff --git a/content/docs/architecture.mdx b/content/docs/architecture.mdx index d68c385..c2b52a8 100644 --- a/content/docs/architecture.mdx +++ b/content/docs/architecture.mdx @@ -1,11 +1,341 @@ --- title: Architecture -description: Stub page β€” content coming soon. +description: System architecture, components, infrastructure, and communication model for the Molecule AI platform. --- -> 🚧 **Coming soon.** The Documentation Specialist agent will populate this -> page on its next maintenance cycle. +# Architecture -If you need this content urgently, open an issue on the -[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent -will prioritise it on its next cron tick. +Molecule AI is a platform for orchestrating AI agent workspaces that form an organizational hierarchy. 
Workspaces register with a central platform, communicate via A2A (Agent-to-Agent) protocol, and are visualized on a drag-and-drop canvas. + +## System Overview + +``` +Canvas (Next.js :3000) <--WebSocket--> Platform (Go :8080) <--HTTP--> Postgres + Redis + | + Workspace A <----A2A----> Workspace B + (Python agents) + | register/heartbeat | + +------ Platform ----+ +``` + +The Canvas provides the visual interface, the Platform acts as the control plane, and Workspaces are isolated containers running AI agent runtimes. All inter-agent communication is mediated by the Platform via the A2A proxy, which enforces hierarchical access control. + +--- + +## Four Main Components + +### Canvas + +**Stack:** Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind CSS + +The Canvas is the browser-based visual workspace graph. It provides: + +- **Drag-and-drop layout** with persistent node positions (saved via `PATCH /workspaces/:id`) +- **Team nesting** using recursive `TeamMemberChip` components (up to 3 levels deep) +- **Real-time status** via WebSocket connection to the Platform +- **Chat interface** with two sub-tabs: "My Chat" (user-to-agent) and "Agent Comms" (agent-to-agent A2A traffic) +- **Config editor** with "Save & Restart" and "Save" (deferred restart) modes +- **Secrets management** with auto-restart on POST/DELETE + +**State management:** + +| Concern | Mechanism | +|---------|-----------| +| Initial load | HTTP fetch `GET /workspaces` into Zustand | +| Real-time updates | WebSocket events via `applyEvent()` | +| Position persistence | `onNodeDragStop` sends `PATCH /workspaces/:id` with `{x, y}` | +| Node nesting | `nestNode` sets `hidden: !!targetId`; children render inside parent | + +**Environment variables:** + +| Variable | Default | Purpose | +|----------|---------|---------| +| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API base URL | +| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint | + +### Platform + 
+**Stack:** Go / Gin + +The Platform is the central control plane responsible for: + +- **Workspace CRUD** -- create, read, update, delete workspaces +- **Registry** -- workspace registration, heartbeat tracking, agent card management +- **Discovery** -- peer lookup, access control checks +- **WebSocket hub** -- real-time event broadcasting to Canvas clients +- **Liveness monitoring** -- three-layer container health detection +- **A2A proxy** -- routes inter-agent messages with hierarchical access control +- **Docker provisioner** -- container lifecycle management with tier-based resource limits +- **Scheduler** -- cron-based scheduled tasks per workspace +- **Channel adapters** -- social integrations (Telegram, Slack, etc.) + +**Key environment variables:** + +| Variable | Default | Purpose | +|----------|---------|---------| +| `DATABASE_URL` | (required) | Postgres connection string | +| `REDIS_URL` | (required) | Redis connection string | +| `PORT` | `8080` | Server listen port | +| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers | +| `SECRETS_ENCRYPTION_KEY` | (optional) | AES-256 key, 32 bytes | +| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Allowed CORS origins | +| `RATE_LIMIT` | `600` | Requests per minute | +| `MOLECULE_ENV` | (optional) | Set `production` to hide test endpoints | +| `MOLECULE_ORG_ID` | (optional) | SaaS tenant org gating | +| `WORKSPACE_DIR` | (optional) | Global fallback host path for `/workspace` bind-mount | +| `AWARENESS_URL` | (optional) | Injected into workspace containers for cross-session memory | +| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are kept | +| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | Cleanup sweep interval | + +**Workspace tier resource limits:** + +| Tier | Env (Memory) | Env (CPU) | Defaults | +|------|-------------|-----------|----------| +| Standard (Tier 2) | `TIER2_MEMORY_MB` | `TIER2_CPU_SHARES` | 512 MB / 1 CPU | +| Privileged (Tier 3) | 
`TIER3_MEMORY_MB` | `TIER3_CPU_SHARES` | 2048 MB / 2 CPU | +| Full-host (Tier 4) | `TIER4_MEMORY_MB` | `TIER4_CPU_SHARES` | 4096 MB / 4 CPU | + +### Workspace Runtime + +**Published as:** [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/) on PyPI + +The shared runtime provides the base agent infrastructure: A2A server, heartbeat loop, config loading, platform auth, plugin system, and built-in tools. Each AI framework adapter lives in its own standalone repository. + +| Runtime | Standalone Repo | Key Dependencies | +|---------|-----------------|------------------| +| LangGraph | `molecule-ai-workspace-template-langgraph` | langchain-anthropic, langgraph | +| Claude Code | `molecule-ai-workspace-template-claude-code` | claude-agent-sdk, @anthropic-ai/claude-code | +| OpenClaw | `molecule-ai-workspace-template-openclaw` | openclaw (npm) | +| CrewAI | `molecule-ai-workspace-template-crewai` | crewai | +| AutoGen | `molecule-ai-workspace-template-autogen` | autogen | +| DeepAgents | `molecule-ai-workspace-template-deepagents` | deepagents | +| Hermes | `molecule-ai-workspace-template-hermes` | openai, anthropic, google-genai | +| Gemini CLI | `molecule-ai-workspace-template-gemini-cli` | @google/gemini-cli (npm) | + +Each adapter repo has its own `Dockerfile` that installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies. Templates are cloned at Docker build time into the platform image via `manifest.json`. + +### molecli + +**Stack:** Go / Bubbletea + Lipgloss + +A terminal UI dashboard for real-time workspace monitoring, event log streaming, health overview, and delete/filter operations. Reads `MOLECLI_URL` (default `http://localhost:8080`) to locate the platform. Now published as a standalone repo at `github.com/Molecule-AI/molecule-cli`. + +--- + +## Infrastructure Services + +All services run via `docker-compose.infra.yml`, attached to the shared `molecule-monorepo-net` network. 
Start them with: + +```bash +./infra/scripts/setup.sh # Start Postgres, Redis, Langfuse, Temporal; run migrations +``` + +### Postgres (port 5432) + +Primary datastore for workspaces, events, activity logs, secrets, schedules, channels, and more. Also backs Langfuse and Temporal via separate databases. + +Key tables: + +| Table | Purpose | +|-------|---------| +| `workspaces` | Core entity -- status, runtime, agent_card, heartbeat, current_task | +| `canvas_layouts` | Persisted x/y positions | +| `structure_events` | Append-only event log | +| `activity_logs` | A2A communications, task updates, agent logs, errors | +| `workspace_schedules` | Cron tasks with expression, timezone, prompt, run history | +| `workspace_channels` | Social channel integrations with JSONB config | +| `workspace_secrets` / `global_secrets` | Encrypted secrets storage | +| `workspace_auth_tokens` | Bearer tokens (auto-revoked on workspace delete) | +| `agent_memories` | HMA-scoped agent memory | +| `approvals` | Human-in-the-loop approval requests | + +**Migration runner:** On startup, the platform globs `*.sql` in the migrations directory, filters out `.down.sql` files, sorts alphabetically, and executes each. All `.up.sql` files must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... IF NOT EXISTS`). + +**JSONB gotcha:** When inserting Go `[]byte` (from `json.Marshal`) into Postgres JSONB columns, you must convert to `string()` first and use `::jsonb` cast in SQL. The `lib/pq` driver treats `[]byte` as `bytea`, not JSONB. + +### Redis (port 6379) + +Used for pub/sub event broadcasting and heartbeat TTL tracking. Workspace heartbeat keys expire after 60 seconds -- expiry triggers the liveness monitor. + +### Langfuse (port 3001) + +LLM trace viewer backed by ClickHouse. Provides observability into agent LLM calls, token usage, and latency. + +### Temporal (port 7233 gRPC, port 8233 Web UI) + +Durable workflow engine for `workspace-template/builtin_tools/temporal_workflow.py`. 
Dev-only posture: the auto-setup image runs with no auth on `0.0.0.0:7233`. Production deployments must gate access via mTLS or an API key / reverse proxy. + +--- + +## Communication Model + +### WebSocket Events Flow + +``` +1. Action occurs (register, heartbeat, config change, etc.) +2. broadcaster.RecordAndBroadcast() + -> inserts into structure_events table + -> publishes to Redis pub/sub +3. Redis subscriber relays to WebSocket hub +4. Hub broadcasts to: + - Canvas clients (all events) + - Workspace clients (filtered by CanCommunicate) +``` + +### A2A Proxy + +The A2A proxy (`POST /workspaces/:id/a2a`) routes agent-to-agent messages. The caller identifies itself via the `X-Workspace-ID` header and authenticates with `Authorization: Bearer `. + +### Access Control Rules + +Determined by `CanCommunicate(callerID, targetID)` in `registry/access.go`: + +| Relationship | Allowed | +|-------------|---------| +| Same workspace (self-call) | Yes | +| Siblings (same `parent_id`) | Yes | +| Root-level siblings (both `parent_id` IS NULL) | Yes | +| Parent to child / child to parent | Yes | +| System callers (`webhook:*`, `system:*`, `test:*`) | Yes (bypass) | +| Canvas requests (no `X-Workspace-ID`) | Yes (bypass) | +| Everything else | **Denied** | + +### Import Cycle Prevention + +The platform uses function injection to avoid Go import cycles between `ws`, `registry`, and `events` packages: + +- `ws.NewHub(canCommunicate AccessChecker)` -- Hub accepts `registry.CanCommunicate` as a function +- `registry.StartLivenessMonitor(ctx, onOffline OfflineHandler)` -- Liveness accepts broadcaster callback +- `registry.StartHealthSweep(ctx, checker ContainerChecker, interval, onOffline)` -- Health sweep accepts Docker checker interface +- Wiring happens in `platform/cmd/server/main.go` -- init order: `wh -> onWorkspaceOffline -> liveness/healthSweep -> router` + +--- + +## Container Health Detection + +Three independent layers detect dead containers (e.g., Docker Desktop crash): 
+ +### Layer 1: Passive (Redis TTL) + +Each workspace sends heartbeats that set a Redis key with a 60-second TTL. When the key expires, the liveness monitor detects the workspace as offline and triggers an auto-restart. + +### Layer 2: Proactive (Health Sweep) + +`registry.StartHealthSweep` polls the Docker API every 15 seconds. Catches dead containers faster than waiting for Redis TTL expiry. + +### Layer 3: Reactive (A2A Proxy) + +When the A2A proxy encounters a connection error to a workspace, it immediately checks `provisioner.IsRunning()`. If the container is dead, it marks the workspace offline and triggers a restart. + +All three layers call `onWorkspaceOffline`, which broadcasts `WORKSPACE_OFFLINE` and initiates `wh.RestartByID()`. Redis cleanup uses the shared `db.ClearWorkspaceKeys()` function. + +--- + +## Workspace Lifecycle + +``` +provisioning --> online (on register) + ^ | + | degraded (error_rate > 0.5) + | | + | online (recovered) + | | + | offline (Redis TTL expired / health sweep) + | | + +--- auto-restart ---+ + | + removed (deleted) + +Any state --> paused (user pauses) --> provisioning (user resumes) +``` + +Paused workspaces skip health sweep, liveness monitor, and auto-restart. + +**Restart context:** After any restart and successful re-registration, the platform sends a synthetic A2A `message/send` with `metadata.kind=restart_context` containing the restart timestamp, previous session info, and available env-var keys (keys only, never values). The sender uses the `system:restart-context` caller prefix to bypass `CanCommunicate`. If the workspace does not re-register within 30 seconds, the message is dropped. + +**Initial prompt:** Agents can auto-execute a prompt on startup before any user interaction. Configure via `initial_prompt` (inline string) or `initial_prompt_file` (path relative to config dir) in `config.yaml`. A `.initial_prompt_done` marker file prevents re-execution on restart. 
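As a concrete illustration, a minimal `config.yaml` fragment for the initial-prompt behavior described above might look like this (field names are the ones documented here; the prompt text itself is invented):

```yaml
# Illustrative config.yaml fragment -- initial_prompt runs once on first
# startup; the .initial_prompt_done marker file prevents re-execution
# after restarts.
initial_prompt: |
  Introduce yourself to your parent workspace and summarize your skills.

# Alternatively, load the prompt from a file (path relative to the config dir):
# initial_prompt_file: prompts/bootstrap.md
```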
+ +**Idle loop:** When `idle_prompt` is non-empty in `config.yaml`, the workspace self-sends it every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The idle check is local (no LLM call) and the prompt only fires when the agent is genuinely idle. + +--- + +## Deployment Modes + +### Self-Hosted + +Run the full stack on your own infrastructure using Docker Compose: + +```bash +# Infrastructure only (Postgres, Redis, Langfuse, Temporal) +docker compose -f docker-compose.infra.yml up -d + +# Full stack +docker compose up +``` + +### SaaS + +Hosted at `moleculesai.app` with per-tenant isolation. Each tenant gets a dedicated Fly Machine running the tenant image. The `MOLECULE_ORG_ID` env var gates API access -- every non-allowlisted request must carry a matching `X-Molecule-Org-Id` header or gets a 404. When unset, the guard is a passthrough so self-hosted and dev environments are unaffected. + +### Tenant Image + +`platform/Dockerfile.tenant` bundles the Go platform + Canvas frontend + templates into a single container image, published to `ghcr.io/molecule-ai/platform:latest` and `:sha-`. + +--- + +## Subdomain Architecture + +| Subdomain | Service | Purpose | +|-----------|---------|---------| +| `moleculesai.app` | Landing page | Marketing site | +| `app.moleculesai.app` | SaaS dashboard | Tenant management UI | +| `api.moleculesai.app` | Control plane API | Platform REST + WebSocket | +| `doc.moleculesai.app` | Documentation | This documentation site | +| `status.moleculesai.app` | Status page | Uptime and incident tracking | +| `*.moleculesai.app` | Tenant instances | Per-org isolated platform instances | + +--- + +## Plugin System + +Plugins extend workspace capabilities. 
Two categories exist: + +**Shared plugins** (auto-loaded by every workspace): + +- **molecule-dev** -- codebase conventions + review-loop skill +- **superpowers** -- verification, TDD, systematic debugging, writing plans +- **ecc** -- general Claude Code guardrails +- **browser-automation** -- Puppeteer/CDP web scraping and live canvas screenshots + +**Modular guardrails** (opt-in per workspace): + +- **Hook plugins** (ambient enforcement): `molecule-careful-bash`, `molecule-freeze-scope`, `molecule-audit-trail`, `molecule-session-context`, `molecule-prompt-watchdog` +- **Skill plugins** (on-demand): `molecule-skill-code-review`, `molecule-skill-cross-vendor-review`, `molecule-skill-llm-judge`, `molecule-skill-update-docs`, `molecule-skill-cron-learnings` +- **Workflow plugins** (slash commands): `molecule-workflow-triage`, `molecule-workflow-retro` + +**Org-template plugin resolution:** Per-workspace `plugins:` lists in org template `org.yaml` role overrides UNION with `defaults.plugins` (deduplicated, defaults first). To opt a specific default out for a given role, prefix the plugin name with `!` or `-` (e.g. `!browser-automation`). 
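The union-plus-opt-out resolution is easy to sketch. A Python illustration of the rule (the platform itself implements this in Go):

```python
def resolve_plugins(defaults: list[str], overrides: list[str]) -> list[str]:
    """UNION a role's plugin list with the org-wide defaults:
    defaults first, deduplicated, with '!'- or '-'-prefixed
    entries opting a specific default out for this role."""
    opted_out = {p.lstrip("!-") for p in overrides if p.startswith(("!", "-"))}
    merged = defaults + [p for p in overrides if not p.startswith(("!", "-"))]
    result = []
    for plugin in merged:
        if plugin not in opted_out and plugin not in result:
            result.append(plugin)
    return result
```

So a role listing `["molecule-skill-code-review", "!browser-automation"]` against defaults of `["molecule-dev", "superpowers", "browser-automation"]` ends up with both remaining defaults plus the review skill, minus `browser-automation`.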
+ +Plugin install safeguards: + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `PLUGIN_INSTALL_BODY_MAX_BYTES` | 65536 (64 KiB) | Max request body size | +| `PLUGIN_INSTALL_FETCH_TIMEOUT` | 5m | Whole fetch+copy deadline | +| `PLUGIN_INSTALL_MAX_DIR_BYTES` | 104857600 (100 MiB) | Max staged-tree size | + +--- + +## CI Pipeline + +GitHub Actions runs on push to main and on pull requests: + +| Job | What it does | +|-----|-------------| +| `platform-build` | Go build, vet, `go test -race` with 25% coverage threshold | +| `canvas-build` | npm build, vitest run (tests must exist and pass) | +| `python-lint` | pytest with coverage for workspace-template | +| `e2e-api` | Spins up Postgres + Redis, runs 62 API tests against locally-built binary | +| `shellcheck` | Lints all E2E shell scripts | +| `publish-platform-image` | Builds and pushes to `ghcr.io/molecule-ai/platform` (main only) | + +Standalone repos (plugins + templates) use reusable workflows from `Molecule-AI/molecule-ci` for schema validation, secrets scanning, and Docker build smoke tests. diff --git a/content/docs/channels.mdx b/content/docs/channels.mdx index 054c2f1..94fa44a 100644 --- a/content/docs/channels.mdx +++ b/content/docs/channels.mdx @@ -1,11 +1,259 @@ --- title: Channels -description: Stub page β€” content coming soon. +description: Connect workspaces to Telegram, Slack, and Lark/Feishu for social integrations. --- -> 🚧 **Coming soon.** The Documentation Specialist agent will populate this -> page on its next maintenance cycle. +## Overview -If you need this content urgently, open an issue on the -[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent -will prioritise it on its next cron tick. +Channels let workspaces send and receive messages on social platforms. 
Each +workspace can have multiple channel integrations β€” a Telegram bot, a Slack +webhook, a Lark/Feishu Custom Bot β€” configured independently with per-channel +allowlists and JSONB config. + +Outbound messages flow from the workspace through the platform adapter to the +social platform. Inbound messages arrive via webhooks (`POST /webhooks/:type`), +are parsed by the adapter, and forwarded to the workspace as A2A +`message/send` requests. + +``` +User (Telegram/Slack/Lark) ──webhook──> Platform ──A2A──> Workspace Agent + <──adapter── (response) +User <──bot message──────────────────────────────────────/ +``` + +--- + +## Adapters + +Three adapters are registered out of the box. Use `GET /channels/adapters` to +list them at runtime. + +### Telegram + +Uses the Telegram Bot API. Supports both long-polling (for inbound) and direct +API calls (for outbound). The adapter caches `BotAPI` instances to avoid +repeated `getMe` calls. + +**Required config fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `bot_token` | string | Telegram bot token (`123456789:ABCdef...`). Validated against a strict regex. | +| `chat_id` | string | Comma-separated chat IDs to listen on and send to. | + +**Features:** + +- Long-polling with 30s timeout and 2s retry interval +- Auto-reply to `/start` with the chat ID (useful for setup) +- Bot commands: `/start`, `/help`, `/reset` (clear history), `/cancel` (best-effort) +- Long messages automatically split at paragraph/line/word boundaries (4096 char limit) +- Typing indicator sent while the agent processes +- Rate-limit handling with `retry_after` backoff +- Auto-discovers chats via `getUpdates` (including `my_chat_member` events for group adds) +- Auto-disables the channel when the bot is kicked from a chat + +### Slack + +Uses Slack Incoming Webhooks for outbound and the Slack Events API for inbound. 
+ +**Required config fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `webhook_url` | string | Slack Incoming Webhook URL (must start with `https://hooks.slack.com/`). | + +**Features:** + +- Outbound via Incoming Webhook (no OAuth required) +- Inbound via Events API JSON payload or slash command (URL-encoded form) +- `url_verification` challenge handshake supported +- Slash commands prepend the command name so the agent sees the full invocation + +### Lark / Feishu + +Outbound via Custom Bot webhooks, inbound via Event Subscriptions. + +**Required config fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `webhook_url` | string | Custom Bot webhook URL. Must start with `https://open.feishu.cn/open-apis/bot/v2/hook/` or `https://open.larksuite.com/open-apis/bot/v2/hook/`. | + +**Optional config fields:** + +| Field | Type | Description | +|-------|------|-------------| +| `verify_token` | string | Verification Token from the app's Event Subscriptions page. When set, inbound events with a mismatching token are rejected. | + +**Features:** + +- Both China (`open.feishu.cn`) and international (`open.larksuite.com`) endpoints supported +- `url_verification` handshake with constant-time `verify_token` comparison +- v2 event payload parsing (`im.message.receive_v1`) +- Token verification on both `url_verification` and `event_callback` payloads +- Application-level error codes checked (Lark returns HTTP 200 even for app errors) + +--- + +## Setup Flow + +### 1. Create a Channel + +```bash +curl -X POST http://localhost:8080/workspaces/{id}/channels \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer {token}" \ + -d '{ + "type": "telegram", + "config": { + "bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz", + "chat_id": "-1001234567890" + } + }' +``` + +### 2. 
Test the Connection + +```bash +curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/test \ + -H "Authorization: Bearer {token}" +``` + +### 3. Send a Message + +```bash +curl -X POST http://localhost:8080/workspaces/{id}/channels/{channelId}/send \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer {token}" \ + -d '{"text": "Hello from the agent!"}' +``` + +--- + +## Inbound Webhooks + +Register your platform's public URL as the webhook endpoint for each social +platform. Inbound messages arrive at: + +``` +POST /webhooks/:type +``` + +where `:type` is `telegram`, `slack`, or `lark`. The platform: + +1. Looks up all channels of that type +2. Calls the adapter's `ParseWebhook` to extract a standardized `InboundMessage` +3. Checks the allowlist (if configured) +4. Forwards the message to the workspace via A2A `message/send` + +For Telegram, the platform can also use long-polling instead of webhooks, +started automatically when a Telegram channel is created. + +--- + +## Discover Chats + +Auto-detect available chats for a bot token before creating a channel: + +```bash +curl -X POST http://localhost:8080/channels/discover \ + -H "Content-Type: application/json" \ + -d '{"type": "telegram", "bot_token": "123456789:ABCdef..."}' +``` + +Returns the bot username, discovered chats (with IDs, names, and types), and +whether the bot can read all group messages (Telegram privacy mode). + +--- + +## Allowlists + +Each channel row has an `allowed_users` JSONB array. When non-empty, only +messages from users whose IDs appear in the list are forwarded to the workspace. +All others are silently dropped. + +--- + +## Config Encryption + +Sensitive config fields (like `bot_token`) are encrypted at rest. The `List` +endpoint decrypts them server-side and masks tokens in the response +(showing only the first 4 and last 4 characters). 
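The masking rule is easy to replicate when you log channel configs yourself. A minimal sketch -- the `...` separator and the all-asterisk fallback for short values are assumptions; only the "first 4 and last 4 visible" behaviour comes from the platform:

```python
def mask_token(token: str) -> str:
    """Show only the first 4 and last 4 characters of a secret."""
    if len(token) <= 8:
        return "*" * len(token)  # too short to partially reveal safely
    return f"{token[:4]}...{token[-4:]}"
```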
+ +--- + +## API Reference + +| Method | Path | Description | +|--------|------|-------------| +| GET | `/channels/adapters` | List available adapter types | +| POST | `/channels/discover` | Auto-detect chats for a bot token | +| GET | `/workspaces/:id/channels` | List channels for a workspace | +| POST | `/workspaces/:id/channels` | Add a channel | +| PATCH | `/workspaces/:id/channels/:channelId` | Update a channel | +| DELETE | `/workspaces/:id/channels/:channelId` | Remove a channel | +| POST | `/workspaces/:id/channels/:channelId/test` | Test connection | +| POST | `/workspaces/:id/channels/:channelId/send` | Send outbound message | +| POST | `/webhooks/:type` | Incoming social webhook | + +--- + +## Example Configs + +### Telegram + +```json +{ + "type": "telegram", + "config": { + "bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234", + "chat_id": "-1001234567890" + } +} +``` + +Multiple chats (comma-separated): + +```json +{ + "type": "telegram", + "config": { + "bot_token": "123456789:ABCdefGHIjklmnopQRSTuvwxyz_1234", + "chat_id": "-1001234567890, -1009876543210" + } +} +``` + +### Slack + +```json +{ + "type": "slack", + "config": { + "webhook_url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" + } +} +``` + +### Lark / Feishu + +```json +{ + "type": "lark", + "config": { + "webhook_url": "https://open.larksuite.com/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", + "verify_token": "your-verification-token" + } +} +``` + +China endpoint: + +```json +{ + "type": "lark", + "config": { + "webhook_url": "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" + } +} +``` diff --git a/content/docs/concepts.mdx b/content/docs/concepts.mdx index a0d5d6a..c402722 100644 --- a/content/docs/concepts.mdx +++ b/content/docs/concepts.mdx @@ -1,10 +1,8 @@ --- title: Concepts -description: The five primitives that compose every Molecule AI org β€” workspaces, plugins, channels, schedules, and the canvas. 
+description: The core primitives that compose every Molecule AI org β€” workspaces, plugins, channels, schedules, tokens, external agents, and the canvas. --- -If you understand these five concepts, you understand the whole platform. - ## Workspaces A **workspace** is a real Docker container running a real LLM agent. Each @@ -12,16 +10,31 @@ workspace has: - A **role** (a one-line job description fed into its system prompt) - An **initial prompt** (run once at first boot β€” typically clone repo, - read CLAUDE.md, memorise context) -- A **runtime** (`claude-code`, `langgraph`, `crewai`, `autogen`, etc.) -- A **tier** (resource budget β€” memory and CPU caps) + read docs, memorise context) +- A **runtime** (`claude-code`, `langgraph`, `crewai`, `autogen`, `deepagents`, + `openclaw`, `hermes`, `gemini-cli`) +- A **tier** (resource budget β€” T1 sandboxed, T2 standard, T3 privileged, T4 full-host) - An optional **parent** (forms the org tree) - An optional **workspace_dir** (a host path bind-mounted into the container β€” gives the agent direct access to your codebase) -Workspaces talk to each other via A2A (agent-to-agent) messages, routed -by the platform. Every message becomes an edge on the canvas in real -time. +Workspaces talk to each other via **A2A** (agent-to-agent) messages, routed +by the platform. Communication rules: same workspace, siblings, and +parent/child are allowed; everything else is denied. + +## External agents + +An **external agent** is a workspace with `runtime: external` β€” it runs on +your own infrastructure instead of the platform's Docker network. External +agents: + +- Register via `POST /registry/register` and receive a bearer token +- Send heartbeats every 30 seconds to stay online +- Accept A2A messages at their registered URL +- Appear on the canvas with a purple **REMOTE** badge +- Skip Docker health sweep (liveness is heartbeat-only) + +See [External Agents](/docs/external-agents) for the full registration guide. 
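Liveness for external agents reduces to a pure timing check. A sketch under stated assumptions: the 30-second interval comes from the list above, while the 60-second cutoff is an assumption borrowed from the platform's Redis heartbeat TTL:

```python
import time

HEARTBEAT_INTERVAL = 30  # seconds between heartbeats (from the docs)
LIVENESS_TTL = 60        # assumed cutoff, mirroring the Redis heartbeat TTL

def is_online(last_heartbeat: float, now=None) -> bool:
    """External agents have no container to probe, so liveness is
    purely 'did a heartbeat arrive within the TTL window?'."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) < LIVENESS_TTL
```

With a 30-second interval, roughly two missed heartbeats push an agent past the cutoff and flip it offline on the canvas.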
## Plugins @@ -33,30 +46,56 @@ A **plugin** is a bundle of capabilities a workspace can install: review, LLM-as-judge gates - **Slash commands** β€” `/triage`, `/retro`, etc. - **MCP servers** β€” bring in tools the model can call -- **Builtin tools** β€” Python/JS extensions exposed to the agent -Plugins compose. Per-workspace plugin lists UNION with the org-wide -defaults β€” so adding one extra capability to one role doesn't require -re-listing every default. +Plugins have two axes: **source** (where to fetch β€” `local://`, `github://`) +and **shape** (what's inside β€” agentskills.io format, MCP server, etc.). + +Plugins compose. Per-workspace plugin lists **UNION** with the org-wide +defaults β€” adding one capability to one role doesn't require re-listing +every default. Use `!plugin-name` to opt a specific default out. + +See [Plugins](/docs/plugins) for the full guide. ## Channels -A **channel** wires a workspace to an external messaging surface: -Telegram, Slack, Discord, email, webhooks. Once connected, the user can -talk to the agent from outside the canvas β€” and the agent can broadcast -back. +A **channel** wires a workspace to an external messaging platform: + +| Adapter | Platform | Config | +|---------|----------|--------| +| `telegram` | Telegram | Bot token + chat_id allowlist | +| `slack` | Slack | Workspace token + channel | +| `lark` | Lark / Feishu | Custom Bot webhook + Event Subscriptions | + +Once connected, users can talk to agents from outside the canvas β€” and +agents can broadcast back. Inbound messages arrive via webhook and are +routed to the workspace as A2A messages. + +See [Channels](/docs/channels) for setup instructions. ## Schedules A **schedule** is a cron-driven recurring prompt. Each tick fires an A2A -message into the workspace, which the agent treats as a new task. 
Every -running schedule is supervised β€” panics in the dispatch path are -recovered with exponential backoff, and a liveness watchdog surfaces -stuck subsystems via `/admin/liveness`. +message into the workspace, which the agent treats as a new task. Schedules +are supervised β€” panics in the dispatch path are recovered with exponential +backoff, and a liveness watchdog surfaces stuck subsystems via +`/admin/liveness`. -Schedules let you build the *evolution* loop alongside the *review* -loop: hourly security audits, daily ecosystem watches, weekly plugin -curation, etc. +Schedules let you build the *evolution* loop: hourly security audits, +daily ecosystem watches, weekly plugin curation, etc. + +See [Schedules](/docs/schedules) for the full guide. + +## Tokens + +**Bearer tokens** authenticate agents and API clients. Each token is +scoped to a single workspace β€” a token from workspace A cannot access +workspace B. + +- Issued on first registration (`POST /registry/register`) +- Create/list/revoke via `GET/POST/DELETE /workspaces/:id/tokens` +- 256-bit entropy, sha256-hashed in DB, plaintext shown once + +See [Token Management](/docs/tokens) for the full guide. ## The canvas @@ -66,18 +105,19 @@ write, every scheduled fire, every status change pushes a WebSocket event in real time. The canvas isn't just a viewer β€” it's the operator surface. Drag nodes -to reorganise, click to chat, watch the team work. +to reorganise teams, click to chat, right-click for actions, watch the +team work in real time. ## How they fit together -A typical org definition looks like: +A typical org definition: ```yaml org_name: My Team defaults: runtime: claude-code tier: 2 - plugins: [ecc, molecule-dev, superpowers, molecule-careful-bash, ...] 
+ plugins: [ecc, molecule-dev, superpowers, molecule-careful-bash] category_routing: security: [Backend Engineer] ui: [Frontend Engineer] @@ -85,10 +125,10 @@ defaults: workspaces: - name: PM canvas: { x: 400, y: 50 } - plugins: [molecule-workflow-triage, molecule-workflow-retro] + plugins: [molecule-workflow-triage] channels: - type: telegram - config: { ... } + config: { bot_token: "${TELEGRAM_BOT_TOKEN}", chat_id: "12345" } children: - name: Dev Lead children: @@ -100,6 +140,13 @@ workspaces: prompt: "Run npm run typecheck and report any new errors..." ``` -That's the entire mental model. Templates β†’ plugins β†’ channels β†’ -schedules β†’ canvas. Everything else in the docs is depth on one of -these five. +That's the mental model. Templates β†’ plugins β†’ channels β†’ schedules β†’ +tokens β†’ canvas. Everything else in the docs is depth on one of these +primitives. + +## MCP integration + +Any MCP-compatible AI agent can manage Molecule AI workspaces using the +[MCP Server](/docs/mcp-server) β€” 87 tools covering workspace CRUD, +communication, secrets, memory, files, schedules, channels, plugins, +and more. Install via `npx @molecule-ai/mcp-server`. diff --git a/content/docs/index.mdx b/content/docs/index.mdx index e47af39..c205876 100644 --- a/content/docs/index.mdx +++ b/content/docs/index.mdx @@ -9,6 +9,16 @@ multi-agent organisations. You define your team in one YAML file talk on, schedule their recurring work β€” and the platform takes care of the rest. 
+## Try it now + +| | | +|---|---| +| **Dashboard** | [app.moleculesai.app](https://app.moleculesai.app) β€” create orgs, deploy agents | +| **API** | [api.moleculesai.app](https://api.moleculesai.app) β€” control plane REST API | +| **Documentation** | [doc.moleculesai.app](https://doc.moleculesai.app) β€” you are here | +| **Status** | [status.moleculesai.app](https://status.moleculesai.app) β€” uptime monitoring | +| **Self-host** | [Self-Hosting Guide](/docs/self-hosting) β€” run on your own infrastructure | + ## What you can build - **Self-running engineering teams** β€” PM, Dev Lead, frontend / backend / devops @@ -19,23 +29,47 @@ rest. shared memory. - **Product orgs** β€” anything you can describe as a tree of roles and responsibilities. +- **Hybrid teams** β€” mix cloud-hosted agents with [external agents](/docs/external-agents) + running on your own infrastructure, edge devices, or other clouds. ## How it works 1. **Templates.** Describe your org as a YAML tree of workspaces. Each workspace - is a real container running an LLM agent. Templates ship with sensible - defaults so you can spin one up in one command. + is a real container running an LLM agent. Templates ship with sensible + defaults so you can spin one up in one command. 2. **Plugins.** Add capabilities to one role or all of them β€” guardrails, - skills, slash commands, browser automation, MCP servers. Plugins compose; - per-role overrides UNION with the defaults. -3. **Channels.** Connect any role to Telegram, Slack, email, or webhooks so - the user can talk to it directly. -4. **Schedules.** Define recurring work in cron syntax. The runtime fires the - prompt at the scheduled time, supervised against panics with a liveness - watchdog so a single bad input can't silently kill the loop. -5. **The canvas.** A live visualisation of your org β€” every workspace as a - node, every A2A message as an edge, every memory write tracked in real - time. 
+ skills, slash commands, browser automation, MCP servers. Plugins compose; + per-role overrides UNION with the defaults. +3. **Channels.** Connect any role to [Telegram, Slack, or Lark/Feishu](/docs/channels) + so users can talk to agents directly from their existing tools. +4. **Schedules.** Define [recurring work](/docs/schedules) in cron syntax. The + runtime fires the prompt at the scheduled time, supervised against panics + with a liveness watchdog. +5. **Tokens.** Generate [API tokens](/docs/tokens) per workspace for secure + authentication. Rotate, revoke, and audit from the dashboard or API. +6. **The canvas.** A live visualisation of your org β€” every workspace as a + node, every A2A message as an edge, every memory write tracked in real time. + +## Eight runtime adapters + +| Runtime | Description | +|---------|-------------| +| Claude Code | Anthropic Claude with code execution | +| LangGraph | LangChain ReAct agent with tools | +| OpenClaw | Multi-file prompt system with SOUL | +| CrewAI | Role-based agent with task delegation | +| AutoGen | Microsoft conversable agents | +| DeepAgents | Deep research with planning | +| Hermes | NousResearch Hermes-3 multi-provider | +| Gemini CLI | Google Gemini CLI workspace | + +## Integrate with everything + +- **[MCP Server](/docs/mcp-server)** β€” 87 tools for managing Molecule AI from any + MCP-compatible AI agent (Claude Code, Cursor, etc.) +- **[Python SDK](https://pypi.org/project/molecule-ai-sdk)** β€” `pip install molecule-ai-sdk` +- **[External Agents](/docs/external-agents)** β€” register any HTTP agent as a + first-class workspace ## Where to next @@ -43,10 +77,7 @@ rest. agent in under five minutes. - Want the architecture tour? Start with [Concepts](/docs/concepts) and [Architecture](/docs/architecture). -- Ready to build your own org? Jump straight to - [Org Templates](/docs/org-template). 
- -> This documentation is maintained automatically by the -> Documentation Specialist agent in our own dogfood org. Every PR to the -> platform repo triggers a docs sync. Spot something stale? Open an issue or -> a PR β€” those signals reach the agent on its next cron tick. +- Ready to build your own org? Jump to [Org Templates](/docs/org-template). +- Want to connect your own agent? See [External Agents](/docs/external-agents). +- Need API access? Check [Token Management](/docs/tokens) and the + [API Reference](/docs/api-reference). diff --git a/content/docs/meta.json b/content/docs/meta.json index 86bb609..ba4220c 100644 --- a/content/docs/meta.json +++ b/content/docs/meta.json @@ -4,15 +4,15 @@ "index", "quickstart", "concepts", + "architecture", "org-template", "plugins", "channels", "schedules", "external-agents", - "architecture", + "tokens", "api-reference", "mcp-server", - "tokens", "self-hosting", "observability", "troubleshooting" diff --git a/content/docs/observability.mdx b/content/docs/observability.mdx index 3baefc5..d0fdaa6 100644 --- a/content/docs/observability.mdx +++ b/content/docs/observability.mdx @@ -1,11 +1,141 @@ --- title: Observability -description: Stub page β€” content coming soon. +description: Monitor agent activity, LLM traces, and platform health. --- -> 🚧 **Coming soon.** The Documentation Specialist agent will populate this -> page on its next maintenance cycle. +## Overview -If you need this content urgently, open an issue on the -[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent -will prioritise it on its next cron tick. +Molecule AI provides multiple layers of observability -- from real-time WebSocket events on the canvas to structured activity logs, LLM traces, Prometheus metrics, and admin health endpoints. + +## Activity Logs + +Every significant action in the platform is recorded in the `activity_logs` table. 
Query logs for a specific workspace: + +``` +GET /workspaces/:id/activity +``` + +Activity types include: + +- **A2A communications** -- request/response capture with duration and method +- **Task updates** -- agent-reported task status changes +- **Agent logs** -- structured log entries from workspace runtimes +- **Errors** -- failures with `error_detail` for debugging + +Filter by source to separate user-agent chat (`source=canvas`) from agent-to-agent traffic (`source=agent`). + +Activity logs are automatically cleaned up based on `ACTIVITY_RETENTION_DAYS` (default 7). The cleanup job runs every `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default 6). + +## LLM Traces + +Molecule AI integrates with [Langfuse](https://langfuse.com) for LLM observability. Langfuse runs as part of the infrastructure stack on port 3001, backed by ClickHouse for efficient trace storage. + +View traces for a specific workspace: + +``` +GET /workspaces/:id/traces +``` + +The Langfuse UI at `http://localhost:3001` provides: + +- Token usage and cost tracking per workspace +- Latency breakdowns for LLM calls +- Prompt/completion pairs for debugging +- Trace timelines showing multi-step agent reasoning + +## Prometheus Metrics + +The platform exposes Prometheus-format metrics at: + +``` +GET /metrics +``` + +This endpoint requires no authentication and is safe to scrape. Metrics are in Prometheus text format (v0.0.4) and include: + +- Request counts by method, path, and status code +- Request latency histograms +- Active WebSocket connections +- Workspace status counts + +Configure your Prometheus instance to scrape `http://localhost:8080/metrics` at your preferred interval. + +## Admin Liveness + +The liveness endpoint reports the health of every supervised subsystem: + +``` +GET /admin/liveness +``` + +This endpoint requires `AdminAuth` (bearer token). It returns a `supervised.Snapshot()` for each subsystem with ages -- how long since each subsystem last reported healthy. 
Use this to debug stuck schedulers, stalled heartbeat goroutines, or unresponsive health sweeps before diving into logs. + +## WebSocket Events + +The canvas receives real-time updates via WebSocket at `/ws`. Every state change in the platform is broadcast to connected clients: + +| Event | Trigger | +|-------|---------| +| `WORKSPACE_ONLINE` | Workspace registers successfully | +| `WORKSPACE_OFFLINE` | Heartbeat TTL expires or health sweep detects dead container | +| `WORKSPACE_DEGRADED` | Error rate exceeds threshold | +| `WORKSPACE_RECOVERED` | Error rate drops back to normal | +| `WORKSPACE_REMOVED` | Workspace deleted | +| `HEARTBEAT` | Periodic heartbeat from workspace | +| `A2A_RESPONSE` | Agent-to-agent message received | +| `AGENT_MESSAGE` | Agent pushes a message to the user | + +Events flow through Redis pub/sub to ensure all platform instances broadcast consistently. + +## Structure Events + +The `structure_events` table is an append-only audit log of every structural change in the platform. Each event is: + +1. Inserted into the database via `broadcaster.RecordAndBroadcast()` +2. Published to Redis pub/sub +3. Relayed to WebSocket clients + +Query events for a specific workspace or globally: + +``` +GET /events/:workspaceId # Workspace-specific +GET /events # All events +``` + +Both endpoints require `AdminAuth`. + +## Session Search + +Search through chat history for a workspace: + +``` +GET /workspaces/:id/session-search?q=deployment+error +``` + +This searches across both user-agent conversations and agent-to-agent A2A traffic stored in the activity logs. + +## Current Task Visibility + +Each workspace reports its current task via heartbeat. 
This is visible in two places: + +- **Canvas node** -- the workspace card on the canvas shows the current task text +- **Heartbeat data** -- `GET /registry/discover/:id` includes `current_task` in the workspace info + +When `active_tasks` drops to zero, the current task field clears and the idle loop (if configured) begins its countdown. + +## Schedule Run History + +For workspaces with cron schedules, inspect past runs: + +``` +GET /workspaces/:id/schedules/:scheduleId/history +``` + +Each history entry includes: + +- Execution timestamp +- Status (`success`, `failed`, `skipped`) +- Duration +- `error_detail` when the run failed (populated by `scheduler.fireSchedule`) + +A status of `skipped` means the workspace was busy (active tasks > 0) when the schedule fired and the concurrency-aware scheduler chose not to queue the prompt. diff --git a/content/docs/org-template.mdx b/content/docs/org-template.mdx index ba693e6..fec8929 100644 --- a/content/docs/org-template.mdx +++ b/content/docs/org-template.mdx @@ -1,11 +1,166 @@ --- -title: Org Template -description: Stub page β€” content coming soon. +title: Org Templates +description: Deploy entire multi-workspace organizations from a single YAML file. --- -> 🚧 **Coming soon.** The Documentation Specialist agent will populate this -> page on its next maintenance cycle. +## Overview -If you need this content urgently, open an issue on the -[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent -will prioritise it on its next cron tick. +Org templates let you define an entire agent organization -- hierarchy of workspaces with roles, configurations, and relationships -- in a single YAML file. Import one template and the platform provisions every workspace, wires parent-child relationships, seeds schedules, and installs plugins automatically. 
+ +## YAML Structure + +A minimal org template looks like this: + +```yaml +org_name: molecule-dev + +defaults: + runtime: claude-code + tier: 2 + plugins: + - molecule-dev + - molecule-careful-bash + +workspaces: + pm: + name: Project Manager + role: PM + tier: 3 + children: + dev-lead: + name: Dev Lead + children: + backend: + name: Backend Engineer + frontend: + name: Frontend Engineer + marketing: + name: Marketing Specialist + runtime: langgraph +``` + +The `workspaces` map defines the hierarchy. Each key becomes the workspace's slug. Nesting under `children` sets the parent-child relationship automatically. + +## Workspace Fields + +Each workspace entry supports the following fields: + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string | Display name shown on the canvas | +| `role` | string | Agent role (e.g. PM, Engineer, Researcher) | +| `runtime` | string | Runtime adapter (`claude-code`, `langgraph`, `crewai`, etc.) | +| `tier` | integer | Resource tier (2 = Standard, 3 = Privileged, 4 = Full-host) | +| `workspace_dir` | string | Host path for `/workspace` bind-mount | +| `plugins` | list | Plugins to install on this workspace | +| `initial_prompt` | string | Prompt auto-executed after A2A server is ready | +| `idle_prompt` | string | Prompt fired periodically while workspace is idle | +| `idle_interval_seconds` | integer | Interval for idle prompt (default 600, minimum 60) | +| `channels` | list | Social channel integrations (Telegram, Slack, etc.) | +| `schedules` | list | Cron schedules seeded on import | +| `x` | number | Canvas X coordinate | +| `y` | number | Canvas Y coordinate | +| `children` | map | Nested child workspaces | + +## Defaults Layer + +The `defaults` block sets baseline values for every workspace in the template. Per-workspace fields override defaults when specified. 
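For scalar fields the override rule is a plain shallow merge. A Python sketch of the semantics (illustrative -- list fields like `plugins` get the additive merge instead, not replacement):

```python
def effective_config(defaults: dict, workspace: dict) -> dict:
    """Per-workspace scalar values win; anything a workspace leaves
    unspecified falls through to the template-wide defaults."""
    merged = dict(defaults)
    merged.update(workspace)
    return merged
```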
+ +**Plugin merging is additive.** Per-workspace `plugins` lists UNION with `defaults.plugins` (deduplicated, defaults first) -- they do not replace them. To opt a specific default plugin out for a given workspace, prefix the plugin name with `!` or `-`: + +```yaml +defaults: + plugins: + - molecule-dev + - molecule-careful-bash + - browser-automation + +workspaces: + backend: + name: Backend Engineer + plugins: + - molecule-skill-code-review # added + - "!browser-automation" # opted out of default +``` + +In this example, the backend workspace gets `molecule-dev`, `molecule-careful-bash`, and `molecule-skill-code-review` -- but not `browser-automation`. + +## Template Registry + +Five org templates live in standalone repos under the `Molecule-AI` GitHub organization: + +| Template | Repo | +|----------|------| +| molecule-dev | `Molecule-AI/molecule-ai-org-template-molecule-dev` | +| marketing-team | `Molecule-AI/molecule-ai-org-template-marketing-team` | +| research-lab | `Molecule-AI/molecule-ai-org-template-research-lab` | +| startup-mvp | `Molecule-AI/molecule-ai-org-template-startup-mvp` | +| enterprise-ops | `Molecule-AI/molecule-ai-org-template-enterprise-ops` | + +These are cloned into the platform image at Docker build time and registered in the `template_registry` database table. + +## Importing an Org Template + +### Via API + +```bash +curl -X POST http://localhost:8080/org/import \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $TOKEN" \ + -d '{"dir": "molecule-dev"}' +``` + +The `POST /org/import` endpoint requires `AdminAuth` (bearer token). The `dir` field references a template directory name from the registry. + +### Via Canvas + +Open the template browser in the canvas sidebar and select an org template. The UI calls the same API endpoint. + +## Initial Prompts + +Workspaces can auto-execute a prompt on startup before any user interaction. 
Set `initial_prompt` as an inline string or point `initial_prompt_file` to a path relative to the config directory.
+
+After the A2A server is ready, the runtime sends the prompt as a `message/send` to itself. A `.initial_prompt_done` marker file prevents re-execution on restart.
+
+**Important:** Initial prompts must NOT send A2A messages (`delegate_task`, `send_message_to_user`) because other agents may not be ready yet. Keep them local: clone a repo, read docs, save to memory, wait for tasks.
+
+Org templates support `initial_prompt` on both `defaults` (all agents) and per-workspace (overrides default).
+
+## Idle Loop
+
+The idle loop is an opt-in pattern for workspaces that should do periodic background work when they have no active tasks.
+
+When `idle_prompt` is non-empty in the workspace config, the runtime self-sends the prompt every `idle_interval_seconds` (default 600) while `heartbeat.active_tasks == 0`. The fire timeout clamps to `max(60, min(300, idle_interval_seconds))`.
+
+Set per-workspace or as an org template default:
+
+```yaml
+defaults:
+  idle_prompt: "Check for new issues and update your task list."
+  idle_interval_seconds: 300
+```
+
+The idle check is local (no LLM call) and the prompt only fires when there is genuinely nothing to do, so costs collapse to effectively event-driven levels.
+
+## Canvas Positioning
+
+Use `x` and `y` fields to control where workspaces appear on the drag-and-drop canvas after import:
+
+```yaml
+workspaces:
+  pm:
+    name: Project Manager
+    x: 400
+    y: 100
+    children:
+      dev:
+        name: Developer
+        x: 200
+        y: 300
+      researcher:
+        name: Researcher
+        x: 600
+        y: 300
+```
+
+If coordinates are omitted, the canvas lays out new workspaces automatically.
diff --git a/content/docs/plugins.mdx b/content/docs/plugins.mdx
index 2ba9a04..dee2a35 100644
--- a/content/docs/plugins.mdx
+++ b/content/docs/plugins.mdx
@@ -1,11 +1,267 @@
 ---
 title: Plugins
-description: Stub page — content coming soon.
+description: Extend workspace capabilities with modular plugins — guardrails, skills, workflows.
 ---
 
-> 🚧 **Coming soon.** The Documentation Specialist agent will populate this
-> page on its next maintenance cycle.
+## Overview
 
-If you need this content urgently, open an issue on the
-[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent
-will prioritise it on its next cron tick.
+Plugins are installable capability bundles that extend what a workspace can do.
+They range from ambient guardrails that enforce rules automatically, to
+on-demand skills invoked via the `Skill` tool, to workflow plugins that
+compose skills into slash commands.
+
+Plugins follow a **two-axis model**: the *source* (where the plugin comes from)
+is orthogonal to the *shape* (what format it takes). This means you can install
+a plugin from a local registry or from GitHub, and the workspace runtime
+figures out how to load it based on its shape.
+
+---
+
+## Two-Axis Model
+
+### Sources (where)
+
+| Scheme | Description | Example |
+|--------|-------------|---------|
+| `local://` | Platform's curated plugin registry (auto-discovered from the `plugins/` directory) | `local://molecule-careful-bash` |
+| `github://` | Public GitHub repo (shallow clone at install time) | `github://owner/repo` |
+| `github://` (pinned) | GitHub repo at a specific ref | `github://owner/repo#v1.2.0` |
+
+Use `GET /plugins/sources` to list all registered install-source schemes at
+runtime.
+
+### Shapes (what)
+
+| Shape | Description |
+|-------|-------------|
+| agentskills.io format | `SKILL.md` + optional scripts, hooks, and `plugin.yaml` manifest |
+| MCP server | Model Context Protocol server (coming soon for more runtimes) |
+
+The shape is orthogonal to the source. A `github://` plugin and a `local://`
+plugin can both be agentskills.io format. The per-runtime adapter inside the
+workspace handles loading at startup.
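The source axis is easy to illustrate. Below is a hypothetical parser for the three scheme forms above (illustrative Python, not the platform's actual Go resolver):

```python
def parse_source(source: str) -> dict:
    """Split an install source into scheme, spec, and optional pinned ref."""
    scheme, sep, rest = source.partition("://")
    if not sep or scheme not in ("local", "github"):
        raise ValueError(f"unknown install source: {source!r}")
    spec, _, ref = rest.partition("#")  # '#ref' pins a GitHub revision
    return {"scheme": scheme, "spec": spec, "ref": ref or None}

assert parse_source("local://molecule-careful-bash") == {
    "scheme": "local", "spec": "molecule-careful-bash", "ref": None,
}
assert parse_source("github://owner/repo#v1.2.0")["ref"] == "v1.2.0"
```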
+ +--- + +## Installing a Plugin + +```bash +curl -X POST http://localhost:8080/workspaces/{id}/plugins \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer {token}" \ + -d '{"source": "local://molecule-careful-bash"}' +``` + +From GitHub: + +```bash +curl -X POST http://localhost:8080/workspaces/{id}/plugins \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer {token}" \ + -d '{"source": "github://Molecule-AI/molecule-plugin-careful-bash"}' +``` + +The platform resolves the source, stages the plugin files, copies them into the +workspace container at `/configs/plugins//`, and triggers an automatic +workspace restart so the runtime picks up the new plugin. + +--- + +## Uninstalling a Plugin + +```bash +curl -X DELETE http://localhost:8080/workspaces/{id}/plugins/{name} \ + -H "Authorization: Bearer {token}" +``` + +Uninstall removes the plugin directory, cleans up copied skill directories and +rule markers from `CLAUDE.md`, and triggers an automatic workspace restart. + +--- + +## Listing Plugins + +### Platform Registry + +List all available plugins in the platform registry: + +```bash +# All plugins +curl http://localhost:8080/plugins + +# Filtered by runtime +curl http://localhost:8080/plugins?runtime=claude-code +``` + +Plugins with no declared `runtimes` field in their manifest are treated as +"unspecified, try it" and included in filtered results. + +### Available for a Workspace + +Returns plugins filtered to those supported by the workspace's current runtime: + +```bash +curl http://localhost:8080/workspaces/{id}/plugins/available \ + -H "Authorization: Bearer {token}" +``` + +### Installed on a Workspace + +```bash +curl http://localhost:8080/workspaces/{id}/plugins \ + -H "Authorization: Bearer {token}" +``` + +Each installed plugin is annotated with whether it still supports the +workspace's current runtime. This lets the canvas grey out plugins that went +inert after a runtime change. 
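The runtime filter used by `/plugins?runtime=...` and the per-workspace "available" listing can be sketched as follows. This is illustrative Python with hypothetical manifest data; the rule it encodes is the one above, where a missing `runtimes` field means "unspecified, try it":

```python
def available_for_runtime(plugins: list[dict], runtime: str) -> list[dict]:
    """Keep plugins that declare the runtime, or declare no runtimes at all."""
    return [
        p for p in plugins
        if not p.get("runtimes") or runtime in p["runtimes"]
    ]

registry = [  # hypothetical manifest entries
    {"name": "molecule-careful-bash", "runtimes": ["claude-code"]},
    {"name": "browser-automation"},  # no runtimes field: included everywhere
]
names = [p["name"] for p in available_for_runtime(registry, "langgraph")]
assert names == ["browser-automation"]
```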
+ +--- + +## Runtime Compatibility Check + +Before changing a workspace's runtime, check which installed plugins would +become incompatible: + +```bash +curl "http://localhost:8080/workspaces/{id}/plugins/compatibility?runtime=langgraph" \ + -H "Authorization: Bearer {token}" +``` + +Response: + +```json +{ + "target_runtime": "langgraph", + "compatible": [...], + "incompatible": [...], + "all_compatible": false +} +``` + +The canvas uses this to show a confirmation dialog before applying a runtime +change. + +--- + +## Built-in Plugins + +### Hook Plugins (ambient enforcement) + +These fire automatically via the harness layer. No explicit invocation needed. + +| Plugin | Purpose | +|--------|---------| +| `molecule-careful-bash` | Refuses `git push --force` to main, `rm -rf` at root, `DROP TABLE` against prod schema. Ships the `careful-mode` skill as documentation. | +| `molecule-freeze-scope` | Locks edits to a single path glob via `.claude/freeze`. Useful while debugging. | +| `molecule-audit-trail` | Appends every Edit/Write to `.claude/audit.jsonl` for accountability. | +| `molecule-session-context` | Auto-loads recent cron-learnings and open PR/issue counts at session start. | +| `molecule-prompt-watchdog` | Injects warning context when the prompt mentions destructive keywords. | + +### Skill Plugins (on-demand) + +Invoked explicitly via the `Skill` tool during a conversation. + +| Plugin | Purpose | +|--------|---------| +| `molecule-skill-code-review` | 16-criteria multi-axis code review rubric. | +| `molecule-skill-cross-vendor-review` | Adversarial second-model review for noteworthy PRs. | +| `molecule-skill-llm-judge` | Score whether a deliverable addresses the original request. | +| `molecule-skill-update-docs` | Sync repo docs after merges. | +| `molecule-skill-cron-learnings` | Defines the operational-memory JSONL format. | + +### Workflow Plugins (slash commands) + +Compose skills into repeatable multi-step workflows. 
+ +| Plugin | Command | Purpose | +|--------|---------|---------| +| `molecule-workflow-triage` | `/triage` | Full PR-triage cycle (gates 1-7 + code-review + merge if green). | +| `molecule-workflow-retro` | `/retro` | Weekly retrospective issue. | + +### Shared Plugins + +Loaded by default from the `plugins/` directory at the repo root. + +| Plugin | Purpose | +|--------|---------| +| `molecule-dev` | Codebase conventions (rules injected into CLAUDE.md) + `review-loop` skill. | +| `superpowers` | `verification-before-completion`, `test-driven-development`, `systematic-debugging`, `writing-plans`. | +| `ecc` | General Claude Code guardrails. | +| `browser-automation` | Puppeteer/CDP-based web scraping and live canvas screenshots. Opt-in per workspace. | + +--- + +## Org Template Plugin Resolution + +When deploying an org template, per-workspace `plugins:` lists in `org.yaml` +role overrides **UNION** with `defaults.plugins` (deduplicated, defaults first). +They do not replace them. + +To opt a specific default out for a given role or workspace, prefix the plugin +name with `!` or `-`: + +```yaml +defaults: + plugins: + - molecule-careful-bash + - molecule-audit-trail + - superpowers + +workspaces: + researcher: + role: "Research Analyst" + plugins: + - browser-automation # added on top of defaults + - "!superpowers" # opted out of superpowers +``` + +Result for the `researcher` workspace: +`molecule-careful-bash`, `molecule-audit-trail`, `browser-automation` + +--- + +## Install Safeguards + +Environment variables that bound the cost of a single plugin install: + +| Variable | Default | Description | +|----------|---------|-------------| +| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Max request body size | +| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch + copy deadline | +| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Max staged-tree size | + +These prevent a slow or malicious source from tying up a handler goroutine or +exhausting disk 
space. + +--- + +## Plugin Download (External Workspaces) + +External workspaces (those running outside Docker) can pull plugins as gzipped +tarballs: + +```bash +curl http://localhost:8080/workspaces/{id}/plugins/{name}/download \ + -H "Authorization: Bearer {token}" \ + -o plugin.tar.gz +``` + +An optional `?source=github://owner/repo` query parameter lets external +workspaces pull from upstream repos without the platform pre-staging them. +Defaults to `local://` when omitted. + +--- + +## API Reference + +| Method | Path | Description | +|--------|------|-------------| +| GET | `/plugins` | List plugin registry (supports `?runtime=` filter) | +| GET | `/plugins/sources` | List registered install-source schemes | +| GET | `/workspaces/:id/plugins` | List installed plugins | +| POST | `/workspaces/:id/plugins` | Install a plugin (`{"source": "scheme://spec"}`) | +| DELETE | `/workspaces/:id/plugins/:name` | Uninstall a plugin | +| GET | `/workspaces/:id/plugins/available` | Available plugins filtered by workspace runtime | +| GET | `/workspaces/:id/plugins/compatibility?runtime=X` | Preflight runtime-change compatibility check | +| GET | `/workspaces/:id/plugins/:name/download` | Download plugin as tarball (external workspaces) | diff --git a/content/docs/quickstart.mdx b/content/docs/quickstart.mdx index feddada..c8b4712 100644 --- a/content/docs/quickstart.mdx +++ b/content/docs/quickstart.mdx @@ -9,28 +9,52 @@ using the bundled `molecule-dev` template. ## Prerequisites - Docker Desktop (or any Docker daemon) running locally -- Go 1.25+ and Node 22+ if you want to build the platform from source -- A Claude API key (`CLAUDE_CODE_OAUTH_TOKEN`) in your environment +- Go 1.25+ and Node 20+ if building from source +- An LLM API key (Claude, OpenRouter, or Gemini) -## 1. Clone the monorepo +## Option A: One-command start (recommended) ```bash git clone https://github.com/Molecule-AI/molecule-monorepo.git cd molecule-monorepo +./scripts/dev-start.sh ``` -## 2. 
Boot the platform
+This starts everything: Postgres, Redis, Platform (Go on `:8080`), and
+Canvas (Next.js on `:3000`). Press `Ctrl-C` to stop all services.
+
+## Option B: Docker Compose
 
 ```bash
-docker compose up -d --build platform canvas
+git clone https://github.com/Molecule-AI/molecule-monorepo.git
+cd molecule-monorepo
+docker compose up -d
 ```
 
-This starts:
-- **platform** (Go API on `localhost:8080`)
-- **canvas** (Next.js 15 frontend on `localhost:3000`)
-- **postgres** + **redis** for state and pub/sub
+This starts the full stack including Langfuse (`:3001`) and Temporal (`:8233`).
 
-## 3. Import the dev template
+## Option C: Manual setup
+
+```bash
+# 1. Start infrastructure
+./infra/scripts/setup.sh   # Postgres, Redis, Langfuse, Temporal
+
+# 2. Start platform
+cd platform && go run ./cmd/server   # API on :8080
+
+# 3. Start canvas (new terminal)
+cd canvas && npm install && npm run dev   # UI on :3000
+```
+
+## 2. Open the canvas
+
+Navigate to [http://localhost:3000](http://localhost:3000). You should see
+the empty state with template cards.
+
+## 3. Deploy from a template
+
+Click any template card to deploy a workspace instantly. Or import a full
+org template:
 
 ```bash
 curl -X POST http://localhost:8080/org/import \
@@ -42,15 +66,10 @@ This provisions the 12-workspace dev team — PM, Research Lead and 3
 researchers, Dev Lead and 5 engineers, plus Security/QA/UIUX auditors —
 each as its own Docker container.
 
-## 4. Open the canvas
+## 4. Talk to PM
 
-Navigate to [http://localhost:3000](http://localhost:3000). You should
-see your team rendered as a tree of nodes. Click any node to chat with
-that agent directly.
-
-## 5. Talk to PM
-
-PM is the entry point. Send it a task:
+PM is the entry point. 
Click the PM node on the canvas, open the Chat tab,
+and send a task:
 
 > *"Add a 'Last seen' column to the user list table on the admin page."*
 
@@ -58,17 +77,40 @@ PM will break the request into specific assignments, fan them out to
 the right leads in parallel, verify the results, and report back when
 the work is shipped.
 
+## 5. Set up secrets
+
+Most agents need an LLM API key. Set it as a global secret so all
+workspaces inherit it:
+
+```bash
+curl -X PUT http://localhost:8080/settings/secrets \
+  -H 'Content-Type: application/json' \
+  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'
+```
+
+Or use the Settings panel (gear icon) in the canvas to manage secrets
+per workspace.
+
 ## What just happened
 
-You spun up a self-organising engineering team in one command. They're
-clones of real Claude Code agents — they can read your codebase, run
-tests, open PRs to GitHub. Their schedules (security audit, UX audit,
-template fitness checks) run hourly on their own.
+You spun up a self-organising engineering team. They're real agents — they
+can read your codebase, run tests, open PRs to GitHub. Their schedules
+(security audit, UX audit, template fitness checks) run hourly on their own.
+
+## Using the SaaS instead
+
+Don't want to self-host? Use the cloud platform directly:
+
+1. Go to [app.moleculesai.app](https://app.moleculesai.app)
+2. Sign up and create an organization
+3. Your tenant is provisioned at `.moleculesai.app`
+4. Deploy agents from templates — same experience, zero infrastructure
 
 ## Next steps
 
-- Customise the [Org Template](/docs/org-template) to match your team's
-  actual structure.
-- Add [Plugins](/docs/plugins) to give specific roles new capabilities.
+- Customise the [Org Template](/docs/org-template) to match your team.
+- Add [Plugins](/docs/plugins) to give roles new capabilities.
 - Wire a [Channel](/docs/channels) so you can talk to PM from Telegram.
+- Connect your own agents with [External Agents](/docs/external-agents). 
+- Generate [API Tokens](/docs/tokens) for programmatic access.
 - Read about the [Architecture](/docs/architecture) under the hood.
diff --git a/content/docs/schedules.mdx b/content/docs/schedules.mdx
index 3e0e8d1..cd82f07 100644
--- a/content/docs/schedules.mdx
+++ b/content/docs/schedules.mdx
@@ -1,11 +1,298 @@
 ---
 title: Schedules
-description: Stub page — content coming soon.
+description: Run recurring prompts on cron schedules — automated audits, reports, and maintenance.
 ---
 
-> 🚧 **Coming soon.** The Documentation Specialist agent will populate this
-> page on its next maintenance cycle.
+## Overview
 
-If you need this content urgently, open an issue on the
-[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent
-will prioritise it on its next cron tick.
+Schedules let you run recurring prompts against a workspace on a cron schedule.
+Each tick fires an A2A `message/send` into the workspace, so the agent
+processes the prompt as if it received a normal message. This enables automated
+audits, daily reports, weekly retrospectives, and any other recurring task.
+
+The scheduler polls the `workspace_schedules` table every 30 seconds. When a
+schedule's `next_run_at` has passed, the scheduler fires the prompt and
+computes the next run time.
+
+```
+Scheduler (30s poll) ──> workspace_schedules table
+          │
+          next_run_at <= now?
+          │
+ ┌────────┴───────────┐
+ │ A2A message/send   │──> Workspace Agent
+ │ (callerID=system:  │
+ │  scheduler)        │
+ └────────────────────┘
+```
+
+---
+
+## Creating a Schedule
+
+```bash
+curl -X POST http://localhost:8080/workspaces/{id}/schedules \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer {token}" \
+  -d '{
+    "name": "Daily Security Audit",
+    "cron_expr": "0 9 * * *",
+    "timezone": "America/New_York",
+    "prompt": "Run a security audit of all open PRs. 
Check for leaked secrets, SQL injection, and auth bypass.",
+    "enabled": true
+  }'
+```
+
+**Required fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `cron_expr` | string | Standard cron expression (5-field: minute, hour, day-of-month, month, day-of-week) |
+| `prompt` | string | The text sent to the workspace as an A2A message each tick |
+
+**Optional fields:**
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `name` | string | `""` | Human-readable label |
+| `timezone` | string | `"UTC"` | IANA timezone for cron evaluation (e.g. `America/New_York`, `Asia/Tokyo`) |
+| `enabled` | bool | `true` | Whether the schedule fires |
+
+The timezone is validated against Go's `time.LoadLocation` on create and update.
+The cron expression is validated and the next run time is computed immediately.
+
+---
+
+## CRUD Operations
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/workspaces/:id/schedules` | List all schedules for a workspace |
+| POST | `/workspaces/:id/schedules` | Create a new schedule |
+| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update a schedule (partial update via COALESCE) |
+| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete a schedule |
+
+### Update
+
+PATCH accepts any subset of fields. Only provided fields are changed — the
+handler uses `COALESCE` in SQL so omitted fields retain their current values.
+If `cron_expr` or `timezone` changes, the next run time is recomputed.
+
+```bash
+curl -X PATCH http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer {token}" \
+  -d '{"enabled": false}'
+```
+
+### Delete
+
+```bash
+curl -X DELETE http://localhost:8080/workspaces/{id}/schedules/{scheduleId} \
+  -H "Authorization: Bearer {token}"
+```
+
+All schedule operations are scoped to the owning workspace ID to prevent IDOR.
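The `COALESCE` partial-update semantics can be shown in miniature. The sketch below is illustrative Python (the real handler does this in SQL, and `apply_patch` is a hypothetical name):

```python
def apply_patch(current: dict, patch: dict) -> dict:
    """A field changes only when the patch supplies a non-null value."""
    return {
        field: patch[field] if patch.get(field) is not None else value
        for field, value in current.items()
    }

schedule = {"name": "Daily Audit", "cron_expr": "0 9 * * *", "enabled": True}
updated = apply_patch(schedule, {"enabled": False})
assert updated == {"name": "Daily Audit", "cron_expr": "0 9 * * *", "enabled": False}
```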
+
+---
+
+## Manual Trigger
+
+Fire a schedule immediately, outside its cron cadence:
+
+```bash
+curl -X POST http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/run \
+  -H "Authorization: Bearer {token}"
+```
+
+Returns the schedule's prompt so the frontend can POST it to
+`/workspaces/:id/a2a`. This keeps the handler stateless.
+
+---
+
+## Run History
+
+View the last 20 runs for a schedule, including error details for failed runs:
+
+```bash
+curl http://localhost:8080/workspaces/{id}/schedules/{scheduleId}/history \
+  -H "Authorization: Bearer {token}"
+```
+
+Response:
+
+```json
+[
+  {
+    "timestamp": "2026-04-16T09:00:02Z",
+    "duration_ms": 4523,
+    "status": "success",
+    "error_detail": "",
+    "request": {"schedule_id": "...", "prompt": "..."}
+  },
+  {
+    "timestamp": "2026-04-15T09:00:01Z",
+    "duration_ms": null,
+    "status": "error",
+    "error_detail": "A2A proxy returned 503: workspace container not running",
+    "request": {"schedule_id": "...", "prompt": "..."}
+  }
+]
+```
+
+History is pulled from the `activity_logs` table filtered by
+`activity_type = 'cron_run'` and the schedule ID in the request body.
+
+---
+
+## Source Field
+
+Each schedule has a `source` field that tracks how it was created:
+
+| Value | Meaning |
+|-------|---------|
+| `template` | Seeded by an org template import or bundle import. On re-import, only `template`-source rows are refreshed — `runtime` rows survive. |
+| `runtime` | Created via the Canvas UI or API. These are user-owned and never overwritten by re-imports. |
+
+---
+
+## Status Values
+
+The `last_status` field on a schedule tracks the outcome of the most recent
+run:
+
+| Status | Meaning |
+|--------|---------|
+| `success` | The A2A message was delivered and the workspace acknowledged it. |
+| `error` | The A2A proxy returned a non-2xx status. `last_error` contains details. |
+| `skipped` | The workspace was busy (concurrency-aware skip). 
The scheduler detected `active_tasks > 0` and deferred the run to avoid overloading the agent. | + +--- + +## Schedule Health Endpoint + +Peer workspaces can monitor each other's schedule health without admin auth: + +```bash +curl http://localhost:8080/workspaces/{id}/schedules/health \ + -H "X-Workspace-ID: {callerWorkspaceId}" \ + -H "Authorization: Bearer {callerToken}" +``` + +This endpoint returns execution-state fields only (`last_run_at`, +`last_status`, `run_count`, `next_run_at`, `last_error`). It deliberately +omits `prompt` and `cron_expr` so sensitive task content is never exposed to +peer workspaces. + +**Auth rules** (mirrors the A2A proxy pattern): +- `X-Workspace-ID` header required to identify the caller +- Caller's own bearer token validated (legacy workspaces grandfathered) +- `registry.CanCommunicate(callerID, workspaceID)` must return true +- System callers (`system:*`, `webhook:*`, `test:*`) bypass checks +- Self-calls always allowed + +--- + +## Scheduler Internals + +### Poll Loop + +The scheduler runs a 30-second poll loop. Each tick: + +1. Queries up to 50 due schedules (`next_run_at <= now AND enabled = true`) +2. Fires up to 10 concurrently via a semaphore +3. Each fire sends an A2A `message/send` with a 5-minute timeout +4. Updates `last_run_at`, `run_count`, `last_status`, and `next_run_at` +5. Logs the run to `activity_logs` with `activity_type = 'cron_run'` + +### Panic Recovery + +The scheduler recovers from panics inside the tick function. A single bad row, +malformed cron expression, or database blip cannot permanently kill the +scheduler. Without this recovery, the goroutine dies silently and the only +signal is "no crons firing." + +### Liveness Watchdog + +The scheduler reports heartbeats to the `supervised` subsystem. The +`/admin/liveness` endpoint exposes per-subsystem ages, so operators can detect +a stuck scheduler before it causes a missed-cron outage. 
+ +`Scheduler.Healthy()` returns true if the scheduler has completed a tick within +the last 60 seconds (2x the poll interval). Returns false before the first tick +or if the scheduler is stalled. + +--- + +## Examples + +### Hourly Security Audit + +```json +{ + "name": "Hourly Security Scan", + "cron_expr": "0 * * * *", + "timezone": "UTC", + "prompt": "Scan all open PRs for leaked secrets, SQL injection patterns, and auth bypass vulnerabilities. Report findings as a summary." +} +``` + +### Daily Standup Report + +```json +{ + "name": "Daily Standup", + "cron_expr": "0 9 * * 1-5", + "timezone": "America/Los_Angeles", + "prompt": "Generate a standup report: what was completed yesterday, what is planned today, and any blockers. Post to the team channel." +} +``` + +### Weekly Retrospective + +```json +{ + "name": "Weekly Retro", + "cron_expr": "0 17 * * 5", + "timezone": "America/New_York", + "prompt": "Write a weekly retrospective covering PRs merged, issues closed, cron failures, and code review findings. Post as a GitHub issue." +} +``` + +### Nightly Cleanup + +```json +{ + "name": "Nightly Cleanup", + "cron_expr": "0 2 * * *", + "timezone": "UTC", + "prompt": "Archive stale branches older than 30 days. Close issues that have been inactive for 60 days with a comment explaining the auto-close policy.", + "enabled": true +} +``` + +--- + +## Timezone Handling + +All cron expressions are evaluated in the specified timezone. If no timezone is +provided, `UTC` is used. The timezone must be a valid IANA timezone string +(e.g. `America/New_York`, `Europe/London`, `Asia/Tokyo`). + +When a schedule's `cron_expr` or `timezone` is updated, the `next_run_at` is +immediately recomputed using the new values. This prevents schedules from +firing at unexpected times after a timezone change. 
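For a fixed-time daily schedule, the timezone-aware next-run computation can be sketched as follows. This is a simplified illustration using Python's `zoneinfo`; it handles only a daily HH:MM cadence and ignores DST edge cases, whereas the scheduler evaluates full cron expressions:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_daily_run(now: datetime, hour: int, minute: int, tz: str) -> datetime:
    """Next occurrence of HH:MM wall-clock time in tz, as an aware datetime."""
    local_now = now.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=hour, minute=minute,
                                  second=0, microsecond=0)
    if candidate <= local_now:  # today's slot has already passed
        candidate += timedelta(days=1)
    return candidate

# 03:00 UTC on Apr 16: the 02:00 nightly slot has passed, so next run is Apr 17.
now = datetime(2026, 4, 16, 3, 0, tzinfo=ZoneInfo("UTC"))
run = next_daily_run(now, 2, 0, "UTC")
assert (run.day, run.hour) == (17, 2)
```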
+
+---
+
+## API Reference
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/workspaces/:id/schedules` | List schedules |
+| POST | `/workspaces/:id/schedules` | Create schedule |
+| PATCH | `/workspaces/:id/schedules/:scheduleId` | Update schedule |
+| DELETE | `/workspaces/:id/schedules/:scheduleId` | Delete schedule |
+| POST | `/workspaces/:id/schedules/:scheduleId/run` | Manual trigger |
+| GET | `/workspaces/:id/schedules/:scheduleId/history` | Run history (last 20) |
+| GET | `/workspaces/:id/schedules/health` | Health view (open to peers) |
diff --git a/content/docs/self-hosting.mdx b/content/docs/self-hosting.mdx
index be8b1f6..b3a5eec 100644
--- a/content/docs/self-hosting.mdx
+++ b/content/docs/self-hosting.mdx
@@ -1,11 +1,199 @@
 ---
-title: Self Hosting
-description: Stub page — content coming soon.
+title: Self-Hosting
+description: Run the full Molecule AI stack on your own infrastructure.
 ---
 
-> 🚧 **Coming soon.** The Documentation Specialist agent will populate this
-> page on its next maintenance cycle.
+## Prerequisites
 
-If you need this content urgently, open an issue on the
-[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent
-will prioritise it on its next cron tick.
+| Requirement | Minimum Version |
+|-------------|----------------|
+| Docker Desktop | Latest stable |
+| Go | 1.25+ |
+| Node.js | 20+ |
+| Git | 2.x |
+
+## Quick Start
+
+The fastest way to get Molecule AI running locally:
+
+```bash
+git clone https://github.com/Molecule-AI/molecule-monorepo.git
+cd molecule-monorepo
+./scripts/dev-start.sh
+# Canvas: http://localhost:3000
+# Platform: http://localhost:8080
+```
+
+This script starts all infrastructure services, builds the platform, and launches the canvas dev server.
+ +## Infrastructure Setup + +Molecule AI depends on four infrastructure services, all managed via `docker-compose.infra.yml` and attached to the shared `molecule-monorepo-net` Docker network: + +| Service | Port | Purpose | +|---------|------|---------| +| Postgres | 5432 | Primary datastore (also backs Langfuse and Temporal) | +| Redis | 6379 | Pub/sub, heartbeat TTLs | +| Langfuse | 3001 | LLM trace viewer (backed by ClickHouse) | +| Temporal | 7233 (gRPC), 8233 (Web UI) | Durable workflow engine | + +Start infrastructure only: + +```bash +./infra/scripts/setup.sh +``` + +Tear everything down (removes volumes): + +```bash +./infra/scripts/nuke.sh +``` + +## Manual Setup + +If you prefer to start each component individually: + +### Platform (Go) + +```bash +cd platform +go build ./cmd/server +go run ./cmd/server +# Requires Postgres + Redis running +``` + +The platform must be run from the `platform/` directory, not the repo root. + +### Canvas (Next.js) + +```bash +cd canvas +npm install +npm run dev +# Dev server on http://localhost:3000 +``` + +### Docker Compose + +For infrastructure only: + +```bash +docker compose -f docker-compose.infra.yml up -d +``` + +For the full stack (infrastructure + platform + canvas): + +```bash +docker compose up +``` + +## Environment Variables + +### Platform + +| Variable | Default | Description | +|----------|---------|-------------| +| `DATABASE_URL` | -- | Postgres connection string (required) | +| `REDIS_URL` | -- | Redis connection string (required) | +| `PORT` | `8080` | Platform HTTP port | +| `PLATFORM_URL` | `http://host.docker.internal:PORT` | URL passed to agent containers to reach the platform | +| `CORS_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Comma-separated allowed origins | +| `SECRETS_ENCRYPTION_KEY` | -- | AES-256 key (32 bytes) for encrypting workspace secrets | +| `WORKSPACE_DIR` | -- | Global fallback host path for `/workspace` bind-mount | +| `MOLECULE_ENV` | -- | Set to `production` to 
hide E2E helper endpoints | +| `ACTIVITY_RETENTION_DAYS` | `7` | How long activity logs are retained | +| `ACTIVITY_CLEANUP_INTERVAL_HOURS` | `6` | How often the cleanup job runs | +| `RATE_LIMIT` | `600` | Requests per minute per client | + +### Tier Resource Limits + +Override per-tier memory and CPU caps for workspace containers. CPU\_SHARES follows Docker's convention where 1024 equals 1 CPU. + +| Variable | Default | Description | +|----------|---------|-------------| +| `TIER2_MEMORY_MB` | `512` | Standard tier memory limit | +| `TIER2_CPU_SHARES` | `1024` | Standard tier CPU shares | +| `TIER3_MEMORY_MB` | `2048` | Privileged tier memory limit | +| `TIER3_CPU_SHARES` | `2048` | Privileged tier CPU shares | +| `TIER4_MEMORY_MB` | `4096` | Full-host tier memory limit | +| `TIER4_CPU_SHARES` | `4096` | Full-host tier CPU shares | + +### Plugin Install Safeguards + +| Variable | Default | Description | +|----------|---------|-------------| +| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` | Max request body size (64 KiB) | +| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Whole fetch and copy deadline | +| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` | Max staged-tree size (100 MiB) | + +### Canvas + +| Variable | Default | Description | +|----------|---------|-------------| +| `NEXT_PUBLIC_PLATFORM_URL` | `http://localhost:8080` | Platform API URL | +| `NEXT_PUBLIC_WS_URL` | `ws://localhost:8080/ws` | WebSocket endpoint | + +### Tenant Mode + +| Variable | Default | Description | +|----------|---------|-------------| +| `CANVAS_PROXY_URL` | -- | When set, the Go server proxies canvas requests to this URL | +| `MOLECULE_ORG_ID` | -- | UUID for multi-tenant isolation; leave unset for self-hosted | + +## Production Deployment + +For production, use `platform/Dockerfile.tenant` which builds a combined Go + Canvas image: + +```bash +docker build -f platform/Dockerfile.tenant -t molecule-platform . 
+```
+
+This image serves both the API and the canvas frontend from a single container.
+
+## Security Configuration
+
+### Secrets Encryption
+
+Set `SECRETS_ENCRYPTION_KEY` to a 32-byte AES-256 key to encrypt workspace secrets at rest. Without this variable, secrets are stored in plaintext.
+
+```bash
+# Generate a key
+openssl rand -hex 32
+```
+
+**Warning:** `SECRETS_ENCRYPTION_KEY` cannot be rotated without a data migration. Choose carefully before deploying to production.
+
+### Rate Limiting
+
+The `RATE_LIMIT` variable (default 600 requests/min) applies per client. Adjust based on your expected traffic.
+
+### CORS
+
+Set `CORS_ORIGINS` to a comma-separated list of allowed origins. In production, restrict this to your actual domain.
+
+## Pre-commit Hook
+
+Install the project's pre-commit hooks to enforce code quality:
+
+```bash
+git config core.hooksPath .githooks
+```
+
+The hook enforces:
+
+- `'use client'` directive on hook-using `.tsx` files
+- Dark theme only (no `white` or `light` CSS classes)
+- No SQL injection patterns (`fmt.Sprintf` with SQL)
+- No leaked secrets (`sk-ant-`, `ghp_`, `AKIA`)
+
+Commits are rejected until all violations are fixed.
+
+## Building Workspace Images
+
+Build the base workspace image for local development:
+
+```bash
+bash workspace-template/build-all.sh
+```
+
+Adapter-specific images are built from standalone template repos. Each repo's `Dockerfile` installs `molecule-ai-workspace-runtime` from PyPI plus adapter-specific dependencies.
diff --git a/content/docs/troubleshooting.mdx b/content/docs/troubleshooting.mdx
index a6dfcc7..b0750ba 100644
--- a/content/docs/troubleshooting.mdx
+++ b/content/docs/troubleshooting.mdx
@@ -1,11 +1,164 @@
---
title: Troubleshooting
-description: Stub page — content coming soon.
+description: Common issues and how to fix them.
---

-> 🚧 **Coming soon.** The Documentation Specialist agent will populate this
-> page on its next maintenance cycle.
+## Workspace Stuck in "Provisioning"

-If you need this content urgently, open an issue on the
-[docs repo](https://github.com/Molecule-AI/docs/issues/new) and the agent
-will prioritise it on its next cron tick.
+A workspace that stays in `provisioning` for more than 30 seconds usually indicates a container startup failure.
+
+**Steps to diagnose:**
+
+1. Check Docker logs for the workspace container:
+   ```bash
+   docker logs <container-id>
+   ```
+2. Verify the workspace image exists locally:
+   ```bash
+   docker images | grep workspace-template
+   ```
+3. Check tier resource limits -- the container may be OOM-killed on start. Review the `TIER2_MEMORY_MB` / `TIER3_MEMORY_MB` / `TIER4_MEMORY_MB` values.
+4. Ensure the platform can reach the Docker daemon (Docker Desktop must be running).
+
+## 401 Unauthorized on API Calls
+
+Bearer tokens can expire or be revoked. Workspace tokens are also auto-revoked when a workspace is deleted.
+
+**Resolution:**
+
+- For workspace-scoped endpoints, mint a new token:
+  ```bash
+  # Development/staging only (hidden when MOLECULE_ENV=production)
+  curl http://localhost:8080/admin/workspaces/:id/test-token
+  ```
+- For admin endpoints, verify your token against an AdminAuth route such as `GET /admin/liveness` -- `GET /health` is unauthenticated, so it cannot validate a token.
+- Legacy workspaces (created before Phase 30.1) are grandfathered and do not require tokens on heartbeat/update-card routes.
+
+## WebSocket Shows "Reconnecting"
+
+The canvas WebSocket connection (`/ws`) drops and retries.
+
+**Common causes:**
+
+- `CORS_ORIGINS` does not include your domain -- the WebSocket upgrade is rejected. Add your origin to the comma-separated list.
+- A reverse proxy or firewall is terminating the long-lived connection. Ensure WebSocket upgrade headers are forwarded.
+- The platform process crashed or restarted. Check platform logs.
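+
+If nginx sits in front of the platform, the upgrade headers must be forwarded explicitly or the handshake fails with exactly these symptoms. A minimal sketch, assuming nginx proxies to a platform on `localhost:8080` (the location block and timeout value are illustrative, not taken from the repo):
+
+```nginx
+location /ws {
+    proxy_pass http://localhost:8080;
+    proxy_http_version 1.1;
+    # Forward the WebSocket upgrade handshake
+    proxy_set_header Upgrade $http_upgrade;
+    proxy_set_header Connection "upgrade";
+    # Keep idle long-lived connections from being reaped too early
+    proxy_read_timeout 300s;
+}
+```
+
+Equivalent settings exist for other proxies (Caddy and Traefik forward upgrade headers by default).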
+
+**Verify connectivity:**
+
+```bash
+# Quick check that the WS endpoint is reachable
+curl -i -N \
+  -H "Connection: Upgrade" \
+  -H "Upgrade: websocket" \
+  -H "Sec-WebSocket-Version: 13" \
+  -H "Sec-WebSocket-Key: dGVzdA==" \
+  http://localhost:8080/ws
+```
+
+## Agent Not Responding to A2A
+
+When one agent cannot reach another via the A2A proxy (`POST /workspaces/:id/a2a`), check communication rules.
+
+**The `CanCommunicate` access check allows:**
+
+- Same workspace (self-call)
+- Siblings (same parent)
+- Root-level siblings (both have no parent)
+- Parent to child or child to parent
+
+**Everything else is denied.** If two agents need to communicate, they must be directly related in the hierarchy: same parent, or parent and child.
+
+**Also verify:**
+
+- The target workspace is `online` (not `paused`, `offline`, or `provisioning`)
+- The target's heartbeat is fresh (Redis TTL has not expired)
+- The caller includes `X-Workspace-ID` and `Authorization: Bearer <token>` headers
+
+## Schedule Not Firing
+
+Cron schedules are managed by the platform scheduler subsystem.
+
+**Checklist:**
+
+- Verify the cron expression is valid (standard 5-field cron syntax)
+- Confirm the workspace is `online` -- paused workspaces skip all schedules
+- Check if the schedule was `skipped` due to concurrency: the scheduler skips when `active_tasks > 0`. Review schedule history:
+  ```
+  GET /workspaces/:id/schedules/:scheduleId/history
+  ```
+- Inspect `GET /admin/liveness` to ensure the scheduler subsystem is alive (age should be under 60 seconds)
+
+## Channel Test Fails
+
+Social channel integrations (Telegram, Slack, etc.) can fail for several reasons.
+
+**Diagnose:**
+
+- Verify the bot token is correct and has not been revoked by the external platform
+- Check the allowlist config in the channel's JSONB settings -- messages from non-allowlisted chats are silently dropped
+- Ensure the webhook URL is registered with the external platform:
+  ```
+  POST /webhooks/:type
+  ```
+  This is the endpoint the external platform (Telegram, Slack) should send events to.
+- Test the connection explicitly:
+  ```
+  POST /workspaces/:id/channels/:channelId/test
+  ```
+
+## Migration Crash on Boot
+
+The platform runs all `*.up.sql` migrations on every startup (there is no `schema_migrations` tracking table yet).
+
+**Common issues:**
+
+- Migrations must be idempotent (`CREATE TABLE IF NOT EXISTS`, `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`). If a migration lacks this guard, the second boot fails.
+- Before PR #212, the migration runner did not filter `.down.sql` files, causing tables to be dropped on every boot. Ensure you are running a platform version that includes this fix.
+- If you see errors about duplicate columns or tables, the migration is not idempotent. Patch the `.up.sql` file to add `IF NOT EXISTS` guards.
+
+## Canvas Blank or 502 on Tenant Deploy
+
+In tenant mode (`platform/Dockerfile.tenant`), the Go server proxies canvas requests.
+
+**Verify:**
+
+- `CANVAS_PROXY_URL` is set and points to the running Next.js process inside the container
+- Both the Go server and the Node.js process are running (check container logs for both)
+- The Next.js build completed successfully during `docker build`
+
+## Plugin Install Timeout
+
+Large plugins or slow network connections can exceed the default fetch deadline.
+ +**Adjust limits:** + +| Variable | Default | Description | +|----------|---------|-------------| +| `PLUGIN_INSTALL_FETCH_TIMEOUT` | `5m` | Increase for large or remote plugins | +| `PLUGIN_INSTALL_MAX_DIR_BYTES` | `104857600` (100 MiB) | Increase if the plugin tree exceeds 100 MiB | +| `PLUGIN_INSTALL_BODY_MAX_BYTES` | `65536` (64 KiB) | Increase if the install request body is large | + +## Memory or Disk Usage Growing + +Activity logs and structure events accumulate over time. + +**Tune retention:** + +- `ACTIVITY_RETENTION_DAYS` (default `7`) -- reduce to 3 or even 1 for high-traffic deployments +- `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default `6`) -- reduce to run cleanup more frequently +- Monitor the `activity_logs` and `structure_events` tables directly if disk usage is a concern: + ```sql + SELECT pg_size_pretty(pg_total_relation_size('activity_logs')); + SELECT pg_size_pretty(pg_total_relation_size('structure_events')); + ``` + +## Container Health Detection + +If workspaces go offline unexpectedly (e.g., Docker Desktop crash), three layers detect the failure: + +1. **Passive (Redis TTL):** 60-second heartbeat key expires, liveness monitor triggers auto-restart +2. **Proactive (Health Sweep):** Docker API polled every 15 seconds, catches dead containers faster than TTL expiry +3. **Reactive (A2A Proxy):** On connection error to a workspace, checks `provisioner.IsRunning()` and triggers immediate offline + restart + +If none of these are catching a dead container, check `GET /admin/liveness` to verify the health sweep and liveness monitor subsystems are running.
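+
+The layers above recover dead workspace containers, but the platform process itself also needs supervision in production. One option is a Docker Compose healthcheck against the unauthenticated `/health` endpoint -- a sketch, not taken from the repo's compose files (service name, image tag, timings, and the presence of `curl` in the image are all assumptions):
+
+```yaml
+services:
+  platform:
+    image: molecule-platform
+    restart: unless-stopped
+    healthcheck:
+      # /health requires no auth and returns 200 while the platform is up
+      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+      interval: 15s
+      timeout: 3s
+      retries: 3
+```
+
+The same endpoint works for Kubernetes liveness probes or load-balancer target health checks.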