# Provisioner
The provisioner is the platform component that deploys workspace containers and VMs. It is triggered when a workspace is created, imported from a bundle, or expanded into a team.
## How It Works
- Platform receives a workspace creation request (API call or bundle import)
- Platform writes a `WORKSPACE_PROVISIONING` event and broadcasts it (canvas shows spinner)
- Provisioner reads the workspace config (tier, model, env requirements)
- Provisioner reads secrets from the `workspace_secrets` table, decrypts them, and prepares them as env vars
- Provisioner deploys based on tier (via `ApplyTierConfig()`):
  - T1 (Sandboxed): Docker container, readonly rootfs, tmpfs `/tmp`, no `/workspace` mount
  - T2 (Standard): Docker container + `/workspace` mount + resource limits (512 MiB, 1 CPU)
  - T3 (Privileged): Docker container, `--privileged` + host PID (Docker network, not host)
  - T4 (Full Access): Docker container, privileged + host PID + host network + Docker socket
- Provisioner waits for the first heartbeat (workspace is live); see the sketch after this list
- On first heartbeat: status transitions to `online`
- On timeout (3 minutes) or immediate error: status transitions to `failed`
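The last two steps hinge on a timed wait for the first heartbeat. A minimal sketch of that wait loop, assuming a polling helper (`heartbeatSeen`) backed by the registry's heartbeat key; the function and parameter names here are illustrative, not the platform's actual API:

```go
package provisioner

import (
	"context"
	"errors"
	"time"
)

const provisionTimeout = 3 * time.Minute

// waitForFirstHeartbeat polls until the workspace reports in, the 3-minute
// provisioning timeout elapses, or the context is cancelled.
func waitForFirstHeartbeat(ctx context.Context, id string, heartbeatSeen func(id string) bool) error {
	deadline := time.After(provisionTimeout)
	tick := time.NewTicker(2 * time.Second)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-deadline:
			return errors.New("timed out waiting for first heartbeat") // status -> failed
		case <-tick.C:
			if heartbeatSeen(id) {
				return nil // status -> online
			}
		}
	}
}
```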
## Docker Networking (Tiers 1–3; Tier 4 uses the host network)
All workspace containers join the `molecule-monorepo-net` Docker network. Containers are named `ws-{id[:12]}` (the first 12 characters of the workspace UUID). Two exported helpers in the provisioner package provide the canonical naming:

- `provisioner.ContainerName(workspaceID)` → `ws-{id[:12]}`
- `provisioner.InternalURL(workspaceID)` → `http://ws-{id[:12]}:8000`

These are used by discovery, workspace provisioning, and the terminal handlers; always use them instead of constructing names inline.
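A minimal sketch of what those two helpers amount to; the real functions live in the provisioner package, this only illustrates the contract:

```go
package provisioner

import "fmt"

// ContainerName returns ws-{first 12 chars of the workspace UUID}.
func ContainerName(workspaceID string) string {
	if len(workspaceID) > 12 {
		workspaceID = workspaceID[:12]
	}
	return "ws-" + workspaceID
}

// InternalURL returns the Docker-internal address on molecule-monorepo-net.
func InternalURL(workspaceID string) string {
	return fmt.Sprintf("http://%s:8000", ContainerName(workspaceID))
}
```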
Containers are also given an ephemeral host port binding (`127.0.0.1:0 → 8000/tcp`) so the platform can reach them from the host. After `ContainerStart`, the provisioner inspects the container to resolve the actual mapped port and stores the host-accessible URL: `http://127.0.0.1:{ephemeral_port}`.
This URL is pre-stored in both Postgres and Redis before the agent registers. When the agent calls `POST /registry/register`, the register endpoint preserves the provisioner URL (any URL starting with `http://127.0.0.1`) instead of overwriting it with the agent's Docker-internal hostname.

**Why not use Docker-internal URLs?** In local dev, the platform runs on the host (not in Docker), so it cannot resolve Docker container hostnames. The ephemeral port mapping lets the A2A proxy reach agents via localhost. In production (platform in Docker), the Docker-internal URL (`http://ws-{id}:8000`) would work directly.
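A hedged sketch of that preservation rule as the register handler might apply it; `resolveWorkspaceURL` and its parameters are illustrative names, not the actual handler code:

```go
package registry

import "strings"

// resolveWorkspaceURL keeps the host-reachable URL written by the provisioner
// (ephemeral 127.0.0.1 port mapping) rather than overwriting it with the
// agent's Docker-internal hostname.
func resolveWorkspaceURL(storedURL, agentReportedURL string) string {
	if strings.HasPrefix(storedURL, "http://127.0.0.1") {
		return storedURL
	}
	return agentReportedURL
}
```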
Workspace-to-workspace discovery: When a workspace discovers another workspace (via the `X-Workspace-ID` header on `GET /registry/discover/:id`), the platform returns the Docker-internal URL (`http://ws-{first12chars}:8000`) so containers can reach each other directly on `molecule-monorepo-net`. The internal URL is cached in Redis at provision time and also synthesized as a fallback if the cache misses (only for online/degraded workspaces).
For external HTTPS access (multi-host mode), Nginx on the host handles TLS termination and proxies to the container.
## Tier-Based Container Flags
| Tier | Flags |
|---|---|
| T1 (Sandboxed) | Config volume only, readonly rootfs, tmpfs /tmp, no /workspace mount |
| T2 (Standard) | Config + workspace volume, 512 MiB memory, 1 CPU |
| T3 (Privileged) | Config + workspace + --privileged + --pid=host (Docker network) |
| T4 (Full Access) | Config + workspace + --privileged + --pid=host + --network=host + Docker socket |
Tier configuration is applied via the exported `ApplyTierConfig()` function in `provisioner.go`. Unknown or zero tier values default to T2 (safe resource-limited container).
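As a rough illustration of how such a switch might look over the Docker Go SDK's `container.HostConfig`, the sketch below mirrors the table; it is not the actual `ApplyTierConfig()` implementation, and the config/workspace volume mounts are handled elsewhere and omitted:

```go
package provisioner

import "github.com/docker/docker/api/types/container"

// applyTierConfig mutates the HostConfig before container creation,
// following the tier table above.
func applyTierConfig(tier int, hc *container.HostConfig) {
	switch tier {
	case 1: // T1 Sandboxed: readonly rootfs, tmpfs /tmp, no /workspace mount
		hc.ReadonlyRootfs = true
		hc.Tmpfs = map[string]string{"/tmp": ""}
	case 3: // T3 Privileged: privileged + host PID, still on the Docker network
		hc.Privileged = true
		hc.PidMode = "host"
	case 4: // T4 Full Access: privileged + host PID + host network + Docker socket
		hc.Privileged = true
		hc.PidMode = "host"
		hc.NetworkMode = "host"
		hc.Binds = append(hc.Binds, "/var/run/docker.sock:/var/run/docker.sock")
	default: // T2 Standard, and any unknown/zero tier: resource-limited container
		hc.Memory = 512 * 1024 * 1024 // 512 MiB
		hc.NanoCPUs = 1_000_000_000   // 1 CPU
	}
}
```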
## Workspace Lifecycle States
```
provisioning -> online <-----> degraded
     |            |               |
     v            v               v
  failed       offline         offline
     |            |               |
     v            v               v
  removed      removed         removed
     ^            ^
     |            |
  (retry)   (re-register)
```
- `provisioning -> online`: first heartbeat received
- `online -> degraded`: error_rate >= 50% (via heartbeat self-report)
- `degraded -> online`: error_rate < 10% (recovered)
- `online/degraded -> offline`: heartbeat TTL expired OR proactive health sweep detects a dead container
- `offline -> provisioning`: auto-restart triggered by the liveness monitor or health sweep
- `provisioning -> failed`: 3-minute timeout or immediate Docker error
- `failed -> provisioning`: user clicks Retry on the canvas
- `offline -> online`: workspace re-registers (after auto-restart or manual restart)
- `any -> paused`: user pauses the workspace (container stopped, config preserved)
- `paused -> provisioning`: user resumes the workspace
- `any -> removed`: user deletes the workspace
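For reference, the transitions above can be collected into a guard table. This is an illustrative sketch, not the platform's actual state-machine code:

```go
package provisioner

// validTransitions encodes the documented lifecycle edges, including the
// "any -> paused" and "any -> removed" user actions.
var validTransitions = map[string][]string{
	"provisioning": {"online", "failed", "paused", "removed"},
	"online":       {"degraded", "offline", "paused", "removed"},
	"degraded":     {"online", "offline", "paused", "removed"},
	"offline":      {"provisioning", "online", "paused", "removed"},
	"failed":       {"provisioning", "paused", "removed"},
	"paused":       {"provisioning", "removed"},
}

func canTransition(from, to string) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```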
| Status | Meaning | Canvas Display |
|---|---|---|
| `provisioning` | Container/VM is being spun up, waiting for first heartbeat | Spinner on node |
| `online` | Heartbeat received, reachable, accepting A2A messages | Green node |
| `degraded` | Online but error rate above 50%, self-reported via heartbeat | Yellow node with warning |
| `offline` | Heartbeat TTL expired, unreachable but not deleted | Gray node |
| `paused` | User paused: container stopped, config preserved, no auto-restart | Indigo node |
| `failed` | Provisioning timed out or immediate launch error | Red node + retry button |
| `removed` | User deleted it, kept in DB for event log + 410 responses | Node removed from canvas |
## Restart & Runtime Detection
When a workspace is restarted (`POST /workspaces/:id/restart`):

- Read the runtime from the `workspaces.runtime` column in Postgres
- Stop the existing container
- Resolve the template: check the request body, then a name-based match, then the runtime-default template (e.g. `claude-code-default/`)
- Re-provision with the same config volume (configs persist across restarts)

**Runtime stored in DB:** The `runtime` column is set at creation time and persists across restarts, so there is no need to read it from the container.

**Template resolution at creation:** When a workspace specifies a template that doesn't exist (e.g. `org-marketing-lead`), the Create handler falls back in order: (1) the `{runtime}-default` template (e.g. `claude-code-default/`), (2) `ensureDefaultConfig` (generates a minimal config and copies `.auth-token` from `claude-code-default/`).
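A sketch of that fallback order, with `templateExists` standing in for whatever lookup the Create handler actually performs:

```go
package provisioner

import "fmt"

// resolveTemplate applies the documented fallback: requested template,
// then {runtime}-default, then signal the caller to build a minimal config.
func resolveTemplate(requested, runtime string, templateExists func(name string) bool) (string, error) {
	if requested != "" && templateExists(requested) {
		return requested, nil
	}
	if def := runtime + "-default"; templateExists(def) {
		return def, nil // e.g. claude-code-default/
	}
	// Caller falls back to ensureDefaultConfig: minimal generated config plus
	// the .auth-token copied from claude-code-default/.
	return "", fmt.Errorf("no template found for runtime %q", runtime)
}
```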
## Container Health Detection
Three layers detect dead containers:
- **Passive (Redis TTL):** Each heartbeat refreshes a 60s Redis key (`ws:{id}`). When the key expires, the liveness monitor marks the workspace offline and triggers auto-restart. Gap: up to 60s of false "online" state.
- **Proactive (Health Sweep):** A goroutine checks all online/degraded workspaces against the Docker API (`ContainerInspect`) every 15 seconds (sketched below). If a container is gone, it immediately marks the workspace offline, clears Redis caches, and triggers auto-restart. Catches bulk container death (e.g. a Docker Desktop crash) within 15s.
- **Reactive (A2A Proxy):** When the A2A proxy (`POST /workspaces/:id/a2a`) gets a connection error, it checks `provisioner.IsRunning()`. If the container is dead, it marks the workspace offline, clears caches, triggers a restart, and returns 503 with `"restarting": true`. If the container is running but unresponsive, it returns 502.
All three layers use the same `onWorkspaceOffline` callback: broadcast `WORKSPACE_OFFLINE` + `go wh.RestartByID(workspaceID)`. `RestartByID` has a per-workspace mutex (`TryLock`) that deduplicates concurrent restart attempts.

When a workspace goes offline and is auto-restarted, Redis keys are cleaned up via `db.ClearWorkspaceKeys()`, which removes `ws:{id}`, `ws:{id}:url`, and `ws:{id}:internal_url`.
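A minimal sketch of the proactive sweep loop, with `listActive`, `isRunning`, and `onWorkspaceOffline` as stand-ins for the platform's own helpers:

```go
package provisioner

import (
	"context"
	"time"
)

// runHealthSweep checks every online/degraded workspace's container each
// 15 seconds and fires the shared offline callback when one is gone.
func runHealthSweep(ctx context.Context,
	listActive func(ctx context.Context) []string, // workspace IDs in online/degraded
	isRunning func(ctx context.Context, id string) bool, // ContainerInspect wrapper
	onWorkspaceOffline func(id string), // broadcast + auto-restart
) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, id := range listActive(ctx) {
				if !isRunning(ctx, id) {
					onWorkspaceOffline(id) // marks offline, clears Redis, restarts
				}
			}
		}
	}
}
```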
## Failure Handling
When provisioning fails:
- Status is set to `failed`
- A `WORKSPACE_PROVISION_FAILED` event is written with the reason
- Canvas shows a red node with the error message
- User can click Retry, which resets the status to `provisioning` and re-runs the provisioner
## Docker Volume Mounts
By default, each workspace gets an isolated named Docker volume:
```
docker volume: ws-{id}-workspace
  -> mounted at /workspace inside the container
  -> persists across: container restart, re-provision, image update
  -> destroyed only when: user deletes workspace or runs nuke.sh
```
The volume is named after the workspace ID, not the container name. So even when a container is destroyed and re-provisioned, the new container mounts the same volume. Tier 1 workspaces skip the workspace volume for read-only isolation.
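A sketch of the default volume attachment using the Docker Go SDK's `mount.Mount`; whether the real provisioner uses `mount.Mount` or plain bind strings is an assumption here:

```go
package provisioner

import (
	"fmt"

	"github.com/docker/docker/api/types/mount"
)

// workspaceVolumeMount returns the named-volume mount keyed by workspace ID
// (not container name), so a re-provisioned container reattaches to the same
// data. Tier 1 workspaces skip this mount entirely.
func workspaceVolumeMount(workspaceID string) mount.Mount {
	return mount.Mount{
		Type:   mount.TypeVolume,
		Source: fmt.Sprintf("ws-%s-workspace", workspaceID),
		Target: "/workspace",
	}
}
```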
## Per-Workspace Directory (`workspace_dir`)
Each workspace can optionally specify a host directory to bind-mount as `/workspace`. The priority chain is:

- Per-workspace `workspace_dir` (DB column, set via API or org template): highest priority
- Global `WORKSPACE_DIR` env var: fallback for all workspaces without a per-workspace value
- Isolated Docker named volume: default when neither is set
```yaml
# org-templates/molecule-dev/org.yaml
workspaces:
  - name: PM
    workspace_dir: /Users/you/project   # bind-mounts repo
  - name: Backend Engineer
    # no workspace_dir → isolated Docker volume
```
API support:
- `POST /workspaces {"workspace_dir": "/path"}`: set on create
- `PATCH /workspaces/:id {"workspace_dir": "/path"}`: update (returns `needs_restart: true`)
- `PATCH /workspaces/:id {"workspace_dir": null}`: clear (reverts to the isolated volume)
**Path validation:** must be absolute, no `..` traversal, rejects system paths (`/etc`, `/var`, `/proc`, `/sys`, `/dev`, `/boot`, `/sbin`, `/bin`, `/lib`, `/usr`).
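A sketch of those validation rules; `validateWorkspaceDir` is an illustrative name, not necessarily the platform's own function:

```go
package provisioner

import (
	"fmt"
	"path/filepath"
	"strings"
)

// forbiddenPrefixes is the system-path denylist from the docs above.
var forbiddenPrefixes = []string{
	"/etc", "/var", "/proc", "/sys", "/dev", "/boot", "/sbin", "/bin", "/lib", "/usr",
}

// validateWorkspaceDir enforces: absolute path, no .. traversal, no system paths.
func validateWorkspaceDir(dir string) error {
	if !filepath.IsAbs(dir) {
		return fmt.Errorf("workspace_dir must be absolute: %q", dir)
	}
	if strings.Contains(dir, "..") {
		return fmt.Errorf("workspace_dir must not contain '..': %q", dir)
	}
	clean := filepath.Clean(dir)
	for _, p := range forbiddenPrefixes {
		if clean == p || strings.HasPrefix(clean, p+"/") {
			return fmt.Errorf("workspace_dir under system path %s is not allowed", p)
		}
	}
	return nil
}
```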
See Memory for full memory backend details.
## Container Cleanup
When a workspace is deleted:
- The Docker container is stopped and removed
- Memory is cleaned up (DB rows deleted, Redis keys cleared)
- Workspace status is set to `removed` in Postgres
- A `WORKSPACE_REMOVED` event is written
Structure events and agent card history are never deleted — only the conversational memory is cleaned.
## Related Docs
- Memory — Memory backends and persistence
- Workspace Tiers — What each tier provides
- Workspace Runtime — What runs inside the container
- Registry & Heartbeat — How provisioning transitions to online
- Team Expansion — Provisioning triggered by team expansion