Molecule AI Community Manager 8adf972dd9 docs(tutorials): add Self-Hosted AI Agents deployment guide

Covers Docker, Fly Machines, and bare metal deployment models with
use cases, configuration examples, and a comparison table. Captures
keywords from SEO brief #1126: self-hosted AI agents platform, remote
AI agent deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-21 10:26:41 +00:00

Self-Hosted AI Agents: Molecule AI on Docker, Fly Machines, or Bare Metal

Molecule AI is designed to run wherever your agents need to run. Whether you're deploying on a single VPS, distributing agents across cloud VMs, or running on hardware that can't be containerized, Molecule AI has a path that fits.

This guide covers the three deployment models — Docker containers, Fly Machines, and bare metal — with concrete use cases and configuration for each.

Choosing a Deployment Model

Model	Best for	Provisioning	Cold start	Isolation
Docker	Single-host, dev/test, one-box production	Manual (`docker run`) or Docker Compose	~15–30s	Shared kernel
Fly Machines	Multi-region, auto-scaling, per-tenant isolation	Platform API (`POST /workspaces`)	<1s	Firecracker microVM
Bare metal / remote	On-prem, laptops, CI/CD, air-gapped	Manual registration	N/A	None (your infra)

All three models use the same agent runtime and A2A protocol. The differences are in how agents are provisioned, how secrets are delivered, and how liveness is tracked.

Model 1: Docker Containers

The default deployment. The platform manages container lifecycle — you get workspace provisioning, secret injection, and platform heartbeat handling out of the box.

How it works:

POST /workspaces → platform runs `docker run ghcr.io/molecule-ai/workspace-<runtime>`

The platform injects WORKSPACE_ID, PLATFORM_URL, and workspace secrets as environment variables before the container starts. On boot, the agent inside registers itself via POST /registry/register. From then on the platform tracks liveness through Docker's health checks, so the agent does not have to send its own heartbeats.
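
As a rough sketch of that boot step, an agent could read the injected variables and assemble its registration request as shown below. The payload shape mirrors the registration call used in the bare metal section of this guide. The helper name `registration_request` and the `agent_url` and `runtime` parameters are assumptions for illustration, not confirmed platform behavior.

```python
def registration_request(env, agent_url, runtime):
    """Build the endpoint and JSON body for POST /registry/register.

    `env` is any mapping, for example os.environ. WORKSPACE_ID and
    PLATFORM_URL are the variables the platform injects; `agent_url`
    and `runtime` are illustrative parameters (assumptions).
    """
    missing = [key for key in ("WORKSPACE_ID", "PLATFORM_URL") if not env.get(key)]
    if missing:
        raise RuntimeError("missing injected variables: " + ", ".join(missing))

    endpoint = env["PLATFORM_URL"].rstrip("/") + "/registry/register"
    body = {
        "id": env["WORKSPACE_ID"],
        "url": agent_url,
        "agent_card": {"name": env["WORKSPACE_ID"], "runtime": runtime},
    }
    return endpoint, body


# Example with a hand-written environment instead of os.environ.
example_env = {"WORKSPACE_ID": "ws-123", "PLATFORM_URL": "https://your-host/"}
endpoint, body = registration_request(example_env, "http://agent:8080", "hermes")
```

Failing fast on a missing variable makes a misconfigured container easier to diagnose than a registration call sent to an empty URL.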

Configuration:

# Your platform's .env
CONTAINER_BACKEND=docker        # default
PLATFORM_URL=https://your-host   # reachable from containers
WORKSPACE_IMAGE_PREFIX=ghcr.io/molecule-ai/workspace-

# Optional: restrict which runtimes are allowed
ALLOWED_RUNTIMES=hermes,claude-code,langgraph

# For CI on the same host:
WORKSPACE_NETWORK=host          # use host network for zero-config networking
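
To illustrate how these settings could fit together, the sketch below derives an image name from WORKSPACE_IMAGE_PREFIX and rejects runtimes that are not listed in ALLOWED_RUNTIMES. The helper `workspace_image` is hypothetical. Treating an unset ALLOWED_RUNTIMES as "no restriction" is an assumption based on the setting being described as optional.

```python
def workspace_image(runtime, env):
    """Return the container image for `runtime`, honoring ALLOWED_RUNTIMES.

    Hypothetical helper: the platform's real resolution logic may differ.
    """
    prefix = env.get("WORKSPACE_IMAGE_PREFIX", "ghcr.io/molecule-ai/workspace-")
    allowed_raw = env.get("ALLOWED_RUNTIMES", "")
    allowed = {item.strip() for item in allowed_raw.split(",") if item.strip()}

    # Assumption: an empty or unset list means every runtime is allowed.
    if allowed and runtime not in allowed:
        raise ValueError(f"runtime {runtime!r} is not in ALLOWED_RUNTIMES")
    return prefix + runtime


config = {
    "WORKSPACE_IMAGE_PREFIX": "ghcr.io/molecule-ai/workspace-",
    "ALLOWED_RUNTIMES": "hermes,claude-code,langgraph",
}
image = workspace_image("hermes", config)
```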

When to choose Docker:

Single-host deployments (VPS, single EC2)
Dev/test environments where isolation is less critical
Teams that already have Docker infra
You want the platform to handle provisioning automatically

Model 2: Fly Machines

Fly Machines are Firecracker microVMs managed by the Fly.io API. They offer sub-second cold starts, multi-region placement, and hardware-level isolation between workspaces — without the shared kernel risk of Docker.

How it works:

POST /workspaces → platform calls Fly API → Fly Machine boots workspace image

The platform talks to the Fly Machines API directly, passing workspace config and secrets as environment variables. The same agent runtime runs inside the Machine.

Configuration:

# Your platform's .env
CONTAINER_BACKEND=flyio
FLY_API_TOKEN=<fly-deploy-token>          # flyctl tokens create deploy
FLY_WORKSPACE_APP=my-molecule-workspaces   # Fly app for workspace Machines
FLY_REGION=ord                             # default region (or leave for auto)

Resource tiers (configured per workspace via "tier": 2|3|4):

Tier	RAM	CPUs	Use case
T2	512 MB	1	Light workers, eval agents
T3	2 GB	2	General-purpose orchestrators
T4	4 GB	4	Heavy inference, long-context tasks

Setting tier on creation:

curl -X POST https://platform.moleculesai.app/workspaces \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "eu-worker",
    "runtime": "hermes",
    "tier": 3,
    "metadata": { "region": "ams" }
  }'

Fly places the Machine in the region closest to the one named in the metadata region field. If that field is omitted, the platform falls back to FLY_REGION.
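
For callers that build this request in code, a minimal sketch might validate the tier locally before sending. The tier sizes are copied from the table above. The helper `workspace_body` and the local validation are assumptions; the platform may validate differently.

```python
import json

# Tier sizes as listed in the resource tier table.
TIERS = {
    2: {"ram_mb": 512, "cpus": 1},
    3: {"ram_mb": 2048, "cpus": 2},
    4: {"ram_mb": 4096, "cpus": 4},
}


def workspace_body(name, runtime, tier, region=None):
    """Build the JSON body for POST /workspaces (hypothetical helper)."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {sorted(TIERS)}, got {tier!r}")
    body = {"name": name, "runtime": runtime, "tier": tier}
    if region:
        # Leaving out metadata lets the platform fall back to FLY_REGION.
        body["metadata"] = {"region": region}
    return body


payload = json.dumps(workspace_body("eu-worker", "hermes", 3, region="ams"))
```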

When to choose Fly Machines:

Multi-tenant SaaS where workspace isolation matters (Firecracker = no shared kernel)
Sub-second cold starts matter (queue workers, on-demand workers)
You want multi-region agent distribution without managing your own fleet
You want pay-per-second billing instead of always-on VMs

See: Provision Workspaces on Fly Machines — full walkthrough with flyctl commands.

Model 3: Bare Metal / Remote Agents

For agents that can't be containerized — on-prem hardware, laptops, CI/CD runners — Molecule AI ships a registration API. Your agent registers with the platform, receives a bearer token, and maintains canvas visibility via a heartbeat loop.

This is the most flexible model. The platform doesn't manage the agent's lifecycle — it just provides a coordination layer (fleet visibility, secret management, A2A routing).

How it works:

Create an external workspace via the API
Register the agent and receive its bearer token (returned once, in the registration response)
The agent starts a 30-second heartbeat loop
The canvas shows the agent with a REMOTE badge

Step-by-step registration:

ADMIN_TOKEN="your-admin-token"
PLATFORM_URL="https://platform.moleculesai.app"
AGENT_URL="https://your-agent.example.com"  # must be HTTPS and reachable

# 1. Create external workspace
WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"CI Agent\",
    \"runtime\": \"external\",
    \"external\": true,
    \"url\": \"${AGENT_URL}\"
  }")
WORKSPACE_ID=$(echo "$WORKSPACE" | jq -r '.id')

# 2. Register and receive bearer token
REG=$(curl -s -X POST "${PLATFORM_URL}/registry/register" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"id\": \"${WORKSPACE_ID}\",
    \"url\": \"${AGENT_URL}\",
    \"agent_card\": {\"name\": \"CI Agent\", \"runtime\": \"external\"}
  }")
AUTH_TOKEN=$(echo "$REG" | jq -r '.auth_token')

# 3. Send a heartbeat every 30s (Ctrl-C to stop)
while true; do
  curl -s -X POST "${PLATFORM_URL}/registry/heartbeat" \
    -H "Authorization: Bearer ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"workspace_id\": \"${WORKSPACE_ID}\"}"
  sleep 30
done
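
The same two registration calls can be sketched in Python using only the standard library. The request bodies and the `id` and `auth_token` response fields follow the shell script above. The function names and the injectable `post` parameter are assumptions added to keep the flow easy to test without a live platform.

```python
import json
import urllib.request


def post_json(url, token, payload, timeout=10):
    """POST a JSON payload with a bearer token and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def register_remote_agent(platform_url, admin_token, agent_url, name, post=post_json):
    """Create an external workspace, register it, and return (id, token)."""
    base = platform_url.rstrip("/")
    workspace = post(f"{base}/workspaces", admin_token, {
        "name": name,
        "runtime": "external",
        "external": True,
        "url": agent_url,
    })
    workspace_id = workspace["id"]
    registration = post(f"{base}/registry/register", admin_token, {
        "id": workspace_id,
        "url": agent_url,
        "agent_card": {"name": name, "runtime": "external"},
    })
    return workspace_id, registration["auth_token"]
```

Calling `register_remote_agent(PLATFORM_URL, ADMIN_TOKEN, AGENT_URL, "CI Agent")` would perform steps 1 and 2. Passing a stub as `post` lets a CI job exercise the flow offline.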

Agent-side heartbeat (Python):

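import threading
import time

import requests

AUTH_TOKEN = "<from registration>"
WORKSPACE_ID = "<from registration>"
PLATFORM_URL = "https://platform.moleculesai.app"

def heartbeat_loop():
    """Post a heartbeat every 30 seconds; log failures and keep going."""
    while True:
        try:
            requests.post(
                f"{PLATFORM_URL}/registry/heartbeat",
                headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
                json={"workspace_id": WORKSPACE_ID},
                timeout=10,  # do not hang forever on a stalled connection
            ).raise_for_status()
        except requests.RequestException as exc:
            # A failed beat must not kill the thread; retry on the next tick.
            print(f"heartbeat failed: {exc}")
        time.sleep(30)

# Daemon thread: it ends when the main program exits.
threading.Thread(target=heartbeat_loop, daemon=True).start()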
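
For short-lived agents such as per-job CI runners, it can help to stop the heartbeat cleanly when the job ends. The variant below is a sketch under that assumption. The `Heartbeat` class and its `send` callback are hypothetical; `send` would wrap the POST to /registry/heartbeat shown above.

```python
import threading


class Heartbeat:
    """Call `send` every `interval` seconds until stop() is called."""

    def __init__(self, send, interval=30.0):
        self._send = send
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            try:
                self._send()
            except Exception as exc:
                # Keep the loop alive; the next tick retries.
                print(f"heartbeat failed: {exc}")
            # wait() returns early once stop() sets the event.
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Using `Event.wait` instead of `time.sleep` means `stop()` returns promptly rather than after the remainder of a 30-second sleep.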

For agents behind NAT or firewall:

The platform needs to reach AGENT_URL for inbound A2A messages. Expose your agent with a tunnel:

# Cloudflare Tunnel (recommended for production)
cloudflared tunnel --url http://localhost:8080

# Or ngrok (quick dev/test)
ngrok http 8080

Copy the public URL and use it as AGENT_URL in the registration call.
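
Before registering, a quick local check can catch an AGENT_URL that the platform could never reach. This is a minimal sketch: the helper `check_agent_url` and its list of local hostnames are assumptions, and passing the check does not prove the URL is reachable from the platform.

```python
from urllib.parse import urlparse

# Hostnames that only resolve on the agent's own machine.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def check_agent_url(url):
    """Return a list of problems with `url`; an empty list means none found."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    if not parsed.hostname:
        problems.append("missing hostname")
    elif parsed.hostname in LOCAL_HOSTS:
        problems.append("hostname is local and not reachable from the platform")
    return problems


for candidate in ("https://your-agent.example.com", "http://localhost:8080"):
    print(candidate, check_agent_url(candidate))
```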

When to choose bare metal / remote:

On-prem hardware that can't be containerized
Laptops or workstations where Docker isn't practical
CI/CD runners (GitHub Actions, Jenkins) that spin up per job
Air-gapped networks
Any scenario where the platform shouldn't own the agent's lifecycle

See: Register a Remote Agent on Molecule AI — full tutorial with CI/CD examples and minimal Python agent.

Comparing the Three Models

Aspect	Docker	Fly Machines	Bare Metal / Remote
Provisioning	Platform (`docker run`)	Platform (Fly API)	Manual via API
Secrets	Injected as env vars at boot	Injected as env vars at boot	Pulled on demand via API
Heartbeat	Platform (Docker health)	Platform (health check)	Agent sends every 30s
Canvas badge	None (standard)	None (standard)	Purple REMOTE
Cold start	~15–30s	<1s	N/A
Isolation	Shared kernel	Hardware (Firecracker)	None (your infra)
Lifecycle managed	✅ Yes	✅ Yes	❌ No (your code)
Works with existing infra	❌ No	❌ No	✅ Yes
Best for	Single-host, dev/test	Multi-region, SaaS	On-prem, CI/CD, laptops

Mixing Deployment Models

You can combine models in the same organization. A typical production setup might look like:

CI/CD agents → bare metal / remote (register per pipeline run)
Queue workers → Fly Machines (auto-scale, sub-second spin-up)
Staging / dev → Docker on a single VPS
Long-running services → Fly Machines in the region closest to your users

All of these show up on the same canvas, visible to the same orchestrator, reachable via A2A. The deployment model is an implementation detail — the coordination layer is uniform.

Which Model Should You Use?

Start with Docker if you're evaluating Molecule AI or running on a single host. It's the lowest-friction path.

Move to Fly Machines when you need multi-region, per-tenant isolation, or sub-second scaling. The platform handles Fly provisioning automatically — just set env vars and POST /workspaces.

Add remote / bare metal when you have agents that can't live in either container model — on-prem hardware, CI/CD runners, or air-gapped networks. Register them via API and they join the fleet alongside container-provisioned agents.
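
The guidance above can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this section; the function name and its flags are hypothetical and are not part of Molecule AI.

```python
def recommend_model(
    can_containerize=True,
    needs_multi_region=False,
    needs_tenant_isolation=False,
    needs_subsecond_start=False,
):
    """Map the guide's rules of thumb to a deployment model name."""
    # Agents that cannot run in either container model register remotely.
    if not can_containerize:
        return "bare metal / remote"
    # Any of these needs points to Fly Machines.
    if needs_multi_region or needs_tenant_isolation or needs_subsecond_start:
        return "fly machines"
    # Otherwise start with Docker on a single host.
    return "docker"


print(recommend_model())
print(recommend_model(needs_tenant_isolation=True))
print(recommend_model(can_containerize=False))
```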

→ Register a Remote Agent: bare metal tutorial
→ Provision Workspaces on Fly Machines: Fly Machines walkthrough
→ Platform API Reference: full endpoint documentation

Molecule AI is open source. All three deployment models are documented in docs/tutorials/ on main.
