diff --git a/.gitea/workflows/ci.yml b/.gitea/workflows/ci.yml
index 4643d3c..fc984fe 100644
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -7,6 +7,7 @@ on:
 jobs:
   build:
     runs-on: ubuntu-latest
+    timeout-minutes: 30
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-node@v4
diff --git a/content/docs/tutorials/self-hosted-workspace-docker.md b/content/docs/tutorials/self-hosted-workspace-docker.md
new file mode 100644
index 0000000..f8cfed6
--- /dev/null
+++ b/content/docs/tutorials/self-hosted-workspace-docker.md
@@ -0,0 +1,198 @@
+---
+title: Self-Hosted Workspace Deployment with Docker
+---
+
+# Self-Hosted Workspace Deployment with Docker
+
+This guide explains how to run a Molecule AI workspace agent as a Docker container on a self-hosted server or VM. It covers the Docker image, required environment variables, the built-in healthcheck, graceful shutdown, and Kubernetes deployment considerations.
+
+> **Prerequisites:** A running Molecule AI control plane (self-hosted or SaaS), an `ADMIN_TOKEN` or an org-scoped API key with admin scope, and Docker 20.10+ on the host.
+
+## How the workspace container works
+
+The Molecule AI workspace image provides:
+
+- A uvicorn server on port 8000 (configurable via `PORT`)
+- A healthcheck endpoint at `/.well-known/agent-card.json` (used by Docker and Kubernetes probes)
+- Graceful SIGTERM handling via uvicorn, so the heartbeat loop and adapter tasks shut down cleanly
+
+```
+┌───────────────────────────────────────────────────────┐
+│  Docker host (your VM / bare metal)                   │
+│                                                       │
+│  ┌─────────────────────────────────────────────────┐  │
+│  │  workspace container                            │  │
+│  │                                                 │  │
+│  │  uvicorn (port 8000)                            │  │
+│  │  └─ /.well-known/agent-card.json ← HEALTHCHECK  │  │
+│  │                                                 │  │
+│  │  heartbeat loop + A2A agent                     │  │
+│  └────────────────────────┬────────────────────────┘  │
+│                           │                           │
+│               host.docker.internal:8080               │
+│                           │                           │
+│                           ▼                           │
+│  ┌─────────────────────────────────────────────────┐  │
+│  │  Molecule AI control plane                      │  │
+│  │  (platform on port 8080)                        │  │
+│  └─────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────┘
+```
+
+## Step 1: Create an external workspace
+
+First, register the workspace as an external (self-managed) agent on the platform.
+
+```bash
+ADMIN_TOKEN="your-admin-token"
+PLATFORM_URL="https://platform.moleculesai.app"  # or http://localhost:8080 for local dev
+WORKSPACE=$(curl -s -X POST "${PLATFORM_URL}/workspaces" \
+  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
+  -H "Content-Type: application/json" \
+  -d '{"name": "self-hosted-agent", "runtime": "external"}')
+
+WORKSPACE_ID=$(echo "$WORKSPACE" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
+echo "Workspace ID: $WORKSPACE_ID"
+```
+
+Save the returned `WORKSPACE_ID`. The workspace agent obtains its bearer token automatically during its first registration with the platform.
+
+## Step 2: Pull the workspace image
+
+The workspace image is published to the Molecule AI ECR registry.
+Contact your platform administrator for the registry prefix and credentials, then log in:
+
+```bash
+aws ecr get-login-password --region us-east-1 | \
+  docker login --username AWS --password-stdin "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com"
+
+docker pull "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+```
+
+## Step 3: Configure environment variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PLATFORM_URL` | `http://localhost:8080` | Platform API URL. Inside a Docker container, use `http://host.docker.internal:8080` to reach the platform on the host machine. |
+| `WORKSPACE_ID` | (none) | Workspace ID from Step 1 (required; no default) |
+| `PORT` | `8000` | Agent server port. Must match `containerPort` in Kubernetes and the port mapped with `-p` in Docker. |
+
+## Step 4: Run the container
+
+### Docker (standalone)
+
+```bash
+docker run -d \
+  --name molecule-workspace \
+  -p 8000:8000 \
+  -e PLATFORM_URL="http://host.docker.internal:8080" \
+  -e WORKSPACE_ID="your-workspace-id" \
+  -e PORT=8000 \
+  "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+```
+
+> **Note for Linux hosts:** Docker Engine on Linux does not define `host.docker.internal` by default. Either add `--add-host=host.docker.internal:host-gateway` to the `docker run` command, or use the host machine's IP address directly (e.g. `http://192.168.1.100:8080`).
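The Step 3 variables and the Step 4 command can also be assembled programmatically. The following is a minimal sketch, not part of the Molecule AI tooling: `docker_run_args` is a hypothetical helper name, its fallback `PLATFORM_URL` follows the Step 4 command rather than the table default, and it only builds the argument list without invoking Docker.

```python
import platform
import shlex
from typing import Dict, List, Optional


def docker_run_args(image: str, env: Dict[str, str],
                    system: Optional[str] = None) -> List[str]:
    """Build the `docker run` argument list from the Step 3 variables.

    Hypothetical helper for illustration; it does not talk to Docker.
    """
    if not env.get("WORKSPACE_ID"):
        raise ValueError("WORKSPACE_ID is required and has no default")
    port = env.get("PORT", "8000")
    # Assumption: fall back to the URL used in the Step 4 command.
    platform_url = env.get("PLATFORM_URL", "http://host.docker.internal:8080")
    system = system or platform.system()

    args = ["docker", "run", "-d", "--name", "molecule-workspace",
            "-p", f"{port}:{port}"]
    # Linux hosts need an explicit mapping for host.docker.internal.
    if system == "Linux" and "host.docker.internal" in platform_url:
        args.append("--add-host=host.docker.internal:host-gateway")
    for key, value in (("PLATFORM_URL", platform_url),
                       ("WORKSPACE_ID", env["WORKSPACE_ID"]),
                       ("PORT", port)):
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    demo = docker_run_args("molecule-workspace:latest",
                           {"WORKSPACE_ID": "your-workspace-id"})
    print(shlex.join(demo))
```

Mapping the same value on both sides of `-p` keeps the published port in step with `PORT`, which is the constraint the table in Step 3 describes.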
+
+### Verify the healthcheck
+
+```bash
+# Check the health status; repeat until it reports "healthy" (can take up to ~2 minutes)
+docker inspect --format='{{.State.Health.Status}}' molecule-workspace
+
+# Expected output: healthy
+# Once healthy, the agent card is reachable:
+curl -s http://localhost:8000/.well-known/agent-card.json | python3 -m json.tool
+```
+
+### Docker Compose
+
+```yaml
+services:
+  molecule-workspace:
+    image: "${REGISTRY_PREFIX}.dkr.ecr.us-east-1.amazonaws.com/molecule-workspace:latest"
+    ports:
+      - "8000:8000"
+    environment:
+      PLATFORM_URL: "http://host.docker.internal:8080"
+      WORKSPACE_ID: "your-workspace-id"
+      PORT: "8000"
+    # Linux hosts: add host.docker.internal resolution
+    # extra_hosts:
+    #   - "host.docker.internal:host-gateway"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/.well-known/agent-card.json"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+      start_period: 30s
+```
+
+## Step 5: Graceful shutdown
+
+When the container receives SIGTERM (e.g. from `docker stop` or Kubernetes pod deletion), the workspace's uvicorn server initiates graceful shutdown: the heartbeat loop stops, active A2A tasks are given a grace period to complete, and any snapshotable state is persisted before the process exits.
+
+To integrate the heartbeat loop into custom agent code:
+
+```python
+import asyncio
+import os
+from heartbeat import HeartbeatLoop
+
+# The container runtime delivers SIGTERM to the workspace process when the
+# container is stopped. The workspace (via uvicorn) then shuts down gracefully:
+# the heartbeat loop is stopped, any active adapter tasks are cancelled, and
+# in-flight A2A requests are given a grace period to complete.
+#
+# For custom integration with the heartbeat loop directly:
+async def main():
+    heartbeat = HeartbeatLoop(
+        platform_url=os.environ["PLATFORM_URL"],
+        workspace_id=os.environ["WORKSPACE_ID"],
+    )
+    heartbeat.start()
+    try:
+        await asyncio.Event().wait()  # block until this task is cancelled at shutdown
+    finally:
+        await heartbeat.stop()
+        print("Heartbeat loop stopped.")
+```
+
+`docker stop` sends SIGTERM and waits 10 seconds by default before sending SIGKILL; pass `-t <seconds>` if in-flight tasks need longer to finish. The healthcheck is independent of this timeout: it lets Docker or an orchestrator flag a container that has stopped responding.
+
+## Kubernetes deployment
+
+For Kubernetes deployments, use the native liveness/readiness probe configuration instead of the Docker HEALTHCHECK:
+
+```yaml
+ports:
+  - name: http
+    containerPort: 8000
+livenessProbe:
+  httpGet:
+    path: /.well-known/agent-card.json
+    port: http
+  initialDelaySeconds: 30
+  periodSeconds: 30
+  timeoutSeconds: 5
+  failureThreshold: 3
+readinessProbe:
+  httpGet:
+    path: /.well-known/agent-card.json
+    port: http
+  initialDelaySeconds: 10
+  periodSeconds: 10
+  timeoutSeconds: 5
+  failureThreshold: 3
+terminationGracePeriodSeconds: 120  # set on the pod spec, not on the container
+```
+
+> **Note:** The Kubernetes `terminationGracePeriodSeconds` should exceed the time the liveness probe needs to register a failure, so that the probe can do so before the pod is killed. With `periodSeconds: 30`, `timeoutSeconds: 5`, and `failureThreshold: 3`, three consecutive failed probes accumulate roughly 60 to 95 seconds after the container becomes unhealthy. Set `terminationGracePeriodSeconds: 120` or higher.
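The probe timing in the note above can be worked through explicitly. The sketch below uses a simplified model that is an assumption rather than Kubernetes' exact scheduling: probes fire every `periodSeconds`, a failing probe takes between zero and `timeoutSeconds` to report, and `initialDelaySeconds` and jitter are ignored. The `Probe` class and `detection_window` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Probe:
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int


def detection_window(probe: Probe) -> Tuple[int, int]:
    """Return (earliest, latest) seconds from the container becoming
    unhealthy to the probe reaching its failure threshold.

    Earliest: the first failing probe fires immediately and fails fast,
    so only (threshold - 1) further periods elapse.
    Latest: the first failing probe fires a full period later, and the
    final probe runs into its timeout.
    """
    earliest = (probe.failure_threshold - 1) * probe.period_seconds
    latest = probe.failure_threshold * probe.period_seconds + probe.timeout_seconds
    return earliest, latest


liveness = Probe(period_seconds=30, timeout_seconds=5, failure_threshold=3)
readiness = Probe(period_seconds=10, timeout_seconds=5, failure_threshold=3)
grace_period = 120  # terminationGracePeriodSeconds from the manifest

print(detection_window(liveness))                     # (60, 95)
print(detection_window(readiness))                    # (20, 35)
print(grace_period - detection_window(liveness)[1])   # 25 seconds of headroom
```

Under this model the 120-second grace period sits about 25 seconds above the slowest liveness detection, which is why a smaller value would be harder to justify with these probe settings.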
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| Container shows `unhealthy` after startup | Platform unreachable from container | Verify `PLATFORM_URL` uses `host.docker.internal` (Docker) or the correct host IP |
+| `curl: (7) Failed to connect` on healthcheck | Container not fully started | Wait up to 30s; increase `start_period` |
+| Agent not appearing on canvas | Wrong `WORKSPACE_ID` or expired token | Re-run registration; check platform logs |
+| `host.docker.internal` not resolved | Linux host without the Docker flag | Use `--add-host=host.docker.internal:host-gateway` or the host's LAN IP |
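When a container is slow to start, polling the agent card is often easier than re-running `curl` by hand. The following is a rough sketch using only the Python standard library; `wait_until_healthy` and `agent_card_ok` are hypothetical helper names, and the default URL assumes the `-p 8000:8000` mapping from Step 4.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def agent_card_ok(url: str = "http://localhost:8000/.well-known/agent-card.json",
                  timeout: float = 5.0) -> bool:
    """Return True if the agent card endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def wait_until_healthy(check: Callable[[], bool],
                       attempts: int = 12,
                       delay: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy(agent_card_ok)` polls for roughly two minutes with the defaults, in line with the startup window mentioned in Step 4. The `sleep` parameter exists only so the loop can be exercised without real delays.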