molecule-core/docker-compose.yml
documentation-specialist 5d4184f4a3
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 13s
CI / Canvas (Next.js) (pull_request) Successful in 42s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m18s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m20s
fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46)
Per documentation-specialist's grep agent (2026-05-07T07:30, see
internal#46): runtime-breaking ghcr.io references in shell scripts +
docker-compose + the slip-past-workflow lint_secret_pattern_drift.py
all need migration. These were missed by security-auditor's
workflow-only audit.

Files (6):

- .github/scripts/lint_secret_pattern_drift.py:40 — workspace-runtime
  pre-commit-checks.sh consumer URL: raw.githubusercontent.com →
  Gitea raw URL (https://git.moleculesai.app/molecule-ai/.../raw/
  branch/main/...). The lint job runs in CI and would 404 today.

- scripts/refresh-workspace-images.sh:54 — workspace-template image
  pull URL: ghcr.io → ECR (153263036946.dkr.ecr.us-east-2.amazonaws.com).

- scripts/rollback-latest.sh — full rewrite of header + auth flow:
  * ghcr.io/molecule-ai/{platform,platform-tenant} → ECR
  * GITHUB_TOKEN with write:packages → AWS ECR auth
    (aws ecr get-login-password). Per saved memory
    reference_post_suspension_pipeline, prod cutover is to ECR.
  * Updated header docs to match new auth flow + prereqs.

- scripts/demo-freeze.sh:13,17 — comment-only ghcr → ECR
  (the script doesn't currently exec these URLs, but the comments
  describe the cascade and need to match reality).

- docker-compose.yml:215-216 — canvas image: ghcr.io → ECR + updated
  the auth comment to describe `aws ecr get-login-password` flow.

- tools/check-template-parity.sh:21 — inline curl install instructions:
  raw.githubusercontent.com → Gitea raw URL.

Hostile self-review:

1. rollback-latest.sh's GITHUB_TOKEN→aws-cli auth swap is a behavior
   change. Operators using this script now need aws CLI
   authenticated for region us-east-2 with ECR pull/push perms.
   Documented in updated header. Operators who don't have aws CLI
   will get 'aws: command not installed' which is a clear failure
   mode (not silent).
2. The Gitea raw URL shape (/raw/branch/main/) differs from GitHub's
   raw.githubusercontent.com structure. Verified pattern by
   inspecting other Gitea raw URLs in the codebase. If Gitea's URL
   changes (1.23+), update via the same one-line edit.
3. Doesn't touch packer/scripts/install-base.sh which has a similar
   ghcr.io ref per the grep agent's findings — that's bigger-scope
   (packer build pipeline) and lives in molecule-controlplane-ish
   territory; filing as parked follow-up under #46 if not already.

Refs: molecule-ai/internal#46, molecule-ai/internal#37,
molecule-ai/internal#38, saved memory reference_post_suspension_pipeline
2026-05-07 00:56:23 -07:00

315 lines
13 KiB
YAML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

services:
# --- Infrastructure ---
postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: ${POSTGRES_USER:-dev}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dev}
POSTGRES_DB: ${POSTGRES_DB:-molecule}
command: ["postgres", "-c", "wal_level=logical"]
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-dev}"]
interval: 2s
timeout: 5s
retries: 10
langfuse-db-init:
image: postgres:16-alpine
depends_on:
postgres:
condition: service_healthy
environment:
POSTGRES_USER: ${POSTGRES_USER:-dev}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dev}
command:
- /bin/sh
- -c
- |
export PGPASSWORD="$${POSTGRES_PASSWORD}"
until pg_isready -h postgres -U "$${POSTGRES_USER}" -d postgres >/dev/null 2>&1; do
sleep 1
done
if ! psql -h postgres -U "$${POSTGRES_USER}" -d postgres -tAc "SELECT 1 FROM pg_database WHERE datname = 'langfuse'" | grep -q 1; then
psql -h postgres -U "$${POSTGRES_USER}" -d postgres -c "CREATE DATABASE langfuse"
fi
networks:
- molecule-monorepo-net
redis:
image: redis:7-alpine
command: ["redis-server", "--notify-keyspace-events", "KEA"]
ports:
- "6379:6379"
volumes:
- redisdata:/data
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 2s
timeout: 5s
retries: 10
# --- Observability ---
langfuse-clickhouse:
image: clickhouse/clickhouse-server:24-alpine
environment:
CLICKHOUSE_DB: langfuse
CLICKHOUSE_USER: langfuse
CLICKHOUSE_PASSWORD: langfuse
volumes:
- clickhousedata:/var/lib/clickhouse
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:8123/ping || exit 1"]
interval: 5s
timeout: 5s
retries: 10
langfuse:
image: langfuse/langfuse:2
depends_on:
langfuse-clickhouse:
condition: service_healthy
langfuse-db-init:
condition: service_completed_successfully
environment:
DATABASE_URL: postgres://${POSTGRES_USER:-dev}:${POSTGRES_PASSWORD:-dev}@postgres:5432/langfuse
# Langfuse v2 expects the HTTP interface (port 8123). The previous
# clickhouse://...:9000 native-protocol URL is rejected with
# "ClickHouse URL protocol must be either http or https".
CLICKHOUSE_URL: http://langfuse-clickhouse:8123
CLICKHOUSE_MIGRATION_URL: clickhouse://langfuse-clickhouse:9000
CLICKHOUSE_USER: langfuse
CLICKHOUSE_PASSWORD: langfuse
NEXTAUTH_SECRET: ${LANGFUSE_SECRET:-changeme-langfuse-secret}
NEXTAUTH_URL: http://localhost:3001
SALT: ${LANGFUSE_SALT:-changeme-langfuse-salt}
ports:
- "3001:3000"
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/public/health || exit 1"]
interval: 10s
timeout: 5s
retries: 10
# --- Platform ---
platform:
build:
# Build context MUST be repo root, not ./platform — the Dockerfile
# COPYs `workspace-server/migrations`, `workspace-server/go.mod`,
# `workspace-configs-templates/` etc. via repo-relative paths so it
# can bake in templates + migrations alongside the platform binary.
# When context was ./platform earlier, docker silently cached an
# earlier image (the COPY workspace-server/migrations resolved to nothing
# under ./workspace-server/, so layers stopped invalidating) — manifested
# as migration 023 not landing after PR #417 merged. CI workflow
# already uses context=. , this aligns local with CI.
context: .
dockerfile: workspace-server/Dockerfile
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
environment:
DATABASE_URL: postgres://${POSTGRES_USER:-dev}:${POSTGRES_PASSWORD:-dev}@postgres:5432/${POSTGRES_DB:-molecule}?sslmode=disable
REDIS_URL: redis://redis:6379
PORT: "${PLATFORM_PORT:-8080}"
PLATFORM_URL: "http://platform:${PLATFORM_PORT:-8080}"
# Default MOLECULE_ENV=development so the WorkspaceAuth / AdminAuth
# middleware fail-open path activates when ADMIN_TOKEN is unset —
# otherwise the canvas (which runs without a bearer in pure local
# dev) gets 401 "missing workspace auth token" on every request.
# Override to "production" for SaaS/staged deploys; in those modes
# ADMIN_TOKEN must also be set or every request rejects.
MOLECULE_ENV: "${MOLECULE_ENV:-development}"
CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:${CANVAS_PUBLISH_PORT:-3000},http://127.0.0.1:${CANVAS_PUBLISH_PORT:-3000},http://localhost:3001}
RATE_LIMIT: "${RATE_LIMIT:-1000}"
CONFIGS_DIR: /configs
CONFIGS_HOST_DIR: "${CONFIGS_HOST_DIR:-${PWD}/workspace-configs-templates}"
PLUGINS_HOST_DIR: "${PLUGINS_HOST_DIR:-${PWD}/plugins}"
# github-app-auth plugin — injects GITHUB_TOKEN / GH_TOKEN into every
# workspace env from the App installation token. Remap the host-side
# path in GITHUB_APP_PRIVATE_KEY_FILE to /secrets/github-app.pem inside
# the container (the private key is bind-mounted below read-only).
# Soft-dep: skipped entirely when GITHUB_APP_ID is unset.
GITHUB_APP_ID: "${GITHUB_APP_ID:-}"
GITHUB_APP_INSTALLATION_ID: "${GITHUB_APP_INSTALLATION_ID:-}"
GITHUB_APP_PRIVATE_KEY_FILE: "/secrets/github-app.pem"
# ADMIN_TOKEN — required to fully close issue #684 (AdminAuth bearer bypass, PR #729).
# When set, only this exact value is accepted on all /admin/* and /approvals/* routes;
# workspace bearer tokens are no longer accepted as admin credentials.
# Unset (default) → backward-compat fallback: any valid workspace token passes AdminAuth
# (same behaviour as before PR #729, still vulnerable to #684).
# Generate: openssl rand -base64 32
# Store in fly secrets / deployment env — NEVER commit the actual value.
ADMIN_TOKEN: "${ADMIN_TOKEN:-}"
# Workspace hibernation default (issue #724 / PR #724). Sets platform-wide idle
# threshold (minutes); per-workspace column takes precedence. Leave empty to
# rely on per-workspace config only (current behaviour — global-default code pending).
HIBERNATION_IDLE_MINUTES: "${HIBERNATION_IDLE_MINUTES:-}"
# Plugin supply chain hardening (issue #768 / PR #775). Never set in production.
PLUGIN_ALLOW_UNPINNED: "${PLUGIN_ALLOW_UNPINNED:-}"
# Force ImagePull/ContainerCreate to request linux/amd64 manifests
# for the workspace-template-* images. The templates ship single-arch
# amd64 today; without this override, an arm64 host (Apple Silicon)
# asks the daemon for linux/arm64/v8, which doesn't match the manifest
# and the pull fails with "no matching manifest". Apple Silicon will
# run the amd64 image under Rosetta — slower (~2-3×) but functional.
# Override to "" or another platform when the templates start shipping
# multi-arch (then this hardcoded amd64 becomes unnecessary).
MOLECULE_IMAGE_PLATFORM: "${MOLECULE_IMAGE_PLATFORM:-linux/amd64}"
# GHCR auth for the workspace-images refresh endpoint
# (POST /admin/workspace-images/refresh). When set, the platform's
# Docker SDK ImagePull on private workspace-template-* images
# succeeds without per-host `docker login`. GHCR_USER is the GitHub
# username; GHCR_TOKEN is a fine-grained PAT with `read:packages`
# on the Molecule-AI org. Both unset → endpoint can only pull
# public images (current state for all 8 templates).
GHCR_USER: "${GHCR_USER:-}"
GHCR_TOKEN: "${GHCR_TOKEN:-}"
# Auto-refresh workspace-template-* images. The watcher polls GHCR
# every 5 min; when a digest moves, it pulls and force-recreates any
# matching ws-* containers (existing /admin/workspace-images/refresh
# logic). Closes the runtime CD chain: merge → containers running
# new code, no operator step. Default ON for local dev because that's
# where the runtime → ws iteration loop is tightest. Set to "false"
# if you don't want the platform to mutate ws-* containers behind
# your back during a long-running test.
IMAGE_AUTO_REFRESH: "${IMAGE_AUTO_REFRESH:-true}"
volumes:
- ./workspace-configs-templates:/configs
- ./org-templates:/org-templates:ro
- ./plugins:/plugins:ro
- /var/run/docker.sock:/var/run/docker.sock
# App private key — read-only bind-mount. The host-side path is
# gitignored per .gitignore rules (/.secrets/ + *.pem).
- ./.secrets/github-app.pem:/secrets/github-app.pem:ro
ports:
- "${PLATFORM_PUBLISH_PORT:-8080}:${PLATFORM_PORT:-8080}"
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:${PLATFORM_PORT:-8080}/health || exit 1"]
interval: 5s
timeout: 5s
retries: 10
# --- Canvas ---
canvas:
# The publish-canvas-image CI workflow pushes a fresh image to GHCR on
# every canvas/** merge to main. To update the running container:
# docker compose pull canvas && docker compose up -d canvas
# First-time local setup or testing unreleased changes — build from source:
# docker compose build canvas && docker compose up -d canvas
# Note: ECR images require AWS auth — `aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 153263036946.dkr.ecr.us-east-2.amazonaws.com` before pull.
image: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas:latest
build:
context: ./canvas
dockerfile: Dockerfile
args:
NEXT_PUBLIC_PLATFORM_URL: ${NEXT_PUBLIC_PLATFORM_URL:-http://localhost:${PLATFORM_PUBLISH_PORT:-8080}}
NEXT_PUBLIC_WS_URL: ${NEXT_PUBLIC_WS_URL:-ws://localhost:${PLATFORM_PUBLISH_PORT:-8080}/ws}
NEXT_PUBLIC_ADMIN_TOKEN: ${ADMIN_TOKEN:-}
depends_on:
platform:
condition: service_healthy
environment:
PORT: "${CANVAS_PORT:-3000}"
# Local dev — relaxes CSP to allow cross-port fetches (canvas:3000 → platform:8080).
CSP_DEV_MODE: "${CSP_DEV_MODE:-1}"
# NOTE: NEXT_PUBLIC_* are baked into the JS bundle at `next build` time —
# these runtime values are ignored by the standalone output. They're kept
# here for documentation / override during `docker compose build`.
NEXT_PUBLIC_PLATFORM_URL: ${NEXT_PUBLIC_PLATFORM_URL:-http://localhost:${PLATFORM_PUBLISH_PORT:-8080}}
NEXT_PUBLIC_WS_URL: ${NEXT_PUBLIC_WS_URL:-ws://localhost:${PLATFORM_PUBLISH_PORT:-8080}/ws}
ports:
- "${CANVAS_PUBLISH_PORT:-3000}:${CANVAS_PORT:-3000}"
networks:
- molecule-monorepo-net
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:${CANVAS_PORT:-3000} || exit 1"]
interval: 10s
timeout: 5s
retries: 10
# --- Optional: LiteLLM Proxy (unified OpenAI-compatible API for all providers) ---
# Start with: docker compose --profile multi-provider up
#
# Workspace agents then set:
# OPENAI_BASE_URL=http://litellm:4000
# OPENAI_API_KEY=${LITELLM_MASTER_KEY:-sk-molecule}
#
# And use model names from infra/litellm_config.yml (e.g. "claude-opus-4-5",
# "gpt-4o", "openrouter/deepseek-r1", "ollama/llama3.2").
# Edit infra/litellm_config.yml to add/remove providers and models.
litellm:
image: ghcr.io/berriai/litellm:main-latest
profiles:
- multi-provider
ports:
- "4000:4000"
volumes:
- ./infra/litellm_config.yml:/app/config.yaml:ro
command: ["--config", "/app/config.yaml", "--port", "4000", "--num_workers", "4"]
environment:
# Pass provider API keys through — only the ones you have are needed
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
OPENAI_API_KEY: ${OPENAI_API_KEY:-}
OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-sk-molecule}
networks:
- molecule-monorepo-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:4000/health || exit 1"]
interval: 10s
timeout: 5s
retries: 5
start_period: 15s
# --- Optional: Local LLM Models via Ollama ---
# Start with: docker compose --profile local-models up
# After first start, pull a model:
# docker compose exec ollama ollama pull llama3.2
# docker compose exec ollama ollama pull qwen2.5-coder:7b
# Then set MODEL_PROVIDER=ollama:llama3.2 in your workspace config.yaml
# Workspace agents reach Ollama at http://ollama:11434 (internal Docker network).
ollama:
image: ollama/ollama:latest
profiles:
- local-models
ports:
- "11434:11434"
volumes:
- ollamadata:/root/.ollama
networks:
- molecule-monorepo-net
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "ollama list || exit 1"]
interval: 10s
timeout: 5s
retries: 5
start_period: 20s
networks:
molecule-monorepo-net:
name: molecule-monorepo-net
volumes:
pgdata:
redisdata:
clickhousedata:
ollamadata: