Reproducing the README's quickstart on a clean clone surfaced seven independent bugs between `git clone` and seeing the Canvas in a browser. Each fix is minimal and local-dev-only; the SaaS/EC2 provisioner path (issue #1822) is untouched.

Bugs fixed:

1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing the platform's `schema_migrations` tracker. The platform then re-ran every migration on first boot and crashed on non-idempotent ALTER TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped the migration block: `workspace-server/internal/db/postgres.go:53` already tracks and skips applied files.

2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...` with literal `USER:PASS` placeholders and the Docker-internal hostname `postgres`. A `cp .env.example .env` followed by `go run ./cmd/server` on the host failed with `dial tcp: lookup postgres: no such host`. Replaced with working `dev:dev@localhost:5432` defaults that match `docker-compose.infra.yml`.

3. `docker-compose.infra.yml` and `docker-compose.yml` set `CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects anything other than `http://` or `https://`, so the container crash-looped and returned HTTP 500. Switched to `http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL` for the migration-time native-protocol connection. Also removed `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually run.

4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080` when `.env` was sourced before `npm run dev`: Next.js reads `PORT` from env and the platform owns 8080. Pinned `dev` to `-p 3000` so sourced env can't hijack it. `start` left as-is because production `node server.js` (Dockerfile CMD) must respect `PORT` from the orchestrator.

5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo`, but that repo 404s; the actual name is `molecule-core`. The Railway and Render deploy buttons had the same broken URL. Replaced in both English and Chinese READMEs and in CONTRIBUTING. Internal identifiers (Go module path, Docker network `molecule-monorepo-net`, Python helper `molecule-monorepo-status`) deliberately left alone: renaming those is an invasive refactor orthogonal to this fix.

6. README quickstart was missing `cp .env.example .env`. Users who went straight from `git clone` to `./infra/scripts/setup.sh` got a script that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't run the platform without figuring out the env setup on their own. Added the step in both READMEs and CONTRIBUTING. Deliberately NOT generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here: the e2e-api suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode (no server-side `ADMIN_TOKEN`), which is how CI runs it.

7. CI shellcheck only covered `tests/e2e/*.sh`; `infra/scripts/setup.sh` is in the critical path of every new-user onboarding but was never linted. Extended the `shellcheck` job and the `changes` filter to cover `infra/scripts/`. `scripts/` deliberately excluded until its pre-existing SC3040/SC3043 warnings are cleaned up separately.
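The failure mode in item 1 can be illustrated with a minimal sketch. It assumes a flat file standing in for the `schema_migrations` table and a hypothetical `apply_pending` function. It is not the logic in `workspace-server/internal/db/postgres.go`, only an illustration of how a runner that tracks applied files decides what to skip.

```shell
#!/bin/sh
# Sketch only: a flat ledger file stands in for the schema_migrations table.
# apply_pending is a hypothetical name, not a function from the repository.
set -eu

# apply_pending MIGRATIONS_DIR LEDGER_FILE
# Records each *.up.sql file name in the ledger the first time it is seen
# and skips it on later runs. Prints "applied=N skipped=M".
apply_pending() {
  mig_dir="$1"
  ledger="$2"
  applied=0
  skipped=0
  touch "$ledger"
  for file in "$mig_dir"/*.up.sql; do
    [ -e "$file" ] || continue            # directory holds no migrations
    name="$(basename "$file")"
    if grep -Fxq -- "$name" "$ledger"; then
      skipped=$((skipped + 1))            # already tracked: do not re-run
      continue
    fi
    # A real runner would execute the SQL here before recording it.
    printf '%s\n' "$name" >> "$ledger"
    applied=$((applied + 1))
  done
  echo "applied=$applied skipped=$skipped"
}
```

Because the ledger is the only record of what ran, SQL applied out of band (as the old `setup.sh` block did with raw psql) leaves the ledger empty, and the next run treats every file as pending.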
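The scheme restriction behind item 3 can be checked before starting the stack. The helper below is a hypothetical pre-flight check, not part of the repository or of Langfuse; it only mirrors the rule stated above (accept `http://` or `https://`, reject everything else) and reuses the error text quoted in the compose file comment.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; the name check_clickhouse_url is an assumption.
# Returns 0 for http:// or https:// URLs, 1 (with a message on stderr) otherwise.
check_clickhouse_url() {
  case "$1" in
    http://*|https://*)
      return 0
      ;;
    *)
      echo "ClickHouse URL protocol must be either http or https" >&2
      return 1
      ;;
  esac
}
```

For example, `check_clickhouse_url "http://langfuse-clickhouse:8123"` succeeds, while the old native-protocol form `clickhouse://langfuse-clickhouse:9000` is rejected.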
Verification (fresh nuke-and-rebuild following the updated README):

- `docker compose -f docker-compose.infra.yml down -v` + `rm .env`
- `cp .env.example .env` → defaults work as-is
- `bash infra/scripts/setup.sh`: clean, no migration errors, all 6 infra containers healthy
- `cd workspace-server && go run ./cmd/server`: "Applied 41 migrations (0 already applied)", platform on :8080/health 200
- `cd canvas && npm install && npm run dev`: Canvas on :3000/ 200 even with `.env` sourced (PORT=8080 in env)
- `bash tests/e2e/test_api.sh`: **61 passed, 0 failed**
- `cd canvas && npx vitest run`: **900 tests passed**
- `cd canvas && npm run build`: production build clean
- `shellcheck --severity=warning infra/scripts/*.sh`: clean
- Langfuse `/api/public/health` 200 (was 500)

Scope notes:

- SaaS/EC2 parity (issue #1822): all files touched here are local-dev surface. Canvas container uses `node server.js` with `ENV PORT=3000` in `canvas/Dockerfile`; the `-p 3000` pin in `package.json` dev script only affects `npm run dev`, not the production CMD.
- Test coverage (issue #1821): project policy is tiered coverage floors, not a blanket 100% target. Files touched here are shell scripts, YAML, Markdown, and one package.json script, not classes covered by the coverage matrix.
- No overlap with open PRs: searched `setup.sh`, `quickstart`, `langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
281 lines
11 KiB
YAML
services:
  # --- Infrastructure ---
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-dev}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dev}
      POSTGRES_DB: ${POSTGRES_DB:-molecule}
    command: ["postgres", "-c", "wal_level=logical"]
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-dev}"]
      interval: 2s
      timeout: 5s
      retries: 10

  langfuse-db-init:
    image: postgres:16-alpine
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-dev}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dev}
    command:
      - /bin/sh
      - -c
      - |
        export PGPASSWORD="$${POSTGRES_PASSWORD}"
        until pg_isready -h postgres -U "$${POSTGRES_USER}" -d postgres >/dev/null 2>&1; do
          sleep 1
        done
        if ! psql -h postgres -U "$${POSTGRES_USER}" -d postgres -tAc "SELECT 1 FROM pg_database WHERE datname = 'langfuse'" | grep -q 1; then
          psql -h postgres -U "$${POSTGRES_USER}" -d postgres -c "CREATE DATABASE langfuse"
        fi
    networks:
      - molecule-monorepo-net

  redis:
    image: redis:7-alpine
    command: ["redis-server", "--notify-keyspace-events", "KEA"]
    ports:
      - "6379:6379"
    volumes:
      - redisdata:/data
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 2s
      timeout: 5s
      retries: 10

  # --- Observability ---
  langfuse-clickhouse:
    image: clickhouse/clickhouse-server:24-alpine
    environment:
      CLICKHOUSE_DB: langfuse
      CLICKHOUSE_USER: langfuse
      CLICKHOUSE_PASSWORD: langfuse
    volumes:
      - clickhousedata:/var/lib/clickhouse
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:8123/ping || exit 1"]
      interval: 5s
      timeout: 5s
      retries: 10

  langfuse:
    image: langfuse/langfuse:2
    depends_on:
      langfuse-clickhouse:
        condition: service_healthy
      langfuse-db-init:
        condition: service_completed_successfully
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER:-dev}:${POSTGRES_PASSWORD:-dev}@postgres:5432/langfuse
      # Langfuse v2 expects the HTTP interface (port 8123). The previous
      # clickhouse://...:9000 native-protocol URL is rejected with
      # "ClickHouse URL protocol must be either http or https".
      CLICKHOUSE_URL: http://langfuse-clickhouse:8123
      CLICKHOUSE_MIGRATION_URL: clickhouse://langfuse-clickhouse:9000
      CLICKHOUSE_USER: langfuse
      CLICKHOUSE_PASSWORD: langfuse
      NEXTAUTH_SECRET: ${LANGFUSE_SECRET:-changeme-langfuse-secret}
      NEXTAUTH_URL: http://localhost:3001
      SALT: ${LANGFUSE_SALT:-changeme-langfuse-salt}
    ports:
      - "3001:3000"
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/public/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 10

  # --- Platform ---
  platform:
    build:
      # Build context MUST be repo root, not ./platform - the Dockerfile
      # COPYs `workspace-server/migrations`, `workspace-server/go.mod`,
      # `workspace-configs-templates/` etc. via repo-relative paths so it
      # can bake in templates + migrations alongside the platform binary.
      # When context was ./platform earlier, docker silently cached an
      # earlier image (the COPY workspace-server/migrations resolved to nothing
      # under ./workspace-server/, so layers stopped invalidating) - manifested
      # as migration 023 not landing after PR #417 merged. CI workflow
      # already uses context=. , this aligns local with CI.
      context: .
      dockerfile: workspace-server/Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER:-dev}:${POSTGRES_PASSWORD:-dev}@postgres:5432/${POSTGRES_DB:-molecule}?sslmode=disable
      REDIS_URL: redis://redis:6379
      PORT: "${PLATFORM_PORT:-8080}"
      PLATFORM_URL: "http://platform:${PLATFORM_PORT:-8080}"
      CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:${CANVAS_PUBLISH_PORT:-3000},http://127.0.0.1:${CANVAS_PUBLISH_PORT:-3000},http://localhost:3001}
      RATE_LIMIT: "${RATE_LIMIT:-1000}"
      CONFIGS_DIR: /configs
      CONFIGS_HOST_DIR: "${CONFIGS_HOST_DIR:-${PWD}/workspace-configs-templates}"
      PLUGINS_HOST_DIR: "${PLUGINS_HOST_DIR:-${PWD}/plugins}"
      # github-app-auth plugin - injects GITHUB_TOKEN / GH_TOKEN into every
      # workspace env from the App installation token. Remap the host-side
      # path in GITHUB_APP_PRIVATE_KEY_FILE to /secrets/github-app.pem inside
      # the container (the private key is bind-mounted below read-only).
      # Soft-dep: skipped entirely when GITHUB_APP_ID is unset.
      GITHUB_APP_ID: "${GITHUB_APP_ID:-}"
      GITHUB_APP_INSTALLATION_ID: "${GITHUB_APP_INSTALLATION_ID:-}"
      GITHUB_APP_PRIVATE_KEY_FILE: "/secrets/github-app.pem"
      # ADMIN_TOKEN - required to fully close issue #684 (AdminAuth bearer bypass, PR #729).
      # When set, only this exact value is accepted on all /admin/* and /approvals/* routes;
      # workspace bearer tokens are no longer accepted as admin credentials.
      # Unset (default) → backward-compat fallback: any valid workspace token passes AdminAuth
      # (same behaviour as before PR #729, still vulnerable to #684).
      # Generate: openssl rand -base64 32
      # Store in fly secrets / deployment env - NEVER commit the actual value.
      ADMIN_TOKEN: "${ADMIN_TOKEN:-}"
      # Workspace hibernation default (issue #724 / PR #724). Sets platform-wide idle
      # threshold (minutes); per-workspace column takes precedence. Leave empty to
      # rely on per-workspace config only (current behaviour - global-default code pending).
      HIBERNATION_IDLE_MINUTES: "${HIBERNATION_IDLE_MINUTES:-}"
      # Plugin supply chain hardening (issue #768 / PR #775). Never set in production.
      PLUGIN_ALLOW_UNPINNED: "${PLUGIN_ALLOW_UNPINNED:-}"
    volumes:
      - ./workspace-configs-templates:/configs
      - ./org-templates:/org-templates:ro
      - ./plugins:/plugins:ro
      - /var/run/docker.sock:/var/run/docker.sock
      # App private key - read-only bind-mount. The host-side path is
      # gitignored per .gitignore rules (/.secrets/ + *.pem).
      - ./.secrets/github-app.pem:/secrets/github-app.pem:ro
    ports:
      - "${PLATFORM_PUBLISH_PORT:-8080}:${PLATFORM_PORT:-8080}"
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:${PLATFORM_PORT:-8080}/health || exit 1"]
      interval: 5s
      timeout: 5s
      retries: 10

  # --- Canvas ---
  canvas:
    # The publish-canvas-image CI workflow pushes a fresh image to GHCR on
    # every canvas/** merge to main. To update the running container:
    # docker compose pull canvas && docker compose up -d canvas
    # First-time local setup or testing unreleased changes - build from source:
    # docker compose build canvas && docker compose up -d canvas
    # Note: GHCR images are private - `docker login ghcr.io` required before pull.
    image: ghcr.io/molecule-ai/canvas:latest
    build:
      context: ./canvas
      dockerfile: Dockerfile
      args:
        NEXT_PUBLIC_PLATFORM_URL: ${NEXT_PUBLIC_PLATFORM_URL:-http://localhost:${PLATFORM_PUBLISH_PORT:-8080}}
        NEXT_PUBLIC_WS_URL: ${NEXT_PUBLIC_WS_URL:-ws://localhost:${PLATFORM_PUBLISH_PORT:-8080}/ws}
        NEXT_PUBLIC_ADMIN_TOKEN: ${ADMIN_TOKEN:-}
    depends_on:
      platform:
        condition: service_healthy
    environment:
      PORT: "${CANVAS_PORT:-3000}"
      # Local dev - relaxes CSP to allow cross-port fetches (canvas:3000 → platform:8080).
      CSP_DEV_MODE: "${CSP_DEV_MODE:-1}"
      # NOTE: NEXT_PUBLIC_* are baked into the JS bundle at `next build` time -
      # these runtime values are ignored by the standalone output. They're kept
      # here for documentation / override during `docker compose build`.
      NEXT_PUBLIC_PLATFORM_URL: ${NEXT_PUBLIC_PLATFORM_URL:-http://localhost:${PLATFORM_PUBLISH_PORT:-8080}}
      NEXT_PUBLIC_WS_URL: ${NEXT_PUBLIC_WS_URL:-ws://localhost:${PLATFORM_PUBLISH_PORT:-8080}/ws}
    ports:
      - "${CANVAS_PUBLISH_PORT:-3000}:${CANVAS_PORT:-3000}"
    networks:
      - molecule-monorepo-net
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://127.0.0.1:${CANVAS_PORT:-3000} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 10

  # --- Optional: LiteLLM Proxy (unified OpenAI-compatible API for all providers) ---
  # Start with: docker compose --profile multi-provider up
  #
  # Workspace agents then set:
  # OPENAI_BASE_URL=http://litellm:4000
  # OPENAI_API_KEY=${LITELLM_MASTER_KEY:-sk-molecule}
  #
  # And use model names from infra/litellm_config.yml (e.g. "claude-opus-4-5",
  # "gpt-4o", "openrouter/deepseek-r1", "ollama/llama3.2").
  # Edit infra/litellm_config.yml to add/remove providers and models.
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    profiles:
      - multi-provider
    ports:
      - "4000:4000"
    volumes:
      - ./infra/litellm_config.yml:/app/config.yaml:ro
    command: ["--config", "/app/config.yaml", "--port", "4000", "--num_workers", "4"]
    environment:
      # Pass provider API keys through - only the ones you have are needed
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-sk-molecule}
    networks:
      - molecule-monorepo-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:4000/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s

  # --- Optional: Local LLM Models via Ollama ---
  # Start with: docker compose --profile local-models up
  # After first start, pull a model:
  # docker compose exec ollama ollama pull llama3.2
  # docker compose exec ollama ollama pull qwen2.5-coder:7b
  # Then set MODEL_PROVIDER=ollama:llama3.2 in your workspace config.yaml
  # Workspace agents reach Ollama at http://ollama:11434 (internal Docker network).
  ollama:
    image: ollama/ollama:latest
    profiles:
      - local-models
    ports:
      - "11434:11434"
    volumes:
      - ollamadata:/root/.ollama
    networks:
      - molecule-monorepo-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "ollama list || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s

networks:
  molecule-monorepo-net:
    name: molecule-monorepo-net

volumes:
  pgdata:
  redisdata:
  clickhousedata:
  ollamadata: