forked from molecule-ai/molecule-core
Reproducing the README's quickstart on a clean clone surfaced seven independent bugs between `git clone` and seeing the Canvas in a browser. Each fix is minimal and local-dev-only; the SaaS/EC2 provisioner path (issue #1822) is untouched.

Bugs fixed:

1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing the platform's `schema_migrations` tracker. The platform then re-ran every migration on first boot and crashed on non-idempotent ALTER TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped the migration block: `workspace-server/internal/db/postgres.go:53` already tracks and skips applied files.

2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...` with literal `USER:PASS` placeholders and the Docker-internal hostname `postgres`. A `cp .env.example .env` followed by `go run ./cmd/server` on the host failed with `dial tcp: lookup postgres: no such host`. Replaced with working `dev:dev@localhost:5432` defaults that match `docker-compose.infra.yml`.

3. `docker-compose.infra.yml` and `docker-compose.yml` set `CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects anything other than `http://` or `https://`, so the container crash-looped and returned HTTP 500. Switched to `http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL` for the migration-time native-protocol connection. Also removed `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually run.

4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080` when `.env` was sourced before `npm run dev`, because Next.js reads `PORT` from env and the platform owns 8080. Pinned `dev` to `-p 3000` so sourced env can't hijack it. `start` left as-is because production `node server.js` (Dockerfile CMD) must respect `PORT` from the orchestrator.

5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo`. That repo 404s; the actual name is `molecule-core`. The Railway and Render deploy buttons had the same broken URL. Replaced in both English and Chinese READMEs and in CONTRIBUTING. Internal identifiers (Go module path, Docker network `molecule-monorepo-net`, Python helper `molecule-monorepo-status`) deliberately left alone: renaming those is an invasive refactor orthogonal to this fix.

6. README quickstart was missing `cp .env.example .env`. Users who went straight from `git clone` to `./infra/scripts/setup.sh` got a script that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't run the platform without figuring out the env setup on their own. Added the step in both READMEs and CONTRIBUTING. Deliberately NOT generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here: the e2e-api suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode (no server-side `ADMIN_TOKEN`), which is how CI runs it.

7. CI shellcheck only covered `tests/e2e/*.sh`. `infra/scripts/setup.sh` is in the critical path of every new-user onboarding but was never linted. Extended the `shellcheck` job and the `changes` filter to cover `infra/scripts/`. `scripts/` deliberately excluded until its pre-existing SC3040/SC3043 warnings are cleaned up separately.
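The double-apply failure in item 1 can be illustrated without a database. The following is a minimal sketch, not the platform's runner: a plain text file stands in for the `schema_migrations` table, and the `apply_once` helper and `TRACKER` variable are hypothetical names introduced only for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the schema_migrations table:
# one applied migration filename per line.
TRACKER="$(mktemp)"

# Apply a migration only if the tracker has not recorded it yet.
# "Applying" is simulated with an echo; a real runner would execute SQL.
apply_once() {
  local name="$1"
  if grep -Fxq -- "$name" "$TRACKER"; then
    echo "skip $name"
  else
    echo "apply $name"
    printf '%s\n' "$name" >>"$TRACKER"
  fi
}

apply_once 001_init.up.sql   # first boot: applied and recorded
apply_once 001_init.up.sql   # later boot: skipped because it is recorded
```

Running a migration out-of-band corresponds to executing the SQL without ever appending to the tracker, so the next tracked run sees an empty record and applies the file a second time.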
Verification (fresh nuke-and-rebuild following the updated README):

- `docker compose -f docker-compose.infra.yml down -v` + `rm .env`
- `cp .env.example .env` → defaults work as-is
- `bash infra/scripts/setup.sh`: clean, no migration errors, all 6 infra containers healthy
- `cd workspace-server && go run ./cmd/server`: "Applied 41 migrations (0 already applied)", platform on :8080/health 200
- `cd canvas && npm install && npm run dev`: Canvas on :3000/ 200 even with `.env` sourced (PORT=8080 in env)
- `bash tests/e2e/test_api.sh`: **61 passed, 0 failed**
- `cd canvas && npx vitest run`: **900 tests passed**
- `cd canvas && npm run build`: production build clean
- `shellcheck --severity=warning infra/scripts/*.sh`: clean
- Langfuse `/api/public/health` 200 (was 500)

Scope notes:

- SaaS/EC2 parity (issue #1822): all files touched here are local-dev surface. Canvas container uses `node server.js` with `ENV PORT=3000` in `canvas/Dockerfile`; the `-p 3000` pin in the `package.json` dev script only affects `npm run dev`, not the production CMD.
- Test coverage (issue #1821): project policy is tiered coverage floors, not a blanket 100% target. Files touched here are shell scripts, YAML, Markdown, and one package.json script, not classes covered by the coverage matrix.
- No overlap with open PRs: searched `setup.sh`, `quickstart`, `langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
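The Langfuse recovery noted above (health endpoint back to 200) came down to the URL scheme of `CLICKHOUSE_URL`. As a rough sketch of a pre-flight check one could run before starting the containers, the snippet below accepts only `http://` and `https://` URLs. It is not Langfuse's own validation; the `is_http_url` and `check_url` helpers and the `db.example` host are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 when the argument starts with http:// or https://, else 1.
is_http_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a one-line verdict for a candidate URL.
check_url() {
  if is_http_url "$1"; then
    echo "ok: $1"
  else
    echo "rejected: $1"
  fi
}

# db.example is a placeholder host, not a value from the compose files.
check_url "clickhouse://db.example:9000/default"
check_url "http://db.example:8123"
```

A check like this would flag a native-protocol URL before the container starts, instead of surfacing it as a crash loop.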
64 lines
2.5 KiB
Bash
Executable File
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"

echo "==> Ensuring shared docker network exists..."
docker network create molecule-monorepo-net 2>/dev/null || true

echo "==> Starting infrastructure..."
docker compose -f "$ROOT_DIR/docker-compose.infra.yml" up -d

echo "==> Waiting for Postgres..."
until docker compose -f "$ROOT_DIR/docker-compose.infra.yml" exec -T postgres pg_isready -U "${POSTGRES_USER:-dev}" 2>/dev/null; do
  sleep 1
done
echo " Postgres is ready."

echo "==> Waiting for Redis..."
until docker compose -f "$ROOT_DIR/docker-compose.infra.yml" exec -T redis redis-cli ping 2>/dev/null | grep -q PONG; do
  sleep 1
done
echo " Redis is ready."

echo "==> Verifying Redis KEA config..."
KEA=$(docker compose -f "$ROOT_DIR/docker-compose.infra.yml" exec -T redis redis-cli config get notify-keyspace-events | tail -1)
echo " notify-keyspace-events = $KEA"

# Migrations are intentionally not applied here. The platform's own runner
# (workspace-server/internal/db/postgres.go::RunMigrations) tracks applied
# files in `schema_migrations` on every boot. Applying them out-of-band via
# psql leaves that table empty, so the platform re-applies everything and
# fails on non-idempotent ALTER TABLE statements. Let `go run ./cmd/server`
# handle it.

echo "==> Infrastructure ready!"
echo " Postgres: localhost:5432"
echo " Redis: localhost:6379"
echo " Langfuse: localhost:3001"
echo " Temporal: localhost:7233 (gRPC) / localhost:8233 (UI)"
echo ""
echo " Next: cd workspace-server && go run ./cmd/server"
echo " (the platform applies pending migrations on first boot)"

# Source .env if it exists so the ADMIN_TOKEN check below reflects what the
# platform will actually see at startup, not just the current shell env.
if [ -f "$ROOT_DIR/.env" ]; then
  set -a
  # shellcheck disable=SC1091
  . "$ROOT_DIR/.env"
  set +a
fi

# Security check — issue #684 (AdminAuth bearer bypass, PR #729).
# Without ADMIN_TOKEN, any valid workspace bearer token can call /admin/* routes.
if [ -z "${ADMIN_TOKEN:-}" ]; then
  echo ""
  echo " ⚠ WARNING: ADMIN_TOKEN is not set."
  echo " Until it is, AdminAuth falls back to accepting any workspace bearer token"
  echo " — the #684 vulnerability is NOT closed in this deployment."
  echo " Generate one: openssl rand -base64 32"
  echo " Then export ADMIN_TOKEN=<value> or add it to your .env before starting the platform."
fi
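The two `until` loops in the script above poll with no upper bound, so a container that never becomes healthy would leave the script waiting indefinitely. One possible hardening, which is not part of this change, is a bounded wait. The sketch below assumes a hypothetical `wait_for` helper that takes a timeout in seconds followed by the command to poll.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_for TIMEOUT_SECONDS CMD [ARGS...]
# Re-run CMD once per second until it succeeds. Give up and return 1
# once TIMEOUT_SECONDS one-second sleeps have elapsed without success.
wait_for() {
  local timeout="$1"
  shift
  local waited=0
  until "$@" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Stand-in commands; in a setup script the polled command would be the
# readiness probe itself (for example a pg_isready invocation).
wait_for 5 true && echo "ready"
wait_for 1 false || echo "gave up"
```

Because the polled command is passed as arguments rather than a string, quoting of paths and options is preserved without `eval`.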