Reproducing the README's quickstart on a clean clone surfaced seven independent bugs between `git clone` and seeing the Canvas in a browser. Each fix is minimal and local-dev-only; the SaaS/EC2 provisioner path (issue #1822) is untouched.

Bugs fixed:

1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing the platform's `schema_migrations` tracker. The platform then re-ran every migration on first boot and crashed on non-idempotent ALTER TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped the migration block, since `workspace-server/internal/db/postgres.go:53` already tracks and skips applied files.

2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...` with literal `USER:PASS` placeholders and the Docker-internal hostname `postgres`. A `cp .env.example .env` followed by `go run ./cmd/server` on the host failed with `dial tcp: lookup postgres: no such host`. Replaced with working `dev:dev@localhost:5432` defaults that match `docker-compose.infra.yml`.

3. `docker-compose.infra.yml` and `docker-compose.yml` set `CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects anything other than `http://` or `https://`, so the container crash-looped and returned HTTP 500. Switched to `http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL` for the migration-time native-protocol connection. Also removed `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually run.

4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080` when `.env` was sourced before `npm run dev`, because Next.js reads `PORT` from env and the platform owns 8080. Pinned `dev` to `-p 3000` so sourced env can't hijack it. `start` left as-is because production `node server.js` (Dockerfile CMD) must respect `PORT` from the orchestrator.

5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo`. That repo 404s; the actual name is `molecule-core`. The Railway and Render deploy buttons had the same broken URL. Replaced in both English and Chinese READMEs and in CONTRIBUTING. Internal identifiers (Go module path, Docker network `molecule-monorepo-net`, Python helper `molecule-monorepo-status`) deliberately left alone; renaming those is an invasive refactor orthogonal to this fix.

6. README quickstart was missing `cp .env.example .env`. Users who went straight from `git clone` to `./infra/scripts/setup.sh` got a script that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't run the platform without figuring out the env setup on their own. Added the step in both READMEs and CONTRIBUTING. Deliberately NOT generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here: the e2e-api suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode (no server-side `ADMIN_TOKEN`), which is how CI runs it.

7. CI shellcheck only covered `tests/e2e/*.sh`. `infra/scripts/setup.sh` is in the critical path of every new-user onboarding but was never linted. Extended the `shellcheck` job and the `changes` filter to cover `infra/scripts/`. `scripts/` deliberately excluded until its pre-existing SC3040/SC3043 warnings are cleaned up separately.

Verification (fresh nuke-and-rebuild following the updated README):

- `docker compose -f docker-compose.infra.yml down -v` + `rm .env`
- `cp .env.example .env` → defaults work as-is
- `bash infra/scripts/setup.sh`: clean, no migration errors, all 6 infra containers healthy
- `cd workspace-server && go run ./cmd/server`: "Applied 41 migrations (0 already applied)", platform on :8080/health 200
- `cd canvas && npm install && npm run dev`: Canvas on :3000/ 200 even with `.env` sourced (PORT=8080 in env)
- `bash tests/e2e/test_api.sh`: **61 passed, 0 failed**
- `cd canvas && npx vitest run`: **900 tests passed**
- `cd canvas && npm run build`: production build clean
- `shellcheck --severity=warning infra/scripts/*.sh`: clean
- Langfuse `/api/public/health` 200 (was 500)

Scope notes:

- SaaS/EC2 parity (issue #1822): all files touched here are local-dev surface. Canvas container uses `node server.js` with `ENV PORT=3000` in `canvas/Dockerfile`, so the `-p 3000` pin in the `package.json` dev script only affects `npm run dev`, not the production CMD.
- Test coverage (issue #1821): project policy is tiered coverage floors, not a blanket 100% target. Files touched here are shell scripts, YAML, Markdown, and one package.json script, none of which are classes covered by the coverage matrix.
- No overlap with open PRs: searched `setup.sh`, `quickstart`, `langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
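The `changes` filter extended in item 7 can be exercised outside CI. The following is a minimal sketch, not the workflow's own code: the regex is copied from the workflow's `scripts` output, while the function name `scripts_changed` and the idea of passing the diff as an argument are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "true" when any path in a newline-separated
# list of changed files should trigger the shellcheck job, else "false".
# The pattern is the one the workflow uses for its `scripts` output.
scripts_changed() {
  if printf '%s\n' "$1" \
    | grep -qE '^tests/e2e/|^scripts/|^infra/scripts/|^\.github/workflows/ci\.yml$'; then
    echo true
  else
    echo false
  fi
}
```

Note that the pattern still matches `scripts/`, so a change there triggers the job even though the `find` invocation in the shellcheck step only lints `tests/e2e` and `infra/scripts`.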
286 lines
12 KiB
YAML
name: CI

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main, staging]

# Cancel in-progress CI runs when a new commit arrives on the same ref.
# This prevents stale runs from queuing behind each other.
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # Detect which paths changed so downstream jobs can skip when only
  # docs/markdown files were modified.
  changes:
    name: Detect changes
    runs-on: ubuntu-latest
    outputs:
      platform: ${{ steps.check.outputs.platform }}
      canvas: ${{ steps.check.outputs.canvas }}
      python: ${{ steps.check.outputs.python }}
      scripts: ${{ steps.check.outputs.scripts }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: check
        run: |
          # For PR events: diff against the base branch (not HEAD~1 of the branch,
          # which may be unrelated after force-pushes). When a push updates a PR,
          # both pull_request and push events fire — prefer the PR base so that
          # the diff is always computed against the actual merge base, not the
          # previous SHA on the branch which may be on a different history line.
          BASE="${GITHUB_BASE_REF:-${{ github.event.before }}}"
          # GITHUB_BASE_REF is set by GitHub for PR events (the base branch name).
          # For pull_request events we use the stored base.sha; for push events
          # (or when base.sha is unavailable) fall back to github.event.before.
          if [ "${{ github.event_name }}" = "pull_request" ] && [ -n "${{ github.event.pull_request.base.sha }}" ]; then
            BASE="${{ github.event.pull_request.base.sha }}"
          fi
          # Fallback: if BASE is empty or all zeros (new branch), run everything
          if [ -z "$BASE" ] || echo "$BASE" | grep -qE '^0+$'; then
            echo "platform=true" >> "$GITHUB_OUTPUT"
            echo "canvas=true" >> "$GITHUB_OUTPUT"
            echo "python=true" >> "$GITHUB_OUTPUT"
            echo "scripts=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          DIFF=$(git diff --name-only "$BASE" HEAD 2>/dev/null || echo ".github/workflows/ci.yml")
          echo "platform=$(echo "$DIFF" | grep -qE '^workspace-server/|^\.github/workflows/ci\.yml$' && echo true || echo false)" >> "$GITHUB_OUTPUT"
          echo "canvas=$(echo "$DIFF" | grep -qE '^canvas/|^\.github/workflows/ci\.yml$' && echo true || echo false)" >> "$GITHUB_OUTPUT"
          echo "python=$(echo "$DIFF" | grep -qE '^workspace/|^\.github/workflows/ci\.yml$' && echo true || echo false)" >> "$GITHUB_OUTPUT"
          echo "scripts=$(echo "$DIFF" | grep -qE '^tests/e2e/|^scripts/|^infra/scripts/|^\.github/workflows/ci\.yml$' && echo true || echo false)" >> "$GITHUB_OUTPUT"

  platform-build:
    name: Platform (Go)
    needs: changes
    if: needs.changes.outputs.platform == 'true'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: workspace-server
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: 'stable'
      - run: go mod download
      - run: go build ./cmd/server
      # CLI (molecli) moved to standalone repo: github.com/Molecule-AI/molecule-cli
      - run: go vet ./...
      - name: Run golangci-lint
        uses: golangci/golangci-lint-action@v9
        with:
          version: latest
          working-directory: workspace-server
          args: --timeout 3m
        continue-on-error: true # Warn but don't block until codebase is clean
      - name: Run tests with race detection and coverage
        run: go test -race -coverprofile=coverage.out ./...

      - name: Per-file coverage report
        # Advisory — lists every source file with its coverage so reviewers
        # can see at-a-glance where gaps are. Sorted ascending so the worst
        # offenders float to the top. Does NOT fail the build; the hard
        # gate is the threshold check below. (#1823)
        run: |
          echo "=== Per-file coverage (worst first) ==="
          go tool cover -func=coverage.out \
            | grep -v '^total:' \
            | awk '{file=$1; sub(/:[0-9][0-9.]*:.*/, "", file); pct=$NF; gsub(/%/,"",pct); s[file]+=pct; c[file]++}
                   END {for (f in s) printf "%6.1f%% %s\n", s[f]/c[f], f}' \
            | sort -n

      - name: Check coverage thresholds
        # Enforces two gates from #1823 Layer 1:
        # 1. Total floor (25% — ratchet plan in COVERAGE_FLOOR.md).
        # 2. Per-file floor — non-test .go files in security-critical
        # paths with coverage <10% fail the build, UNLESS the file
        # path is listed in .coverage-allowlist.txt (acknowledged
        # historical debt with a tracking issue + expiry).
        run: |
          set -e
          TOTAL_FLOOR=25
          # Security-critical paths where a 0%-coverage file is a real risk.
          CRITICAL_PATHS=(
            "internal/handlers/tokens"
            "internal/handlers/workspace_provision"
            "internal/handlers/a2a_proxy"
            "internal/handlers/registry"
            "internal/handlers/secrets"
            "internal/middleware/wsauth"
            "internal/crypto"
          )

          TOTAL=$(go tool cover -func=coverage.out | grep '^total:' | awk '{print $3}' | sed 's/%//')
          echo "Total coverage: ${TOTAL}%"
          if awk "BEGIN{exit !($TOTAL < $TOTAL_FLOOR)}"; then
            echo "::error::Total coverage ${TOTAL}% is below the ${TOTAL_FLOOR}% floor. See COVERAGE_FLOOR.md for ratchet plan."
            exit 1
          fi

          # Aggregate per-file coverage → /tmp/perfile.txt: "<fullpath> <pct>"
          go tool cover -func=coverage.out \
            | grep -v '^total:' \
            | awk '{file=$1; sub(/:[0-9][0-9.]*:.*/, "", file); pct=$NF; gsub(/%/,"",pct); s[file]+=pct; c[file]++}
                   END {for (f in s) printf "%s %.1f\n", f, s[f]/c[f]}' \
            > /tmp/perfile.txt

          # Build allowlist — paths relative to workspace-server, one per line.
          # Lines starting with # are comments.
          ALLOWLIST=""
          if [ -f ../.coverage-allowlist.txt ]; then
            ALLOWLIST=$(grep -vE '^(#|[[:space:]]*$)' ../.coverage-allowlist.txt || true)
          fi

          FAILED=0
          WARNED=0
          for path in "${CRITICAL_PATHS[@]}"; do
            while read -r file pct; do
              [[ "$file" == *_test.go ]] && continue
              [[ "$file" == *"$path"* ]] || continue
              awk "BEGIN{exit !($pct < 10)}" || continue

              # Strip the package-import prefix so we can match .coverage-allowlist.txt
              # entries written as paths relative to workspace-server/.
              rel=$(echo "$file" | sed 's|^github.com/Molecule-AI/molecule-monorepo/platform/||')

              if echo "$ALLOWLIST" | grep -qxF "$rel"; then
                echo "::warning file=workspace-server/$rel::Critical file at ${pct}% coverage (allowlisted, #1823) — fix before expiry."
                WARNED=$((WARNED+1))
              else
                echo "::error file=workspace-server/$rel::Critical file at ${pct}% coverage — must be >=10% (target 80%). See #1823. To acknowledge as known debt, add this path to .coverage-allowlist.txt."
                FAILED=$((FAILED+1))
              fi
            done < /tmp/perfile.txt
          done

          echo ""
          echo "Critical-path check: $FAILED new failures, $WARNED allowlisted warnings."

          if [ "$FAILED" -gt 0 ]; then
            echo ""
            echo "$FAILED security-critical file(s) have <10% test coverage and are"
            echo "NOT in the allowlist. These paths handle auth, tokens, secrets, or"
            echo "workspace provisioning — a 0% file here is the exact gap that let"
            echo "CWE-22, CWE-78, KI-005 slip through in past incidents. Either:"
            echo " (a) add tests to raise coverage above 10%, or"
            echo " (b) add the path to .coverage-allowlist.txt with an expiry date"
            echo " and a tracking issue reference."
            exit 1
          fi

  canvas-build:
    name: Canvas (Next.js)
    needs: changes
    if: needs.changes.outputs.canvas == 'true'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: canvas
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: rm -f package-lock.json && npm install
      - run: npm run build
      - name: Run tests
        run: npx vitest run

  # MCP Server + SDK removed from CI — now in standalone repos:
  # - github.com/Molecule-AI/molecule-mcp-server (npm CI)
  # - github.com/Molecule-AI/molecule-sdk-python (PyPI CI)

  # e2e-api job moved to .github/workflows/e2e-api.yml (issue #458).
  # It now has workflow-level concurrency (cancel-in-progress: false) so
  # new pushes queue the E2E run rather than cancelling it at the run level.

  shellcheck:
    name: Shellcheck (E2E scripts)
    needs: changes
    if: needs.changes.outputs.scripts == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run shellcheck on tests/e2e/*.sh and infra/scripts/*.sh
        # shellcheck is pre-installed on ubuntu-latest runners (via apt).
        # infra/scripts/ is included because setup.sh + nuke.sh gate the
        # README quickstart — a shellcheck regression there silently breaks
        # new-user onboarding. scripts/ is intentionally excluded until its
        # pre-existing SC3040/SC3043 warnings are cleaned up.
        run: |
          find tests/e2e infra/scripts -type f -name '*.sh' -print0 \
            | xargs -0 shellcheck --severity=warning

  canvas-deploy-reminder:
    name: Canvas Deploy Reminder
    runs-on: ubuntu-latest
    needs: [changes, canvas-build]
    # Only fires on direct pushes to main (i.e. after staging→main promotion).
    if: needs.changes.outputs.canvas == 'true' && github.event_name == 'push' && github.ref == 'refs/heads/main'
    permissions:
      # Required to post commit comments via the GitHub API.
      contents: write
    steps:
      - name: Post deploy reminder as commit comment
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          COMMIT_SHA: ${{ github.sha }}
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          # Write body to a temp file — avoids backtick escaping in shell.
          cat > /tmp/deploy-reminder.md << 'BODY'
          ## Canvas build passed ✅ — deploy required

          The `publish-canvas-image` workflow is now building a fresh Docker image
          (`ghcr.io/molecule-ai/canvas:latest`) in the background.

          Once it completes (~3–5 min), apply on the host machine with:
          ```bash
          cd <runner-workspace>
          git pull origin main
          docker compose pull canvas && docker compose up -d canvas
          ```

          If you need to rebuild from local source instead (e.g. testing unreleased
          changes or a new `NEXT_PUBLIC_*` URL), use:
          ```bash
          docker compose build canvas && docker compose up -d canvas
          ```
          BODY
          printf '\n> Posted automatically by CI · commit `%s` · [build log](%s)\n' \
            "$COMMIT_SHA" "$RUN_URL" >> /tmp/deploy-reminder.md

          gh api \
            --method POST \
            "repos/${{ github.repository }}/commits/${{ github.sha }}/comments" \
            --field "body=@/tmp/deploy-reminder.md"

  python-lint:
    name: Python Lint & Test
    needs: changes
    if: needs.changes.outputs.python == 'true'
    runs-on: ubuntu-latest
    env:
      WORKSPACE_ID: test
    defaults:
      run:
        working-directory: workspace
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: pip
          cache-dependency-path: workspace/requirements.txt
      - run: pip install -r requirements.txt pytest pytest-asyncio pytest-cov
      - run: python -m pytest --tb=short -q --cov=. --cov-report=term-missing

  # SDK + plugin validation moved to standalone repo:
  # github.com/Molecule-AI/molecule-sdk-python
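
The per-file aggregation used by both coverage steps can be tried on sample input without running the Go test suite. This is a sketch under stated assumptions: the awk program is copied from the workflow, while the wrapper name `aggregate_coverage`, the trailing `sort`, and the sample rows in the usage note are hypothetical. It reads `go tool cover -func` style rows on stdin and prints `<file> <pct>` per file.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the workflow's awk program.
# Input rows look like: "<file>:<line>:  <func>  <pct>%"
# Output: one "<file> <pct>" row per file, sorted by file name.
aggregate_coverage() {
  awk '{file=$1; sub(/:[0-9][0-9.]*:.*/, "", file); pct=$NF; gsub(/%/,"",pct); s[file]+=pct; c[file]++}
       END {for (f in s) printf "%s %.1f\n", f, s[f]/c[f]}' \
    | sort
}
```

For example, `printf 'a/x.go:10:\tFoo\t100.0%%\na/x.go:20:\tBar\t0.0%%\n' | aggregate_coverage` prints `a/x.go 50.0`. The figure is an unweighted mean of per-function percentages, so it can differ from a statement-weighted coverage number for the same file.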