Reproducing the README's quickstart on a clean clone surfaced seven
independent bugs between `git clone` and seeing the Canvas in a browser.
Each fix is minimal and local-dev-only — the SaaS/EC2 provisioner path
(issue #1822) is untouched.
Bugs fixed:
1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing
the platform's `schema_migrations` tracker. The platform then re-ran
every migration on first boot and crashed on non-idempotent ALTER
TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped
the migration block — `workspace-server/internal/db/postgres.go:53`
already tracks and skips applied files.
2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...`
with literal `USER:PASS` placeholders and the Docker-internal hostname
`postgres`. A `cp .env.example .env` followed by `go run ./cmd/server`
on the host failed with `dial tcp: lookup postgres: no such host`.
Replaced with working `dev:dev@localhost:5432` defaults that match
`docker-compose.infra.yml`.
3. `docker-compose.infra.yml` and `docker-compose.yml` set
`CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects
anything other than `http://` or `https://`, so the container
crash-looped and returned HTTP 500. Switched to
`http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL`
for the migration-time native-protocol connection. Also removed
`LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually
run.
4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080`
when `.env` was sourced before `npm run dev` — Next.js reads `PORT`
from env and the platform owns 8080. Pinned `dev` to
`-p 3000` so sourced env can't hijack it. `start` left as-is because
production `node server.js` (Dockerfile CMD) must respect `PORT`
from the orchestrator.
5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo`
— that repo 404s; the actual name is `molecule-core`. The Railway
and Render deploy buttons had the same broken URL. Replaced in both
English and Chinese READMEs and in CONTRIBUTING. Internal identifiers
(Go module path, Docker network `molecule-monorepo-net`, Python helper
`molecule-monorepo-status`) deliberately left alone — renaming those
is an invasive refactor orthogonal to this fix.
6. README quickstart was missing `cp .env.example .env`. Users who went
straight from `git clone` to `./infra/scripts/setup.sh` got a script
that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't
run the platform without figuring out the env setup on their own.
Added the step in both READMEs and CONTRIBUTING. Deliberately NOT
generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here — the e2e-api
suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode
(no server-side `ADMIN_TOKEN`), which is how CI runs it.
7. CI shellcheck only covered `tests/e2e/*.sh` — `infra/scripts/setup.sh`
is in the critical path of every new-user onboarding but was never
linted. Extended the `shellcheck` job and the `changes` filter to
cover `infra/scripts/`. `scripts/` deliberately excluded until its
pre-existing SC3040/SC3043 warnings are cleaned up separately.
Verification (fresh nuke-and-rebuild following the updated README):
- `docker compose -f docker-compose.infra.yml down -v` + `rm .env`
- `cp .env.example .env` → defaults work as-is
- `bash infra/scripts/setup.sh` — clean, no migration errors, all 6
infra containers healthy
- `cd workspace-server && go run ./cmd/server` — "Applied 41 migrations
(0 already applied)", platform on :8080/health 200
- `cd canvas && npm install && npm run dev` — Canvas on :3000/ 200
even with `.env` sourced (PORT=8080 in env)
- `bash tests/e2e/test_api.sh` — **61 passed, 0 failed**
- `cd canvas && npx vitest run` — **900 tests passed**
- `cd canvas && npm run build` — production build clean
- `shellcheck --severity=warning infra/scripts/*.sh` — clean
- Langfuse `/api/public/health` 200 (was 500)
Scope notes:
- SaaS/EC2 parity (issue #1822): all files touched here are local-dev
surface. Canvas container uses `node server.js` with `ENV PORT=3000`
in `canvas/Dockerfile` — the `-p 3000` pin in `package.json` dev
script only affects `npm run dev`, not the production CMD.
- Test coverage (issue #1821): project policy is tiered coverage floors,
not a blanket 100% target. Files touched here are shell scripts,
YAML, Markdown, and one package.json script — not classes covered
by the coverage matrix.
- No overlap with open PRs — searched `setup.sh`, `quickstart`,
`langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
- Add comment to .env.example explaining ANTHROPIC_API_KEY must be set
as a *global* secret (not just workspace-level) so SDK-direct workspaces
(e.g. molecule-hitl, hermes) receive it without 401 errors
- Add ANTHROPIC_API_KEY to saas-secrets.md secret map with context on
why global propagation matters
- Add full rotation procedure section (generate → PUT /settings/secrets
→ verify restart → revoke old key) with blast-radius note
Closes#894
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two missing env vars to .env.example + docker-compose.yml platform block:
1. HIBERNATION_IDLE_MINUTES (default 60)
Source: issue #724 / workspace hibernation feature.
Note: currently configured per-workspace via the hibernation_idle_minutes
DB column. This placeholder documents the planned global-default env var;
the platform does not yet read it. Per-workspace DB column is active now.
2. PLUGIN_ALLOW_UNPINNED (empty = false)
Source: issue #768 / PR #775 (supply chain hardening, not yet merged).
Pre-emptive documentation — takes effect when PR #775 lands.
ADMIN_TOKEN (item 3): already present with clear generation instructions
(openssl rand -base64 32) and NEVER-commit reminder. No changes needed.
docker-compose.yml cross-check — vars present in .env.example but absent from
the platform service env block (flagged, not fixed in this PR — all have safe
compiled-in defaults and are optional):
SECRETS_ENCRYPTION_KEY, AWARENESS_URL, MOLECULE_ENV, MOLECULE_IN_DOCKER,
MOLECULE_ENABLE_TEST_TOKENS, MOLECULE_ORG_ID, CP_PROVISION_URL,
ACTIVITY_RETENTION_DAYS, ACTIVITY_CLEANUP_INTERVAL_HOURS,
REMOTE_LIVENESS_STALE_AFTER, PLUGIN_INSTALL_{BODY_MAX_BYTES,FETCH_TIMEOUT,
MAX_DIR_BYTES}, TIER{2,3,4}_{MEMORY_MB,CPU_SHARES}, WORKSPACE_DIR.
These are not forwarded by docker-compose because they either auto-detect or
have safe defaults — operators override them via .env on the host. Adding
all of them to docker-compose would be noisy; a separate cleanup issue tracks
this.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend Engineer's PR #729 introduces ADMIN_TOKEN — when set, only that value
is accepted on /admin/* and /approvals/* routes, replacing the vulnerable
workspace-bearer fallback. Without the env var wired into deployments the fix
is code-only and the vulnerability stays open in every running instance.
Changes:
- `docker-compose.yml`: adds ADMIN_TOKEN env var to the platform service
(blank default = backward-compat fallback, i.e. still vulnerable until set).
NOTE: docker-compose.infra.yml has no platform service — the platform lives
only in the full-stack docker-compose.yml, so that is the correct file.
- `.env.example`: documents ADMIN_TOKEN with generation instructions and a
clear warning that it must be set to close#684.
- `infra/scripts/setup.sh`: prints a visible warning when ADMIN_TOKEN is unset
so operators know the vulnerability is still open in that deployment.
- `CLAUDE.md`: adds ADMIN_TOKEN to the env vars reference section.
No Go code changed — go build ./... passes clean.
Part of fix for #684 / PR #729
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves#14. ApplyTierConfig now reads TIER{2,3,4}_MEMORY_MB and
TIER{2,3,4}_CPU_SHARES env vars, falling back to the compiled defaults
agreed in the issue:
- T2: 512 MiB / 1024 shares (1 CPU) — unchanged baseline
- T3: 2048 MiB / 2048 shares (2 CPU) — new cap (previously uncapped)
- T4: 4096 MiB / 4096 shares (4 CPU) — new cap (previously uncapped)
CPU_SHARES follows Docker's 1024 = 1 CPU convention; internally the
value is translated to NanoCPUs for a hard allocation so behaviour
remains deterministic across hosts. Malformed or non-positive env
values silently fall back to the default.
Behaviour change note: T3 and T4 previously had no explicit cap.
Operators who relied on unlimited can set very large TIERn_MEMORY_MB /
TIERn_CPU_SHARES values; a follow-up can add unset-means-unlimited
semantics if required.
Tests:
- TestGetTierMemoryMB_DefaultsMatchLegacy
- TestGetTierMemoryMB_EnvOverride (covers malformed + zero fallback)
- TestGetTierCPUShares_EnvOverride
- TestApplyTierConfig_T3_UsesEnvOverride (wiring)
- TestApplyTierConfig_T3_DefaultCap (documents the new cap)
Docs: .env.example section + CLAUDE.md platform env-vars list updated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docs/edit-history/2026-04-14.md: append tick-3 section covering the
admin test-token route (#53), the prior-tick doc-sync PR (#54), and
the hermes required_env alignment (#55). Record measured test counts
(Go +4 for the TestAdminTestToken_* quartet).
- CLAUDE.md: bump Go test count 695 → 699 with a note pointing at the
new quartet. Route-table row and env-var mentions for the admin
route already landed with #53; verified on main.
- .env.example: add MOLECULE_ENABLE_TEST_TOKENS with a comment about
the prod-hidden default. Closes the code-review doc-sync flag from
#53 (var was in CLAUDE.md but missing from .env.example).
No PLAN.md / README.md / README.zh-CN.md update needed — none of the
three merges expose a user-visible surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds HERMES_API_KEY to .env.example with a cross-reference to the
OPENROUTER_API_KEY fallback, and adds the hermes runtime row to the
CLAUDE.md runtime table so the new adapter is discoverable alongside
its siblings (langgraph, claude-code, openclaw, crewai, autogen,
deepagents).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses FLAG 1 and FLAG 2 from the 7-Gate review on PR #20.
FLAG 1 (token persisted on disk):
Previous: `git clone https://x-access-token:${GITHUB_TOKEN}@github.com/...` wrote
the full tokenized URL into /workspace/repo/.git/config as `[remote "origin"] url = …`.
Token survived container restarts on any bind-mounted workspace_dir.
Fix: after clone, `git remote set-url origin https://github.com/${GITHUB_REPO}.git`
scrubs the token from the remote URL. Token is only in the clone command's argv
(transient) and not persisted on disk. Falls back to anonymous for public repos.
FLAG 2 (docs not updated):
Added GITHUB_REPO and GITHUB_TOKEN entries under a new 'GitHub' section in
.env.example with notes about (a) what they're read for, (b) that GITHUB_TOKEN
should be registered as a global secret via POST /admin/secrets, (c) how it's
handled to avoid on-disk persistence.
FLAG 3 (per-workspace gating) is deferred to a separate issue — it's a platform
design question about secret scope/ACLs, not a template fix.