molecule-core

Author	SHA1	Message	Date
Hongming Wang	fb6df5bb36	Merge pull request #1097 from Molecule-AI/fix/tenant-guard-allow-cp-proxy fix: TenantGuard passes through /cp/* to CP proxy	2026-04-20 13:15:02 -07:00
Hongming Wang	488fde03a7	fix(middleware): TenantGuard passes through /cp/* to CP proxy Today's rollout of cp_proxy (PR #1095/1096) mounted /cp/* as a reverse-proxy to the control plane, but the TenantGuard middleware runs first in the global chain and 404s anything that isn't in its exact-path allowlist (/health + /metrics). Every /cp/auth/me fetch from canvas landed on a 40µs 404 before ever reaching the proxy. /cp/* is handled upstream (WorkOS session + admin bearer), so the tenant doesn't need to attach org identity for those paths. Passing them through is correct — matches the design where the tenant platform is a pure transit layer for /cp/*. Verified: /cp/auth/me via tunnel now returns 401 (correct unauth from CP) instead of 404 from TenantGuard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:14:56 -07:00
rabbitblood	d513a0ced5	security: remove hardcoded API keys from post-rebuild-setup.sh GitGuardian detected exposed MiniMax API key and GitHub PAT in the script's default values. Replaced with env var reads from .env file (which is gitignored). Script now validates required secrets exist before proceeding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 13:02:52 -07:00
Hongming Wang	e2ec12292b	Merge pull request #1096 from Molecule-AI/staging promote: tenant cp-proxy same-origin	2026-04-20 13:01:51 -07:00
Hongming Wang	4ba498ca94	Merge pull request #1095 from Molecule-AI/feat/tenant-cp-proxy-same-origin feat(router): /cp/* reverse-proxy + same-origin canvas fetches	2026-04-20 13:01:46 -07:00
Hongming Wang	eb4f262d2a	feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches Canvas's browser bundle issues fetches to both CP endpoints (/cp/auth/me, /cp/orgs, ...) AND tenant-platform endpoints (/canvas/viewport, /approvals/pending, /org/templates). They share ONE build-time base URL. Baking api.moleculesai.app broke tenant calls with 404; baking the tenant subdomain broke auth. Tried both today and saw exactly one failure mode per attempt. Real fix: same-origin fetches + tenant-side split. Adds: internal/router/cp_proxy.go # /cp/* → CP_UPSTREAM_URL mounted before NoRoute(canvasProxy). Now a tenant serves: /cp/* → reverse-proxy to api.moleculesai.app; /canvas/viewport, /approvals/pending, /workspaces/:id/*, /ws, /registry, /metrics → tenant platform (existing handlers); everything else → canvas UI (existing reverse-proxy). Canvas middleware reverts to `connect-src 'self' wss:` for the same-origin path (keeping explicit PLATFORM_URL whitelist as a self-hosted escape hatch when the build-arg is non-empty). CI build-arg flips to NEXT_PUBLIC_PLATFORM_URL="" so the bundle issues relative fetches. Security of cp_proxy: - Cookie + Authorization PRESERVED across the hop (opposite of canvas proxy) — they carry the WorkOS session, which is the whole point. - Host rewritten to upstream so CORS + cookie-domain on the CP side see their own hostname. - Upstream URL validated at construction: must parse, must be http(s), must have a host — misconfig fails closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:01:40 -07:00
Hongming Wang	5edc95e279	Merge pull request #1094 from Molecule-AI/staging promote: CSP platform_url whitelist	2026-04-20 12:55:15 -07:00
Hongming Wang	c0ef6d92bf	Merge pull request #1093 from Molecule-AI/fix/csp-allow-platform-url fix(canvas): include PLATFORM_URL origin in CSP connect-src	2026-04-20 12:55:09 -07:00
Hongming Wang	1bca58a01b	fix(canvas): include NEXT_PUBLIC_PLATFORM_URL in CSP connect-src Tenant page loads were blocked by: Refused to connect to 'https://api.moleculesai.app/cp/auth/me' because it violates the document's Content Security Policy. CSP had `connect-src 'self' wss:` — fine for same-origin + any wss, but browser refuses cross-origin HTTPS fetches that aren't listed. PLATFORM_URL (baked from NEXT_PUBLIC_PLATFORM_URL, which is the CP origin on SaaS tenants) needs to be explicit. Fix: middleware reads NEXT_PUBLIC_PLATFORM_URL at build/runtime and adds both the https and wss siblings to connect-src. Self-hosted deploys that override the build-arg automatically get a matching CSP — no hardcoded hostname. Test added: buildCsp includes NEXT_PUBLIC_PLATFORM_URL origin in connect-src when set. Also loosens the dev `ws:` assertion since dev uses `connect-src *` which subsumes ws (pre-existing behavior, test was stale). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:55:03 -07:00
rabbitblood	f787873698	feat: nuke-and-rebuild.sh — one-command fleet reset Two scripts: - nuke-and-rebuild.sh: docker down -v, clean orphans, rebuild, setup - post-rebuild-setup.sh: insert global secrets (MiniMax + GH PAT), import org template, wait for platform health Global secrets ensure every provisioned container gets MiniMax API config and GitHub PAT injected as env vars automatically — no manual settings.json deployment needed. Usage: bash scripts/nuke-and-rebuild.sh Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 12:53:30 -07:00
Hongming Wang	1c945d02f5	Merge pull request #1092 from Molecule-AI/staging promote: bake CP origin into tenant canvas	2026-04-20 12:51:33 -07:00
Hongming Wang	3783e6f5a1	Merge pull request #1091 from Molecule-AI/fix/tenant-canvas-cp-origin fix(ci): bake api.moleculesai.app into tenant canvas bundle	2026-04-20 12:51:28 -07:00
Hongming Wang	ee40880f39	fix(ci): bake api.moleculesai.app into tenant canvas bundle Canvas's browser-side code (auth.ts, api.ts, billing.ts) all call fetch(PLATFORM_URL + /cp/). PLATFORM_URL comes from NEXT_PUBLIC_PLATFORM_URL at build time; with the build arg unset, it falls back to http://localhost:8080 in the compiled bundle. That means on a tenant like hongmingwang.moleculesai.app, the user's browser actually tried to fetch http://localhost:8080/cp/auth/me — which resolves to the USER'S OWN machine, not the tenant. Login redirect loops 404. Every tenant canvas has been unable to complete a fresh login on this path; existing sessions only worked because the cookie was already set domain-wide. Fix: pass NEXT_PUBLIC_PLATFORM_URL=https://api.moleculesai.app as a build arg in the tenant-image workflow. CP already allows CORS from .moleculesai.app + credentials, and the session cookie is scoped to .moleculesai.app so tenant subdomains inherit it. Verified in prod by rebuilding canvas locally with the flag and hot-patching the hongmingwang instance via SSM. Baked chunks now contain api.moleculesai.app; browser auth redirects resolve cleanly to the CP. Self-hosted users override by rebuilding with their own URL — same pattern molecule-app uses with NEXT_PUBLIC_CP_ORIGIN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:51:22 -07:00
rabbitblood	6091fca961	fix(auth): accept admin token in CanvasOrBearer for viewport PUT	2026-04-20 12:45:09 -07:00
rabbitblood	d47ca547ac	fix(auth): accept admin token in WorkspaceAuth for canvas dashboard The canvas sends NEXT_PUBLIC_ADMIN_TOKEN on all API calls but per-workspace routes (/activity, /delegations, /traces) use WorkspaceAuth which only accepts per-workspace bearer tokens. This made the canvas dashboard 401 on every workspace detail view. Fix: WorkspaceAuth now accepts the admin token as a fallback after workspace token validation fails. This lets the canvas read all workspace data with a single admin credential. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 12:42:43 -07:00
Hongming Wang	05aa0cc787	Merge pull request #1090 from Molecule-AI/staging promote: canvas CSP nonce fix	2026-04-20 12:34:14 -07:00
Hongming Wang	5babbb47bd	Merge pull request #1089 from Molecule-AI/fix/canvas-csp-nonce-propagation fix(canvas): root layout dynamic so CSP nonce reaches Next scripts	2026-04-20 12:34:08 -07:00
Hongming Wang	d70aef58f5	fix(canvas): make root layout dynamic so CSP nonce reaches Next scripts Tenant page loads were failing with repeated CSP violations: Executing inline script violates ... script-src 'self' 'nonce-M2M4YTVh...' 'strict-dynamic'. ... because Next.js's bootstrap inline scripts were emitted without a nonce attribute. The middleware was generating per-request nonces correctly and sending them via `x-nonce` — but the layout was fully static, so Next.js cached the HTML once and served that cached bundle (no nonces baked in) for every request. Fix: call `await headers()` in the root layout. That opts the tree into dynamic rendering AND signals Next.js to propagate the x-nonce value to its own generated <script> tags. The `nonce` return value is intentionally unused — the framework handles its bootstrap scripts automatically once the read happens. Future code that adds third-party <Script> components (analytics, etc.) should pass the returned nonce explicitly. Verified against live tenant: before this change every /_next/ chunk script tag in the HTML had no nonce attribute; expected after deploy is `<script nonce="..." src="/_next/...">` on each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:34:03 -07:00
rabbitblood	5f5f70151b	fix(canvas): CSP_DEV_MODE + admin token for local Docker (#1052 follow-up) Three changes that keep getting lost on nuke+rebuild: 1. middleware.ts: read CSP_DEV_MODE env to relax CSP in local Docker 2. api.ts: send NEXT_PUBLIC_ADMIN_TOKEN header (AdminAuth on /workspaces) 3. Dockerfile: accept NEXT_PUBLIC_ADMIN_TOKEN as build arg All three are required for the canvas to work in local Docker where canvas (port 3000) fetches from platform (port 8080) cross-origin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 12:23:43 -07:00
rabbitblood	b0ea25cc36	fix(canvas): add NEXT_PUBLIC_ADMIN_TOKEN + CSP_DEV_MODE to docker-compose Canvas needs AdminAuth token to fetch /workspaces (gated since PR #729) and CSP_DEV_MODE to allow cross-port fetches in local Docker. These were added earlier but lost on nuke+rebuild because they weren't committed to staging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 12:19:12 -07:00
rabbitblood	6e6de392d9	chore: remove org-templates/molecule-dev from git tracking This directory belongs in the dedicated repo Molecule-AI/molecule-ai-org-template-molecule-dev. It should be cloned locally for platform mounting, never committed to molecule-core. The .gitignore already blocks it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 11:47:13 -07:00
molecule-ai[bot]	5c3ea0b61d	Merge pull request #1088 from Molecule-AI/fix/workspace-purge-delete-1087 fix: add ?purge=true hard-delete to DELETE /workspaces/:id (#1087)	2026-04-20 11:43:40 -07:00
rabbitblood	5a9658f83c	fix: add ?purge=true hard-delete to DELETE /workspaces/:id (#1087) Soft-delete (status='removed') leaves orphan DB rows and FK data forever. When ?purge=true is passed, after container cleanup the handler cascade-deletes all leaf FK tables and hard-removes the workspace row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 11:08:44 -07:00
molecule-ai[bot]	7d931afce9	Merge pull request #1085 from Molecule-AI/fix/org-import-concurrency-1084 fix(org-import): limit concurrent Docker provisioning to 3 (#1084)	2026-04-20 10:38:26 -07:00
rabbitblood	5afc759859	fix(org-import): limit concurrent Docker provisioning to 3 (#1084) The org import fired all workspace provisioning goroutines concurrently, overwhelming Docker when creating 39+ containers. Containers timed out, leaving workspaces stuck in 'provisioning' with no schedules or hooks. Fix: - Add provisionConcurrency=3 semaphore limiting concurrent Docker ops - Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings - Pass semaphore through createWorkspaceTree recursion With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead of timing out. Each workspace gets its full template: schedules, hooks, settings, hierarchy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 10:08:17 -07:00
Hongming Wang	7c3cff22c6	Merge pull request #1083 from Molecule-AI/staging promote: staging → main (remove dead canvas waitlist)	2026-04-20 09:56:11 -07:00
Hongming Wang	cd4d2c5140	Merge pull request #1082 from Molecule-AI/chore/canvas-remove-waitlist-dead-page chore(canvas): remove dead /waitlist page (lives in molecule-app)	2026-04-20 09:56:01 -07:00
Hongming Wang	f59473f1fd	chore(canvas): remove dead /waitlist page (lives in molecule-app) #1080 added /waitlist to canvas, but canvas isn't served at app.moleculesai.app — it backs the tenant subdomains (acme.moleculesai.app etc.). The real /waitlist lives in the separate molecule-app repo, which is what the CP auth callback redirects to. molecule-app#12 has the real page + contact form wiring to /cp/waitlist/request. This canvas copy was never reachable and would only diverge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 09:55:35 -07:00
Hongming Wang	59dd873f26	Merge pull request #1081 from Molecule-AI/staging promote: staging → main (waitlist page)	2026-04-20 09:47:52 -07:00
Hongming Wang	61ed4ca293	Merge pull request #1080 from Molecule-AI/feat/waitlist-page feat(canvas): /waitlist page with contact form	2026-04-20 09:47:35 -07:00
Hongming Wang	6bdad3d1b8	feat(canvas): /waitlist page with contact form Adds the user-facing half of the beta-gate: a page at /waitlist that the CP auth callback redirects users to when their email isn't on the allowlist. Collects email + optional name + use-case and POSTs to /cp/waitlist/request (backend landed in controlplane #150). ## Behavior - No auto-pre-fill of email from URL query (CP's #145 dropped the ?email= param for the privacy reason; this test guards against a future regression on the client side). - Client-side validates email shape for instant feedback; backend re-validates. - Three UI states after submit: success → "your request is in" banner, form hidden dedup → softer "already on file" banner when backend returns dedup=true (same 200, no 409 to avoid enumeration) error → inline banner with backend message or network fallback ## Tests 9 tests in __tests__/waitlist-page.test.tsx covering: - default render + a11y (role=button, role=status, role=alert) - URL-pre-fill privacy regression guard - HTML5 + JS validation (empty, malformed) - successful POST with trimmed body - dedup branch - non-2xx with + without error field - network rejection Follow-up to the beta-gate rollout on controlplane #145 / #150. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 09:47:06 -07:00
Hongming Wang	4a072ae130	Merge pull request #1077 from Molecule-AI/staging promote: staging → main (bounded IsRunning body read)	2026-04-20 09:06:54 -07:00
Hongming Wang	dc9f934446	Merge pull request #1076 from Molecule-AI/fix/cp-provisioner-bounded-body-read fix(cp_provisioner): cap IsRunning body read at 64 KiB	2026-04-20 09:06:36 -07:00
Hongming Wang	2d80f61419	fix(cp_provisioner): cap IsRunning body read at 64 KiB IsRunning used an unbounded json.NewDecoder(resp.Body).Decode on CP status responses. Start already caps its body read at 64 KiB (cp_provisioner.go:137) to defend against a misconfigured or compromised CP streaming a huge body and exhausting memory. IsRunning is called reactively per-request from a2a_proxy and periodically from healthsweep, so it's a hotter path than Start and arguably deserves the same defense more. Adds TestIsRunning_BoundedBodyRead that serves a body padded past the cap and asserts the decode still succeeds on the JSON prefix. Follow-up to code-review Nit-2 on #1073. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 09:06:20 -07:00
Hongming Wang	ec99d7b5f1	Merge pull request #1074 from Molecule-AI/staging promote: staging → main (IsRunning contract fix)	2026-04-20 08:59:07 -07:00
Hongming Wang	35f7193ca9	Merge pull request #1073 from Molecule-AI/fix/isrunning-alive-on-transient fix(cp_provisioner): IsRunning returns (true, err) on transient failures	2026-04-20 08:58:44 -07:00
Hongming Wang	25b560960a	fix(cp_provisioner): IsRunning returns (true, err) on transient failures My #1071 made IsRunning return (false, err) on all error paths, but that breaks a2a_proxy which depends on Docker provisioner's (true, err) contract. Without this fix, any brief CP outage causes a2a_proxy to mark workspaces offline and trigger restart cascades across every tenant. Contract now matches Docker.IsRunning: transport error → (true, err) — alive, degraded signal non-2xx response → (true, err) — alive, degraded signal JSON decode error → (true, err) — alive, degraded signal 2xx state!=running → (false, nil) 2xx state==running → (true, nil) healthsweep.go is also happy with this — it skips on err regardless. Adds TestIsRunning_ContractCompat_A2AProxy as regression guard that asserts each error path explicitly against the a2a_proxy expectations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 08:58:18 -07:00
Hongming Wang	d29ca3ce22	Merge pull request #1072 from Molecule-AI/staging chore: promote IsRunning error surfacing to main	2026-04-20 08:50:28 -07:00
Hongming Wang	1fd9aa238c	Merge pull request #1071 from Molecule-AI/fix/isrunning-surface-http-errors fix(workspace-server): IsRunning surfaces non-2xx + JSON errors	2026-04-20 08:50:03 -07:00
molecule-ai[bot]	3fbf40bf1b	Merge pull request #949 from Molecule-AI/feat/canvas-batch-operations feat(canvas): batch operations — multi-select + restart/pause/delete	2026-04-20 08:48:26 -07:00
molecule-ai[bot]	78a434dfc1	Merge pull request #1011 from Molecule-AI/test/qa-coverage-orgs-page-and-api-timeout test(canvas): QA coverage — orgs page polling + API timeout	2026-04-20 08:48:00 -07:00
molecule-ai[bot]	fe3e4366a3	Merge pull request #1015 from Molecule-AI/fix/canary-verify-health-poll-1013 fix(ci): replace sleep 360 with health-check poll in canary-verify (#1013)	2026-04-20 08:47:56 -07:00
Hongming Wang	47a15c340e	fix(workspace-server): IsRunning surfaces non-2xx + JSON errors Pre-existing silent-failure path: IsRunning decoded CP responses regardless of HTTP status, so a CP 500 → empty body → State="" → returned (false, nil). The sweeper couldn't distinguish "workspace stopped" from "CP broken" and would leave a dead row in place. ## Fix - Non-2xx → wrapped error, does NOT echo body (CP 5xx bodies may contain echoed headers; leaking into logs would expose bearer) - JSON decode error → wrapped error - Transport error → now wrapped with "cp provisioner: status:" prefix for easier log grepping ## Tests +7 cases (5-status table + malformed JSON + existing transport). IsRunning coverage 100%; overall cp_provisioner at 98%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 08:47:55 -07:00
molecule-ai[bot]	692625b774	Merge pull request #1016 from Molecule-AI/fix/a11y-workspace-node fix(a11y): WorkspaceNode font floor, contrast, focus rings	2026-04-20 08:47:53 -07:00
molecule-ai[bot]	67eb87f43b	Merge pull request #1017 from Molecule-AI/fix/rows-err-missing fix(bundle/exporter): add rows.Err() check + MCP secret scrub	2026-04-20 08:47:49 -07:00
molecule-ai[bot]	e7b2c10c60	Merge pull request #1022 from Molecule-AI/fix/unchecked-exec-workspace-provision fix(mcp): scrub secrets in commit_memory + MCP handler tests	2026-04-20 08:47:25 -07:00
molecule-ai[bot]	70637ff4f7	Merge pull request #1049 from Molecule-AI/feat/platform-native-hma-instructions feat(runtime): inject HMA memory instructions at platform level (#1047)	2026-04-20 08:47:20 -07:00
Hongming Wang	b955b97416	Merge pull request #1070 from Molecule-AI/staging chore: promote workspace-server tenant-auth fix to main	2026-04-20 08:42:08 -07:00
Hongming Wang	df44524f6c	merge main into staging for #1070 promotion # Conflicts: # .gitignore	2026-04-20 08:41:58 -07:00
Hongming Wang	4e5071ffe2	Merge pull request #1067 from Molecule-AI/fix/tenant-workspace-auth fix(workspace-server): send X-Molecule-Admin-Token on CP calls	2026-04-20 08:39:49 -07:00


1129 Commits