Today's rollout of cp_proxy (PR #1095/1096) mounted /cp/* as a
reverse-proxy to the control plane, but the TenantGuard middleware
runs first in the global chain and 404s anything that isn't in its
exact-path allowlist (/health + /metrics). Every /cp/auth/me fetch
from canvas landed on a 40µs 404 before ever reaching the proxy.
/cp/* is handled upstream (WorkOS session + admin bearer), so the
tenant doesn't need to attach org identity for those paths. Passing
them through is correct — matches the design where the tenant
platform is a pure transit layer for /cp/*.
Verified: /cp/auth/me via tunnel now returns 401 (correct unauth
from CP) instead of 404 from TenantGuard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Canvas's browser bundle issues fetches to both CP endpoints
(/cp/auth/me, /cp/orgs, ...) AND tenant-platform endpoints
(/canvas/viewport, /approvals/pending, /org/templates). They
share ONE build-time base URL. Baking api.moleculesai.app
broke tenant calls with 404; baking the tenant subdomain broke
auth. Tried both today and saw exactly one failure mode per
attempt.
Real fix: same-origin fetches + tenant-side split. Adds:
internal/router/cp_proxy.go # /cp/* → CP_UPSTREAM_URL
mounted before NoRoute(canvasProxy). Now a tenant serves:
  /cp/*                    → reverse-proxy to api.moleculesai.app
  /canvas/viewport,
  /approvals/pending,
  /workspaces/:id/*,
  /ws, /registry, /metrics → tenant platform (existing handlers)
  everything else          → canvas UI (existing reverse-proxy)
Canvas middleware reverts to `connect-src 'self' wss:` for the
same-origin path (keeping explicit PLATFORM_URL whitelist as a
self-hosted escape hatch when the build-arg is non-empty).
CI build-arg flips to NEXT_PUBLIC_PLATFORM_URL="" so the bundle
issues relative fetches.
Security of cp_proxy:
- Cookie + Authorization PRESERVED across the hop (opposite of
canvas proxy) — they carry the WorkOS session, which is the
whole point.
- Host rewritten to upstream so CORS + cookie-domain on the CP
side see their own hostname.
- Upstream URL validated at construction: must parse, must be
http(s), must have a host — misconfig fails closed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenant page loads were blocked by:
Refused to connect to 'https://api.moleculesai.app/cp/auth/me'
because it violates the document's Content Security Policy.
CSP had `connect-src 'self' wss:` — fine for same-origin + any wss,
but browser refuses cross-origin HTTPS fetches that aren't listed.
PLATFORM_URL (baked from NEXT_PUBLIC_PLATFORM_URL, which is the CP
origin on SaaS tenants) needs to be explicit.
Fix: middleware reads NEXT_PUBLIC_PLATFORM_URL at build/runtime
and adds both the https and wss siblings to connect-src. Self-
hosted deploys that override the build-arg automatically get a
matching CSP — no hardcoded hostname.
Test added: buildCsp includes NEXT_PUBLIC_PLATFORM_URL origin in
connect-src when set. Also loosens the dev `ws:` assertion since
dev uses `connect-src *` which subsumes ws (pre-existing behavior,
test was stale).
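The connect-src derivation is simple enough to sketch; the real implementation is the canvas TypeScript middleware, so this Go `connectSrc` is purely illustrative of the logic:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// connectSrc mirrors the middleware logic for illustration only: given the
// NEXT_PUBLIC_PLATFORM_URL origin it appends both the https and wss
// siblings of that host; when the build-arg is empty it stays same-origin.
func connectSrc(platformURL string) string {
	parts := []string{"'self'", "wss:"}
	if u, err := url.Parse(platformURL); err == nil && u.Host != "" {
		parts = append(parts, "https://"+u.Host, "wss://"+u.Host)
	}
	return "connect-src " + strings.Join(parts, " ")
}

func main() {
	fmt.Println(connectSrc("https://api.moleculesai.app"))
}
```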
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two scripts:
- nuke-and-rebuild.sh: docker down -v, clean orphans, rebuild, setup
- post-rebuild-setup.sh: insert global secrets (MiniMax + GH PAT),
import org template, wait for platform health
Global secrets ensure every provisioned container gets MiniMax API
config and GitHub PAT injected as env vars automatically — no manual
settings.json deployment needed.
Usage: bash scripts/nuke-and-rebuild.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Canvas's browser-side code (auth.ts, api.ts, billing.ts) all call
fetch(PLATFORM_URL + /cp/*). PLATFORM_URL comes from
NEXT_PUBLIC_PLATFORM_URL at build time; with the build arg unset,
it falls back to http://localhost:8080 in the compiled bundle.
That means on a tenant like hongmingwang.moleculesai.app, the
user's browser actually tried to fetch
http://localhost:8080/cp/auth/me, which resolves to the USER'S
OWN machine, not the tenant. The login redirect loops into a 404.
Every tenant canvas has been unable to complete a fresh login on
this path; existing sessions only worked because the cookie was
already set domain-wide.
Fix: pass NEXT_PUBLIC_PLATFORM_URL=https://api.moleculesai.app
as a build arg in the tenant-image workflow. CP already allows
CORS from *.moleculesai.app + credentials, and the session cookie
is scoped to .moleculesai.app so tenant subdomains inherit it.
Verified in prod by rebuilding canvas locally with the flag and
hot-patching the hongmingwang instance via SSM. Baked chunks now
contain api.moleculesai.app; browser auth redirects resolve
cleanly to the CP.
Self-hosted users override by rebuilding with their own URL —
same pattern molecule-app uses with NEXT_PUBLIC_CP_ORIGIN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The canvas sends NEXT_PUBLIC_ADMIN_TOKEN on all API calls but per-workspace
routes (/activity, /delegations, /traces) use WorkspaceAuth which only
accepts per-workspace bearer tokens. This made the canvas dashboard 401
on every workspace detail view.
Fix: WorkspaceAuth now accepts the admin token as a fallback after
workspace token validation fails. This lets the canvas read all workspace
data with a single admin credential.
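A schematic sketch of the fallback order, assuming plain bearer strings (`authorize` and its parameters are illustrative, not the real WorkspaceAuth middleware):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// authorize sketches the fallback: the per-workspace token is checked
// first, and only after that fails is the admin token tried. Constant-time
// comparison avoids leaking token prefixes through timing.
func authorize(bearer, workspaceToken, adminToken string) bool {
	if subtle.ConstantTimeCompare([]byte(bearer), []byte(workspaceToken)) == 1 {
		return true
	}
	// fallback: the canvas's single admin credential reads any workspace
	return adminToken != "" && subtle.ConstantTimeCompare([]byte(bearer), []byte(adminToken)) == 1
}

func main() {
	fmt.Println(authorize("admin-tok", "ws-tok", "admin-tok"))
}
```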
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tenant page loads were failing with repeated CSP violations:
Executing inline script violates ... script-src 'self'
'nonce-M2M4YTVh...' 'strict-dynamic'. ...
because Next.js's bootstrap inline scripts were emitted without a
nonce attribute. The middleware was generating per-request nonces
correctly and sending them via `x-nonce` — but the layout was
fully static, so Next.js cached the HTML once and served that cached
copy (no nonces baked in) for every request.
Fix: call `await headers()` in the root layout. That opts the tree
into dynamic rendering AND signals Next.js to propagate the
x-nonce value to its own generated <script> tags.
The `nonce` return value is intentionally unused — the framework
handles its bootstrap scripts automatically once the read happens.
Future code that adds third-party <Script> components (analytics,
etc.) should pass the returned nonce explicitly.
Verified against live tenant: before this change every /_next/
chunk script tag in the HTML had no nonce attribute; expected after
deploy is `<script nonce="..." src="/_next/...">` on each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes that keep getting lost on nuke+rebuild:
1. middleware.ts: read CSP_DEV_MODE env to relax CSP in local Docker
2. api.ts: send NEXT_PUBLIC_ADMIN_TOKEN header (AdminAuth on /workspaces)
3. Dockerfile: accept NEXT_PUBLIC_ADMIN_TOKEN as build arg
All three are required for the canvas to work in local Docker where
canvas (port 3000) fetches from platform (port 8080) cross-origin.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Canvas needs AdminAuth token to fetch /workspaces (gated since PR #729)
and CSP_DEV_MODE to allow cross-port fetches in local Docker.
These were added earlier but lost on nuke+rebuild because they weren't
committed to staging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This directory belongs in the dedicated repo
Molecule-AI/molecule-ai-org-template-molecule-dev.
It should be cloned locally for platform mounting, never
committed to molecule-core. The .gitignore already blocks it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Soft-delete (status='removed') leaves orphan DB rows and FK data forever.
When ?purge=true is passed, after container cleanup the handler cascade-
deletes all leaf FK tables and hard-removes the workspace row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The org import fired all workspace provisioning goroutines concurrently,
overwhelming Docker when creating 39+ containers. Containers timed out,
leaving workspaces stuck in 'provisioning' with no schedules or hooks.
Fix:
- Add provisionConcurrency=3 semaphore limiting concurrent Docker ops
- Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings
- Pass semaphore through createWorkspaceTree recursion
With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead
of timing out. Each workspace gets its full template: schedules, hooks,
settings, hierarchy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#1080 added /waitlist to canvas, but canvas isn't served at
app.moleculesai.app — it backs the tenant subdomains (acme.moleculesai.app
etc.). The real /waitlist lives in the separate molecule-app repo,
which is what the CP auth callback redirects to.
molecule-app#12 has the real page + contact form wiring to
/cp/waitlist/request. This canvas copy was never reachable and would
only diverge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the user-facing half of the beta-gate: a page at /waitlist that
the CP auth callback redirects users to when their email isn't on
the allowlist. Collects email + optional name + use-case and POSTs
to /cp/waitlist/request (backend landed in controlplane #150).
## Behavior
- No auto-pre-fill of email from the URL query (CP's #145 dropped the
  ?email= param for privacy reasons; a test guards against a
  future regression on the client side).
- Client-side validates email shape for instant feedback; backend
re-validates.
- Three UI states after submit:
success → "your request is in" banner, form hidden
dedup → softer "already on file" banner when backend returns
dedup=true (same 200, no 409 to avoid enumeration)
error → inline banner with backend message or network fallback
## Tests
9 tests in __tests__/waitlist-page.test.tsx covering:
- default render + a11y (role=button, role=status, role=alert)
- URL-pre-fill privacy regression guard
- HTML5 + JS validation (empty, malformed)
- successful POST with trimmed body
- dedup branch
- non-2xx with + without error field
- network rejection
Follow-up to the beta-gate rollout on controlplane #145 / #150.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IsRunning used an unbounded json.NewDecoder(resp.Body).Decode on
CP status responses. Start already caps its body read at 64 KiB
(cp_provisioner.go:137) to defend against a misconfigured or
compromised CP streaming a huge body and exhausting memory.
IsRunning is called reactively per-request from a2a_proxy and
periodically from healthsweep, so it's a hotter path than Start
and arguably deserves the same defense more.
Adds TestIsRunning_BoundedBodyRead that serves a body padded past
the cap and asserts the decode still succeeds on the JSON prefix.
Follow-up to code-review Nit-2 on #1073.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My #1071 made IsRunning return (false, err) on all error paths, but that
breaks a2a_proxy which depends on Docker provisioner's (true, err) contract.
Without this fix, any brief CP outage causes a2a_proxy to mark workspaces
offline and trigger restart cascades across every tenant.
Contract now matches Docker.IsRunning:
transport error → (true, err) — alive, degraded signal
non-2xx response → (true, err) — alive, degraded signal
JSON decode error → (true, err) — alive, degraded signal
2xx state!=running → (false, nil)
2xx state==running → (true, nil)
healthsweep.go is also happy with this — it skips on err regardless.
Adds TestIsRunning_ContractCompat_A2AProxy as regression guard that
asserts each error path explicitly against the a2a_proxy expectations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-existing silent-failure path: IsRunning decoded CP responses
regardless of HTTP status, so a CP 500 → empty body → State="" →
returned (false, nil). The sweeper couldn't distinguish "workspace
stopped" from "CP broken" and would leave a dead row in place.
## Fix
- Non-2xx → wrapped error, does NOT echo body (CP 5xx bodies may
contain echoed headers; leaking into logs would expose bearer)
- JSON decode error → wrapped error
- Transport error → now wrapped with "cp provisioner: status:"
prefix for easier log grepping
## Tests
+7 cases (5-status table + malformed JSON + existing transport).
IsRunning coverage 100%; overall cp_provisioner at 98%.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes review gap: pre-PR coverage on CPProvisioner was 37%.
After this commit every exported method is exercised:
- NewCPProvisioner 100%
- authHeaders 100%
- Start 91.7% (remainder: json.Marshal error path,
  unreachable with fixed-type request struct)
- Stop 100% (new — header + path + error)
- IsRunning 100% (new — 4-state matrix + auth)
- Close 100% (new — contract no-op)
New cases assert both auth headers (shared secret + admin_token) land
on every outbound request, transport failures surface clear errors
on Start/Stop, and IsRunning doesn't misreport on transport failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>