Commit Graph

1126 Commits

Author SHA1 Message Date
rabbitblood
afc4f13369 security: remove hardcoded API keys from post-rebuild-setup.sh
GitGuardian detected exposed MiniMax API key and GitHub PAT in the
script's default values. Replaced with env var reads from .env file
(which is gitignored). Script now validates required secrets exist
before proceeding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 13:02:52 -07:00
Hongming Wang
b9da64d147 Merge pull request #1095 from Molecule-AI/feat/tenant-cp-proxy-same-origin
feat(router): /cp/* reverse-proxy + same-origin canvas fetches
2026-04-20 13:01:46 -07:00
Hongming Wang
52235aeb27 feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches
Canvas's browser bundle issues fetches to both CP endpoints
(/cp/auth/me, /cp/orgs, ...) AND tenant-platform endpoints
(/canvas/viewport, /approvals/pending, /org/templates). They
share ONE build-time base URL. Baking api.moleculesai.app
broke tenant calls with 404; baking the tenant subdomain broke
auth. Tried both today and saw exactly one failure mode per
attempt.

Real fix: same-origin fetches + tenant-side split. Adds:

  internal/router/cp_proxy.go      # /cp/* → CP_UPSTREAM_URL

mounted before NoRoute(canvasProxy). Now a tenant serves:

  /cp/*              → reverse-proxy to api.moleculesai.app
  /canvas/viewport,
  /approvals/pending,
  /workspaces/:id/*,
  /ws, /registry,    → tenant platform (existing handlers)
  /metrics
  everything else    → canvas UI (existing reverse-proxy)

Canvas middleware reverts to `connect-src 'self' wss:` for the
same-origin path (keeping explicit PLATFORM_URL whitelist as a
self-hosted escape hatch when the build-arg is non-empty).

CI build-arg flips to NEXT_PUBLIC_PLATFORM_URL="" so the bundle
issues relative fetches.

Security of cp_proxy:
  - Cookie + Authorization PRESERVED across the hop (opposite of
    canvas proxy) — they carry the WorkOS session, which is the
    whole point.
  - Host rewritten to upstream so CORS + cookie-domain on the CP
    side see their own hostname.
  - Upstream URL validated at construction: must parse, must be
    http(s), must have a host — misconfig fails closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:01:40 -07:00
Hongming Wang
43cd8e1661 Merge pull request #1094 from Molecule-AI/staging
promote: CSP platform_url whitelist
2026-04-20 12:55:15 -07:00
Hongming Wang
dae3eb931d Merge pull request #1093 from Molecule-AI/fix/csp-allow-platform-url
fix(canvas): include PLATFORM_URL origin in CSP connect-src
2026-04-20 12:55:09 -07:00
Hongming Wang
d069231d0b fix(canvas): include NEXT_PUBLIC_PLATFORM_URL in CSP connect-src
Tenant page loads were blocked by:

  Refused to connect to 'https://api.moleculesai.app/cp/auth/me'
  because it violates the document's Content Security Policy.

CSP had `connect-src 'self' wss:` — fine for same-origin + any wss,
but browser refuses cross-origin HTTPS fetches that aren't listed.
PLATFORM_URL (baked from NEXT_PUBLIC_PLATFORM_URL, which is the CP
origin on SaaS tenants) needs to be explicit.

Fix: middleware reads NEXT_PUBLIC_PLATFORM_URL at build/runtime
and adds both the https and wss siblings to connect-src. Self-
hosted deploys that override the build-arg automatically get a
matching CSP — no hardcoded hostname.

Test added: buildCsp includes NEXT_PUBLIC_PLATFORM_URL origin in
connect-src when set. Also loosens the dev `ws:` assertion since
dev uses `connect-src *` which subsumes ws (pre-existing behavior,
test was stale).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:55:03 -07:00
rabbitblood
f9971306d6 feat: nuke-and-rebuild.sh — one-command fleet reset
Two scripts:
- nuke-and-rebuild.sh: docker down -v, clean orphans, rebuild, setup
- post-rebuild-setup.sh: insert global secrets (MiniMax + GH PAT),
  import org template, wait for platform health

Global secrets ensure every provisioned container gets MiniMax API
config and GitHub PAT injected as env vars automatically — no manual
settings.json deployment needed.

Usage: bash scripts/nuke-and-rebuild.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:53:30 -07:00
Hongming Wang
73a43c2a9a Merge pull request #1092 from Molecule-AI/staging
promote: bake CP origin into tenant canvas
2026-04-20 12:51:33 -07:00
Hongming Wang
40e524dea5 Merge pull request #1091 from Molecule-AI/fix/tenant-canvas-cp-origin
fix(ci): bake api.moleculesai.app into tenant canvas bundle
2026-04-20 12:51:28 -07:00
Hongming Wang
b9e1f1e88e fix(ci): bake api.moleculesai.app into tenant canvas bundle
Canvas's browser-side code (auth.ts, api.ts, billing.ts) all call
fetch(PLATFORM_URL + /cp/*). PLATFORM_URL comes from
NEXT_PUBLIC_PLATFORM_URL at build time; with the build arg unset,
it falls back to http://localhost:8080 in the compiled bundle.

That means on a tenant like hongmingwang.moleculesai.app, the
user's browser actually tried to fetch http://localhost:8080/cp/
auth/me — which resolves to the USER'S OWN machine, not the tenant.
Login redirect loops 404. Every tenant canvas has been unable to
complete a fresh login on this path; existing sessions only worked
because the cookie was already set domain-wide.

Fix: pass NEXT_PUBLIC_PLATFORM_URL=https://api.moleculesai.app
as a build arg in the tenant-image workflow. CP already allows
CORS from *.moleculesai.app + credentials, and the session cookie
is scoped to .moleculesai.app so tenant subdomains inherit it.

Verified in prod by rebuilding canvas locally with the flag and
hot-patching the hongmingwang instance via SSM. Baked chunks now
contain api.moleculesai.app; browser auth redirects resolve
cleanly to the CP.

Self-hosted users override by rebuilding with their own URL —
same pattern molecule-app uses with NEXT_PUBLIC_CP_ORIGIN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:51:22 -07:00
rabbitblood
992e6d3f38 fix(auth): accept admin token in CanvasOrBearer for viewport PUT 2026-04-20 12:45:09 -07:00
rabbitblood
1e30386aec fix(auth): accept admin token in WorkspaceAuth for canvas dashboard
The canvas sends NEXT_PUBLIC_ADMIN_TOKEN on all API calls but per-workspace
routes (/activity, /delegations, /traces) use WorkspaceAuth which only
accepts per-workspace bearer tokens. This made the canvas dashboard 401
on every workspace detail view.

Fix: WorkspaceAuth now accepts the admin token as a fallback after
workspace token validation fails. This lets the canvas read all workspace
data with a single admin credential.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:42:43 -07:00
Hongming Wang
f922dcc3b7 Merge pull request #1090 from Molecule-AI/staging
promote: canvas CSP nonce fix
2026-04-20 12:34:14 -07:00
Hongming Wang
9abe2e5739 Merge pull request #1089 from Molecule-AI/fix/canvas-csp-nonce-propagation
fix(canvas): root layout dynamic so CSP nonce reaches Next scripts
2026-04-20 12:34:08 -07:00
Hongming Wang
1af6f696a2 fix(canvas): make root layout dynamic so CSP nonce reaches Next scripts
Tenant page loads were failing with repeated CSP violations:

  Executing inline script violates ... script-src 'self'
  'nonce-M2M4YTVh...' 'strict-dynamic'. ...

because Next.js's bootstrap inline scripts were emitted without a
nonce attribute. The middleware was generating per-request nonces
correctly and sending them via `x-nonce` — but the layout was
fully static, so Next.js cached the HTML once and served that cached
bundle (no nonces baked in) for every request.

Fix: call `await headers()` in the root layout. That opts the tree
into dynamic rendering AND signals Next.js to propagate the
x-nonce value to its own generated <script> tags.

The `nonce` return value is intentionally unused — the framework
handles its bootstrap scripts automatically once the read happens.
Future code that adds third-party <Script> components (analytics,
etc.) should pass the returned nonce explicitly.

Verified against live tenant: before this change every /_next/
chunk script tag in the HTML had no nonce attribute; expected after
deploy is `<script nonce="..." src="/_next/...">` on each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:34:03 -07:00
rabbitblood
35df23850e fix(canvas): CSP_DEV_MODE + admin token for local Docker (#1052 follow-up)
Three changes that keep getting lost on nuke+rebuild:
1. middleware.ts: read CSP_DEV_MODE env to relax CSP in local Docker
2. api.ts: send NEXT_PUBLIC_ADMIN_TOKEN header (AdminAuth on /workspaces)
3. Dockerfile: accept NEXT_PUBLIC_ADMIN_TOKEN as build arg

All three are required for the canvas to work in local Docker where
canvas (port 3000) fetches from platform (port 8080) cross-origin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:23:43 -07:00
rabbitblood
504239f510 fix(canvas): add NEXT_PUBLIC_ADMIN_TOKEN + CSP_DEV_MODE to docker-compose
Canvas needs AdminAuth token to fetch /workspaces (gated since PR #729)
and CSP_DEV_MODE to allow cross-port fetches in local Docker.

These were added earlier but lost on nuke+rebuild because they weren't
committed to staging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:19:12 -07:00
rabbitblood
c8dcc8c628 chore: remove org-templates/molecule-dev from git tracking
This directory belongs in the dedicated repo
Molecule-AI/molecule-ai-org-template-molecule-dev.
It should be cloned locally for platform mounting, never
committed to molecule-core. The .gitignore already blocks it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:47:13 -07:00
molecule-ai[bot]
ff13e5caaa Merge pull request #1088 from Molecule-AI/fix/workspace-purge-delete-1087
fix: add ?purge=true hard-delete to DELETE /workspaces/:id (#1087)
2026-04-20 11:43:40 -07:00
rabbitblood
dd224b2ae4 fix: add ?purge=true hard-delete to DELETE /workspaces/:id (#1087)
Soft-delete (status='removed') leaves orphan DB rows and FK data forever.
When ?purge=true is passed, after container cleanup the handler cascade-
deletes all leaf FK tables and hard-removes the workspace row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:08:44 -07:00
molecule-ai[bot]
247c0d8dcf Merge pull request #1085 from Molecule-AI/fix/org-import-concurrency-1084
fix(org-import): limit concurrent Docker provisioning to 3 (#1084)
2026-04-20 10:38:26 -07:00
rabbitblood
762b38fa30 fix(org-import): limit concurrent Docker provisioning to 3 (#1084)
The org import fired all workspace provisioning goroutines concurrently,
overwhelming Docker when creating 39+ containers. Containers timed out,
leaving workspaces stuck in 'provisioning' with no schedules or hooks.

Fix:
- Add provisionConcurrency=3 semaphore limiting concurrent Docker ops
- Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings
- Pass semaphore through createWorkspaceTree recursion

With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead
of timing out. Each workspace gets its full template: schedules, hooks,
settings, hierarchy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:08:17 -07:00
Hongming Wang
1eea9e04ba Merge pull request #1083 from Molecule-AI/staging
promote: staging → main (remove dead canvas waitlist)
2026-04-20 09:56:11 -07:00
Hongming Wang
35d9363fbd Merge pull request #1082 from Molecule-AI/chore/canvas-remove-waitlist-dead-page
chore(canvas): remove dead /waitlist page (lives in molecule-app)
2026-04-20 09:56:01 -07:00
Hongming Wang
7a2a17591c chore(canvas): remove dead /waitlist page (lives in molecule-app)
#1080 added /waitlist to canvas, but canvas isn't served at
app.moleculesai.app — it backs the tenant subdomains (acme.moleculesai.app
etc.). The real /waitlist lives in the separate molecule-app repo,
which is what the CP auth callback redirects to.

molecule-app#12 has the real page + contact form wiring to
/cp/waitlist/request. This canvas copy was never reachable and would
only diverge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:55:35 -07:00
Hongming Wang
1264aa2166 Merge pull request #1081 from Molecule-AI/staging
promote: staging → main (waitlist page)
2026-04-20 09:47:52 -07:00
Hongming Wang
eb2c7d7081 Merge pull request #1080 from Molecule-AI/feat/waitlist-page
feat(canvas): /waitlist page with contact form
2026-04-20 09:47:35 -07:00
Hongming Wang
b7149c5dda feat(canvas): /waitlist page with contact form
Adds the user-facing half of the beta-gate: a page at /waitlist that
the CP auth callback redirects users to when their email isn't on
the allowlist. Collects email + optional name + use-case and POSTs
to /cp/waitlist/request (backend landed in controlplane #150).

## Behavior

- No auto-pre-fill of email from URL query (CP's #145 dropped the
  ?email= param for the privacy reason; this test guards against a
  future regression on the client side).
- Client-side validates email shape for instant feedback; backend
  re-validates.
- Three UI states after submit:
    success → "your request is in" banner, form hidden
    dedup   → softer "already on file" banner when backend returns
              dedup=true (same 200, no 409 to avoid enumeration)
    error   → inline banner with backend message or network fallback

## Tests

9 tests in __tests__/waitlist-page.test.tsx covering:
- default render + a11y (role=button, role=status, role=alert)
- URL-pre-fill privacy regression guard
- HTML5 + JS validation (empty, malformed)
- successful POST with trimmed body
- dedup branch
- non-2xx with + without error field
- network rejection

Follow-up to the beta-gate rollout on controlplane #145 / #150.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:47:06 -07:00
Hongming Wang
8ae55da718 Merge pull request #1077 from Molecule-AI/staging
promote: staging → main (bounded IsRunning body read)
2026-04-20 09:06:54 -07:00
Hongming Wang
c3e5c8949e Merge pull request #1076 from Molecule-AI/fix/cp-provisioner-bounded-body-read
fix(cp_provisioner): cap IsRunning body read at 64 KiB
2026-04-20 09:06:36 -07:00
Hongming Wang
0a06cb4fc9 fix(cp_provisioner): cap IsRunning body read at 64 KiB
IsRunning used an unbounded json.NewDecoder(resp.Body).Decode on
CP status responses. Start already caps its body read at 64 KiB
(cp_provisioner.go:137) to defend against a misconfigured or
compromised CP streaming a huge body and exhausting memory.

IsRunning is called reactively per-request from a2a_proxy and
periodically from healthsweep, so it's a hotter path than Start
and arguably deserves the same defense more.

Adds TestIsRunning_BoundedBodyRead that serves a body padded past
the cap and asserts the decode still succeeds on the JSON prefix.

Follow-up to code-review Nit-2 on #1073.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:06:20 -07:00
Hongming Wang
0b593cfee3 Merge pull request #1074 from Molecule-AI/staging
promote: staging → main (IsRunning contract fix)
2026-04-20 08:59:07 -07:00
Hongming Wang
f71f408c20 Merge pull request #1073 from Molecule-AI/fix/isrunning-alive-on-transient
fix(cp_provisioner): IsRunning returns (true, err) on transient failures
2026-04-20 08:58:44 -07:00
Hongming Wang
cfa901b89a fix(cp_provisioner): IsRunning returns (true, err) on transient failures
My #1071 made IsRunning return (false, err) on all error paths, but that
breaks a2a_proxy which depends on Docker provisioner's (true, err) contract.
Without this fix, any brief CP outage causes a2a_proxy to mark workspaces
offline and trigger restart cascades across every tenant.

Contract now matches Docker.IsRunning:
  transport error    → (true, err)  — alive, degraded signal
  non-2xx response   → (true, err)  — alive, degraded signal
  JSON decode error  → (true, err)  — alive, degraded signal
  2xx state!=running → (false, nil)
  2xx state==running → (true, nil)

healthsweep.go is also happy with this — it skips on err regardless.

Adds TestIsRunning_ContractCompat_A2AProxy as regression guard that
asserts each error path explicitly against the a2a_proxy expectations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:58:18 -07:00
Hongming Wang
54d72d9f5c Merge pull request #1072 from Molecule-AI/staging
chore: promote IsRunning error surfacing to main
2026-04-20 08:50:28 -07:00
Hongming Wang
724456b7be Merge pull request #1071 from Molecule-AI/fix/isrunning-surface-http-errors
fix(workspace-server): IsRunning surfaces non-2xx + JSON errors
2026-04-20 08:50:03 -07:00
molecule-ai[bot]
a3e70b739c Merge pull request #949 from Molecule-AI/feat/canvas-batch-operations
feat(canvas): batch operations — multi-select + restart/pause/delete
2026-04-20 08:48:26 -07:00
molecule-ai[bot]
fc0e02d692 Merge pull request #1011 from Molecule-AI/test/qa-coverage-orgs-page-and-api-timeout
test(canvas): QA coverage — orgs page polling + API timeout
2026-04-20 08:48:00 -07:00
molecule-ai[bot]
4cd36607ea Merge pull request #1015 from Molecule-AI/fix/canary-verify-health-poll-1013
fix(ci): replace sleep 360 with health-check poll in canary-verify (#1013)
2026-04-20 08:47:56 -07:00
Hongming Wang
e502003c74 fix(workspace-server): IsRunning surfaces non-2xx + JSON errors
Pre-existing silent-failure path: IsRunning decoded CP responses
regardless of HTTP status, so a CP 500 → empty body → State="" →
returned (false, nil). The sweeper couldn't distinguish "workspace
stopped" from "CP broken" and would leave a dead row in place.

## Fix

  - Non-2xx → wrapped error, does NOT echo body (CP 5xx bodies may
    contain echoed headers; leaking into logs would expose bearer)
  - JSON decode error → wrapped error
  - Transport error → now wrapped with "cp provisioner: status:"
    prefix for easier log grepping

## Tests

+7 cases (5-status table + malformed JSON + existing transport).
IsRunning coverage 100%; overall cp_provisioner at 98%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:47:55 -07:00
molecule-ai[bot]
67c12b119e Merge pull request #1016 from Molecule-AI/fix/a11y-workspace-node
fix(a11y): WorkspaceNode font floor, contrast, focus rings
2026-04-20 08:47:53 -07:00
molecule-ai[bot]
fabb108cb4 Merge pull request #1017 from Molecule-AI/fix/rows-err-missing
fix(bundle/exporter): add rows.Err() check + MCP secret scrub
2026-04-20 08:47:49 -07:00
molecule-ai[bot]
eeb5552fba Merge pull request #1022 from Molecule-AI/fix/unchecked-exec-workspace-provision
fix(mcp): scrub secrets in commit_memory + MCP handler tests
2026-04-20 08:47:25 -07:00
molecule-ai[bot]
d3b310e895 Merge pull request #1049 from Molecule-AI/feat/platform-native-hma-instructions
feat(runtime): inject HMA memory instructions at platform level (#1047)
2026-04-20 08:47:20 -07:00
Hongming Wang
6671bbc82a Merge pull request #1070 from Molecule-AI/staging
chore: promote workspace-server tenant-auth fix to main
2026-04-20 08:42:08 -07:00
Hongming Wang
b145a55b06 merge main into staging for #1070 promotion
# Conflicts:
#	.gitignore
2026-04-20 08:41:58 -07:00
Hongming Wang
2730a20194 Merge pull request #1067 from Molecule-AI/fix/tenant-workspace-auth
fix(workspace-server): send X-Molecule-Admin-Token on CP calls
2026-04-20 08:39:49 -07:00
molecule-ai[bot]
bb53ff86b0 Merge pull request #1069 from Molecule-AI/fix/github-token-refresh-1068
fix: GitHub token refresh — WorkspaceAuth path for credential helper (#1068)
2026-04-20 08:37:46 -07:00
Hongming Wang
6c4d1ae4db test(workspace-server): cover Stop/IsRunning/Close + auth-header + transport errors
Closes review gap: pre-PR coverage on CPProvisioner was 37%.
After this commit every exported method is exercised:

  - NewCPProvisioner            100%
  - authHeaders                  100%
  - Start                         91.7% (remainder: json.Marshal error
                                   path, unreachable with fixed-type
                                   request struct)
  - Stop                         100% (new — header + path + error)
  - IsRunning                    100% (new — 4-state matrix + auth)
  - Close                        100% (new — contract no-op)

New cases assert both auth headers (shared secret + admin_token) land
on every outbound request, transport failures surface clear errors
on Start/Stop, and IsRunning doesn't misreport on transport failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 08:37:39 -07:00
rabbitblood
b1bb5f838a fix: GitHub token refresh — add WorkspaceAuth path for credential helper (#1068)
PR #729 tightened AdminAuth to require ADMIN_TOKEN, breaking the
workspace credential helper which called /admin/github-installation-token
with a workspace bearer token. Tokens expired after 60 min with no refresh.

Fix: Add /workspaces/:id/github-installation-token under WorkspaceAuth
so any authenticated workspace can refresh its GitHub token. Keep the
admin path as backward-compatible alias.

Update molecule-git-token-helper.sh to use the workspace-scoped path
when WORKSPACE_ID is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 08:30:02 -07:00