Commit Graph

174 Commits

Author SHA1 Message Date
molecule-ai[bot]
45715aa8a5 fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:06:57 +00:00
molecule-ai[bot]
04c3bc6eb1 fix(canvas): cascade-delete UX — warn before deleting workspace with children (PR #1252)
- Store: pendingDelete now carries `hasChildren: boolean` (computed from
  nodes.some(parentId === nodeId))
- ContextMenu: passes hasChildren into setPendingDelete
- Canvas: dialog title changes to "Delete Workspace and Children" with
  ⚠️ message when hasChildren; confirms with "Delete All"

Refs: #1137

Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
2026-04-21 03:25:12 +00:00
molecule-ai[bot]
221d8b2384 fix(canvas): guard undefined lastErrorRate and period dates in metrics (PR #1250)
- DetailsTab: use `(data.lastErrorRate ?? 0)` instead of bare multiply to
  prevent NaN% when the field is absent on pre-provisioning workspaces.
- WorkspaceUsage: make formatPeriod accept optional start/end strings;
  return "—" for undefined so the usage period shows blank rather than
  "Invalid Date" for provisioning/partial workspaces.

Refs: #1139

Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
2026-04-21 03:22:17 +00:00
molecule-ai[bot]
883cb7ebc3 fix(canvas): eliminate flaky timer state between orgs-page tests (#1207) (#1243)
Root cause: tests used try/finally { vi.useRealTimers() / vi.useFakeTimers() }
back-and-forth. When any test's finally-block called vi.useFakeTimers(),
subsequent tests inherited fake timer state causing 50ms real setTimeouts
to not fire and mockFetch to accumulate calls across test boundaries.

Fix: consolidate timer management to beforeEach/afterEach hooks.
- beforeEach: vi.useFakeTimers() — all tests start from known fake state
- afterEach: cleanup() + vi.useRealTimers() — restore real timers for next test
- Individual tests: use vi.advanceTimersByTimeAsync(50) instead of real setTimeout

Also removed duplicate afterEach(cleanup()) and unused waitFor import.

Closes #1207.

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 03:07:28 +00:00
molecule-ai[bot]
014295d57f fix(issue-1207): eliminate orgs-page test flakiness (#1235)
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)

Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.

Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
  with index. Existing pre-migration tokens get org_id=NULL → deny
  by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
  Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
  created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
  → denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
  Token minted via another org token gets its org_id set at mint time.
  Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
  cross-org denial, session bypass, DB error denial.

🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): replace err.Error() leaks with prod-safe messages (#1206)

- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
  replaced 7 err.Error() calls with "provisioning failed" in both
  Broadcast payloads and last_sample_error DB column. Full error
  preserved in server-side log.Printf.

- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
  calls with generic messages:
    "invalid plugin source"
    "plugin source not supported"
    "invalid plugin name"
    "staged plugin exceeds size limit"
    "plugin manifest integrity check failed"

Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.

Added regression tests (workspace_provision_test.go):
  - TestProvisionWorkspace_NoInternalErrorsInBroadcast
  - TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
  - TestResolveAndStage_NoInternalErrorsInHTTPErr

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(F1089): log panic-recovery UPDATE errors in scheduler

The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.

context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(issue-1207): eliminate orgs-page test flakiness

Three root causes addressed:

1. Duplicate afterEach blocks (lines 97-103) — two identical
   afterEach(() => { cleanup(); }) blocks collapsed to one.

2. Fake-timer isolation gap — if a polling test failed before its
   finally-block ran, vi.useFakeTimers() persisted globally. The next
   non-polling test's setTimeout(50) then hung indefinitely (fake
   timers don't advance without vi.advanceTimersByTime), causing
   waitFor/async timeouts. Fixed by calling vi.useRealTimers()
   unconditionally in beforeEach (guaranteed clean slate) and
   afterEach (even when a test fails before its own finally).

3. mockFetch.callHistory now cleared via mockReset() in beforeEach,
   preventing "expected 2 calls but got N" failures from carry-over
   between polling tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 02:47:25 +00:00
molecule-ai[bot]
4e39201d76 fix(canvas): show toast when clipboard API unavailable in ConsoleModal (#1199) (#1231)
Use explicit navigator.clipboard check instead of optional chaining so
the no-op case is handled explicitly. When clipboard API is unavailable
(non-HTTPS context) show a toast: "Copy requires HTTPS — please select
and copy manually". Production is always HTTPS so this only affects
local dev with http:// canvas.

Closes #1199.

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 02:45:29 +00:00
molecule-ai[bot]
fcd3a6eaf0 fix(test): align ssrf_test.go localhost test cases with isSafeURL behaviour (#1192)
* feat(canvas): rewrite MemoryInspectorPanel to match backend API

Issue #909 (chunk 3 of #576).

The existing MemoryInspectorPanel used the wrong API endpoint
(/memory instead of /memories) and wrong field names (key/value/version
instead of id/content/scope/namespace/created_at). It also lacked
LOCAL/TEAM/GLOBAL scope tabs and a namespace filter.

Changes:
- Fix endpoint: GET /workspaces/:id/memories with ?scope= query param
- Fix MemoryEntry type to match actual API: id, content, scope,
  namespace, created_at, similarity_score
- Add LOCAL/TEAM/GLOBAL scope tabs
- Add namespace filter input
- Remove Edit functionality (no update endpoint in backend)
- Delete uses DELETE /workspaces/:id/memories/:id (by id, not key)
- Full rewrite of 27 tests to match new API and UI structure
- Uses ConfirmDialog (not native dialogs) for delete confirmation
- All dark zinc theme (no light colors)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: tighten types + improve provision-timeout message (#1135, #1136)

#1135 — TypeScript: make BudgetData.budget_used and WorkspaceMetrics
fields optional to match actual partial-response shapes from provisioning-
stuck workspaces. Runtime already guarded with ?? 0.

#1136 — provisiontimeout.go: replace misleading "check required env vars"
hint (preflight catches that case upfront) with accurate message about
container starting but failing to call /registry/register.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(test): align ssrf_test.go localhost test cases with isSafeURL behaviour

isSafeURL blocks 127.0.0.1 via ip.IsLoopback() even in dev environments.
The test cases `wantErr: false` for localhost were incorrect — the
test would fail when go test runs. Fix by changing wantErr to true
for both localhost test cases.

Rationale: loopback blocking at this layer is intentional. Access
control is enforced by WorkspaceAuth + CanCommunicate at the A2A
routing layer, not by the URL validation. Opening this would widen
the SSRF attack surface without adding real dev flexibility.

Closes: ssrf_test.go inconsistency reported 2026-04-21

Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 02:08:45 +00:00
Hongming Wang
c033f3b051 chore(canvas): review nits on ConsoleModal + Peers empty state
Post-review cleanup for the #1178 / #1189 bootstrap-watcher flow:

- ConsoleModal status-code matching uses \b regex anchors instead of
  raw substrings. Before, any error message containing "501" inside
  a longer digit run ("15012") would false-match into the self-hosted
  branch. Unlikely in practice but cheap to tighten.

- Peers empty-state copy now explains WHY the list is empty on
  offline / failed / provisioning workspaces instead of rendering the
  same "No reachable peers" text used for healthy workspaces with
  zero siblings. Online workspaces unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:02:22 -07:00
Hongming Wang
3b339d13e6 fix(canvas): skip /registry/:id/peers fetch when workspace not online
The peers endpoint requires a workspace-scoped bearer token (see
validateDiscoveryCaller in handlers/discovery.go — designed for
agent-to-agent calls). The canvas session doesn't hold that token, so
every Details-tab open for a provisioning / failed / offline workspace
fired a 401 that cluttered devtools and lit up the error banner even
though the real UX here is "no peers — the workspace hasn't booted."

Gate the fetch on status ∈ {online, degraded} and render an empty
Peers list for everything else.

Follow-up: give the canvas a way to see peers for any workspace (admin
session should be enough). Tracked separately — this fix just quiets
the noise on the common case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 17:38:25 -07:00
Hongming Wang
48900e34b0 feat(canvas): show last_sample_error + EC2 console output on failed workspaces
Part 3 of 3 for the "fail fast + comprehensive logs" UX. Platform PR
#1168 and controlplane #181 ship the server-side; this PR surfaces the
data in the canvas.

Two changes:

1. DetailsTab renders `last_sample_error` in a dedicated Error section
   when the workspace is failed (or degraded with an error). Before,
   the only trace of why a workspace failed was a generic banner —
   users had to click "View Logs", which opened the terminal tab (the
   post-boot log, empty on a runtime crash). Now the actual Python
   traceback is inline. A "View console output" button in the same
   section opens the full serial console in a modal.

2. New ConsoleModal component. Fetches GET /workspaces/:id/console
   (platform → CP → ec2:GetConsoleOutput). Portal-rendered above the
   canvas with Copy / Close / Esc handlers. Renders a friendly message
   on 501 (self-hosted deploys without CP) and 404 (instance
   terminated).

3. ProvisioningTimeout's "View Logs" button now opens the console
   modal instead of the (usually empty) terminal tab — when a
   workspace is stuck in provisioning, the cloud-init + user-data
   trace is what the user actually needs.

Tests cover the closed-state no-fetch, happy-path fetch, 501/404
messaging, and Close/Escape wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 17:22:15 -07:00
molecule-ai[bot]
b1433ee8e6 Merge pull request #1171 from Molecule-AI/staging
chore: fast-forward staging with main review-cleanup commits
2026-04-21 00:16:58 +00:00
molecule-ai[bot]
45cf87c7b8 test(BatchActionBar): add hasFailedBatch success-reset test (#1170)
CP-QA approved. 34-line test for BatchActionBar retry state reset after successful batch action.
2026-04-21 00:12:37 +00:00
molecule-ai[bot]
45f5b47487 fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155)
Closes: #177 (CRITICAL — Dockerfile runs as root)

Dockerfiles changed:
- workspace-server/Dockerfile (platform-only): addgroup/adduser + USER platform
- workspace-server/Dockerfile.tenant (combined Go+Canvas): addgroup/adduser + USER canvas
  + chown canvas:canvas on canvas dir so non-root node process can read it
- canvas/Dockerfile (canvas standalone): addgroup/adduser + USER canvas
- workspace-server/entrypoint-tenant.sh: update header comment (no longer starts
  as root; both processes now start non-root)

The entrypoint no longer needs a root→non-root handoff since both the Go
platform and Canvas node run as non-root by default. The 'canvas' user owns
/app and /platform, so volume mounts owned by the host's canvas user work
without needing a root init step.

Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 23:51:33 +00:00
696bd86322 Merge branch 'fix/canvas-orgs-page-tests' into staging 2026-04-20 23:47:02 +00:00
223584c66d Merge remote-tracking branch 'origin/staging' into fix/canvas-orgs-page-tests 2026-04-20 23:46:17 +00:00
acf03cd057 fix(security): suppress raw response body from user-facing billing errors
billing.ts (startCheckout, openBillingPortal): replace raw res.text()
in thrown Error with a safe status-only message. The response body from
/cp/billing/* routes can contain Stripe API error detail (invalid key,
card decline message, raw Stripe envelope) that should not reach clients.

orgs/page.tsx (createOrg): same fix — raw body → safe message.

Full body is logged server-side for debugging.

Closes: #91 (CWE-209 — Stripe key echoed in error)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 23:23:25 +00:00
a339cde8d5 fix(canvas): restore 12 orgs-page tests broken by TermsGate fetch + mock leak
Root causes:
1. TermsGate (rendered inside OrgsPage Shell) fetches /cp/auth/terms-status
   before OrgsPage fetches /cp/orgs, consuming the first mockResponseOnce
   slot — leaving /cp/orgs with no mock and throwing TypeError.
   Fix: mock TermsGate as a pass-through component in vi.mock.

2. Non-polling tests used mockFetchSession.mockResolvedValueOnce() which
   exhausted after one call; React 18 concurrent re-renders call
   fetchSession() multiple times, causing subsequent calls to return
   undefined. Fix: use mockResolvedValue() (persistent) for fetchSession.

3. vi.clearAllMocks() in beforeEach kept mockResolvedValueOnce from
   previous tests from leaking BUT the vi.fn() mock implementation was
   already reset by mockFetchSession.mockReset() in beforeEach. Tests
   were passing stale persistent mocks from previous tests. Fix:
   mockFetchSession.mockReset() in beforeEach + mockResolvedValue in
   each test.

4. Polling tests used vi.useFakeTimers() without shouldAdvanceTime,
   which prevented React's useEffect from calling fetch() (0 calls).
   Fix: use vi.useFakeTimers({ shouldAdvanceTime: true }) + await
   vi.advanceTimersByTimeAsync() to advance time during await.

5. Unmount test unmounted before effects fired (with shouldAdvanceTime).
   Fix: flush microtasks with await vi.advanceTimersByTimeAsync(0)
   before unmount so the effect runs and schedules the poll timer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 23:12:34 +00:00
Hongming Wang
fc3ae5a63a chore: code-review cleanup on today's shipped PRs
Three nits identified during post-merge review of #1119, #1133:

1. ContextMenu.tsx imported `removeNode` from the canvas store but
   stopped using it when the delete-confirm flow moved to Canvas in
   #1133. Also removed the now-unused mock entry in the keyboard
   test so the test inventory matches the real call list.

2. Preflight's YAML parse failure was a silent pass — defensible since
   the in-container preflight owns the schema, but invisible to ops if
   a template ships malformed YAML. Log at WARN so the signal surfaces
   without blocking the provision.

3. formatMissingEnvError rendered its slice via %q, producing
   `["A" "B"]` which is Go-literal-looking and ugly in a user-facing
   error. Join with ", " instead. Test updated to assert the new
   format.

No behavioural changes beyond the log line; fixes are review nits, not
bug fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:04:57 -07:00
Hongming Wang
0ddf266e54 fix(canvas): delete workspace dialog race with context menu close
Clicking "Delete" in the workspace context menu did nothing for stuck
workspaces. The confirm dialog was rendered via portal as a child of
ContextMenu. ContextMenu's outside-click handler checks whether the
click target is inside its ref — but the portal puts the dialog in
document.body, outside the ref. So clicking the dialog's Confirm
counted as "outside", closed the menu, unmounted the dialog mid-click,
and the onConfirm handler never ran.

Hoist the pending-delete state to the canvas store and render the
confirm dialog at the Canvas level (same pattern as the existing
pendingNest dialog). The dialog now outlives ContextMenu, so the
outside-click close is harmless. Close the context menu on the Delete
click itself rather than waiting for the dialog to resolve.

Add a regression test covering the new flow and add the standard
?confirm=true query param so the backend's child-cascade guard is
consulted correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:50:30 -07:00
Hongming Wang
c3f7447e86 fix: harden stuck-provisioning UX — details crash, preflight, sweeper
Workspaces stuck in status='provisioning' previously surfaced in three
bad ways:

1. **Details tab crashed** with `Cannot read properties of undefined
   (reading 'toLocaleString')`. `BudgetSection` + `WorkspaceUsage`
   assumed full response shapes but a provisioning-stuck workspace
   returns partial `{}`. Guard each deep field with `?? 0` and cover
   the partial-response case with regression tests.

2. **Missing required env vars failed silently** 15+ minutes later as
   a cosmetic "Provisioning Timeout" banner. The in-container preflight
   catches them but by then the container has already crashed without
   calling /registry/register, so the workspace sat in 'provisioning'
   forever. Mirror the preflight server-side: parse config.yaml's
   `runtime_config.required_env` before launch, fail fast with a
   WORKSPACE_PROVISION_FAILED event naming the missing vars.

3. **No backend timeout** ever flipped a stuck workspace to 'failed'.
   Add a registry sweeper (10m default, env-overridable) that detects
   workspaces stuck past the window, flips them to 'failed', and emits
   WORKSPACE_PROVISION_TIMEOUT. Race-safe: the UPDATE re-checks the
   status + age predicate so a concurrent register/restart wins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:51:39 -07:00
Hongming Wang
91187342b4 feat(auth): organization-scoped API keys for admin access
Adds user-facing API keys with full-org admin scope. Replaces the
single ADMIN_TOKEN env var with named, revocable, audited tokens
that users can mint/rotate from the canvas UI without ops
intervention.

Designed for the beta growth phase — one token tier (full admin).
Future work will split into scoped roles (admin / workspace-write
/ read-only) and per-workspace bindings. See docs/architecture/
org-api-keys.md for the design + follow-up roadmap.

## Surface

  POST   /org/tokens        mint (plaintext returned once)
  GET    /org/tokens        list live keys (prefix-only)
  DELETE /org/tokens/:id    revoke (idempotent)

All AdminAuth-gated. Bootstrap path: mint the first token via
ADMIN_TOKEN or canvas session; tokens can mint more tokens after.

## Validation as a new AdminAuth tier (2a)

AdminAuth evaluation order:
  Tier 0  lazy-bootstrap fail-open (only when no live tokens AND
          no ADMIN_TOKEN env)
  Tier 1  verified WorkOS session via /cp/auth/tenant-member
  Tier 2a org_api_tokens SELECT — NEW
  Tier 2b ADMIN_TOKEN env (bootstrap / CLI break-glass)
  Tier 3  any live workspace token (deprecated, only when ADMIN_TOKEN
          unset)

Tier 2a runs ONE indexed lookup (partial index on
token_hash WHERE revoked_at IS NULL) + an async last_used_at
bump. No measurable latency cost on the hot path.

## UI

New "Org API Keys" tab in the settings panel. Label field for
human-readable naming. Plaintext shown once + clipboard copy.
Revoke with confirm dialog. Mirrors the existing workspace-
TokensTab flow so users who've used one get the other for free.

## Security properties

  - Plaintext never stored. sha256 hash + 8-char display prefix.
  - Revocation is immediate: partial index on revoked_at IS NULL
    means the next request validates or fails in microseconds.
  - created_by audit field captures provenance: "org-token:<short>"
    when a token mints another, "session" for browser-UI mints,
    "admin-token" for the ADMIN_TOKEN bootstrap path.
  - Validate() collapses all failure shapes into ErrInvalidToken
    so response-shape can't distinguish "never existed" from
    "revoked".

## Tests

  - internal/orgtoken: 9 unit tests (hash storage, empty field
    null-ing, validation happy path, empty plaintext, unknown hash,
    revoked filtering, list ordering, revoke idempotency, has-any-
    live short-circuit).
  - AdminAuth tier-2a integration covered by existing middleware
    tests unchanged (fail-open + bearer paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:01:41 -07:00
Hongming Wang
52235aeb27 feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches
Canvas's browser bundle issues fetches to both CP endpoints
(/cp/auth/me, /cp/orgs, ...) AND tenant-platform endpoints
(/canvas/viewport, /approvals/pending, /org/templates). They
share ONE build-time base URL. Baking api.moleculesai.app
broke tenant calls with 404; baking the tenant subdomain broke
auth. Tried both today and saw exactly one failure mode per
attempt.

Real fix: same-origin fetches + tenant-side split. Adds:

  internal/router/cp_proxy.go      # /cp/* → CP_UPSTREAM_URL

mounted before NoRoute(canvasProxy). Now a tenant serves:

  /cp/*              → reverse-proxy to api.moleculesai.app
  /canvas/viewport,
  /approvals/pending,
  /workspaces/:id/*,
  /ws, /registry,    → tenant platform (existing handlers)
  /metrics
  everything else    → canvas UI (existing reverse-proxy)

Canvas middleware reverts to `connect-src 'self' wss:` for the
same-origin path (keeping explicit PLATFORM_URL whitelist as a
self-hosted escape hatch when the build-arg is non-empty).

CI build-arg flips to NEXT_PUBLIC_PLATFORM_URL="" so the bundle
issues relative fetches.

Security of cp_proxy:
  - Cookie + Authorization PRESERVED across the hop (opposite of
    canvas proxy) — they carry the WorkOS session, which is the
    whole point.
  - Host rewritten to upstream so CORS + cookie-domain on the CP
    side see their own hostname.
  - Upstream URL validated at construction: must parse, must be
    http(s), must have a host — misconfig fails closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:01:40 -07:00
Hongming Wang
d069231d0b fix(canvas): include NEXT_PUBLIC_PLATFORM_URL in CSP connect-src
Tenant page loads were blocked by:

  Refused to connect to 'https://api.moleculesai.app/cp/auth/me'
  because it violates the document's Content Security Policy.

CSP had `connect-src 'self' wss:` — fine for same-origin + any wss,
but browser refuses cross-origin HTTPS fetches that aren't listed.
PLATFORM_URL (baked from NEXT_PUBLIC_PLATFORM_URL, which is the CP
origin on SaaS tenants) needs to be explicit.

Fix: middleware reads NEXT_PUBLIC_PLATFORM_URL at build/runtime
and adds both the https and wss siblings to connect-src. Self-
hosted deploys that override the build-arg automatically get a
matching CSP — no hardcoded hostname.

Test added: buildCsp includes NEXT_PUBLIC_PLATFORM_URL origin in
connect-src when set. Also loosens the dev `ws:` assertion since
dev uses `connect-src *` which subsumes ws (pre-existing behavior,
test was stale).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:55:03 -07:00
Hongming Wang
9abe2e5739 Merge pull request #1089 from Molecule-AI/fix/canvas-csp-nonce-propagation
fix(canvas): root layout dynamic so CSP nonce reaches Next scripts
2026-04-20 12:34:08 -07:00
Hongming Wang
1af6f696a2 fix(canvas): make root layout dynamic so CSP nonce reaches Next scripts
Tenant page loads were failing with repeated CSP violations:

  Executing inline script violates ... script-src 'self'
  'nonce-M2M4YTVh...' 'strict-dynamic'. ...

because Next.js's bootstrap inline scripts were emitted without a
nonce attribute. The middleware was generating per-request nonces
correctly and sending them via `x-nonce` — but the layout was
fully static, so Next.js cached the HTML once and served that cached
bundle (no nonces baked in) for every request.

Fix: call `await headers()` in the root layout. That opts the tree
into dynamic rendering AND signals Next.js to propagate the
x-nonce value to its own generated <script> tags.

The `nonce` return value is intentionally unused — the framework
handles its bootstrap scripts automatically once the read happens.
Future code that adds third-party <Script> components (analytics,
etc.) should pass the returned nonce explicitly.

Verified against live tenant: before this change every /_next/
chunk script tag in the HTML had no nonce attribute; expected after
deploy is `<script nonce="..." src="/_next/...">` on each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:34:03 -07:00
rabbitblood
35df23850e fix(canvas): CSP_DEV_MODE + admin token for local Docker (#1052 follow-up)
Three changes that keep getting lost on nuke+rebuild:
1. middleware.ts: read CSP_DEV_MODE env to relax CSP in local Docker
2. api.ts: send NEXT_PUBLIC_ADMIN_TOKEN header (AdminAuth on /workspaces)
3. Dockerfile: accept NEXT_PUBLIC_ADMIN_TOKEN as build arg

All three are required for the canvas to work in local Docker where
canvas (port 3000) fetches from platform (port 8080) cross-origin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:23:43 -07:00
Hongming Wang
7a2a17591c chore(canvas): remove dead /waitlist page (lives in molecule-app)
#1080 added /waitlist to canvas, but canvas isn't served at
app.moleculesai.app — it backs the tenant subdomains (acme.moleculesai.app
etc.). The real /waitlist lives in the separate molecule-app repo,
which is what the CP auth callback redirects to.

molecule-app#12 has the real page + contact form wiring to
/cp/waitlist/request. This canvas copy was never reachable and would
only diverge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:55:35 -07:00
Hongming Wang
b7149c5dda feat(canvas): /waitlist page with contact form
Adds the user-facing half of the beta-gate: a page at /waitlist that
the CP auth callback redirects users to when their email isn't on
the allowlist. Collects email + optional name + use-case and POSTs
to /cp/waitlist/request (backend landed in controlplane #150).

## Behavior

- No auto-pre-fill of email from URL query (CP's #145 dropped the
  ?email= param for the privacy reason; this test guards against a
  future regression on the client side).
- Client-side validates email shape for instant feedback; backend
  re-validates.
- Three UI states after submit:
    success → "your request is in" banner, form hidden
    dedup   → softer "already on file" banner when backend returns
              dedup=true (same 200, no 409 to avoid enumeration)
    error   → inline banner with backend message or network fallback

## Tests

9 tests in __tests__/waitlist-page.test.tsx covering:
- default render + a11y (role=button, role=status, role=alert)
- URL-pre-fill privacy regression guard
- HTML5 + JS validation (empty, malformed)
- successful POST with trimmed body
- dedup branch
- non-2xx with + without error field
- network rejection

Follow-up to the beta-gate rollout on controlplane #145 / #150.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 09:47:06 -07:00
molecule-ai[bot]
a3e70b739c Merge pull request #949 from Molecule-AI/feat/canvas-batch-operations
feat(canvas): batch operations — multi-select + restart/pause/delete
2026-04-20 08:48:26 -07:00
molecule-ai[bot]
fc0e02d692 Merge pull request #1011 from Molecule-AI/test/qa-coverage-orgs-page-and-api-timeout
test(canvas): QA coverage — orgs page polling + API timeout
2026-04-20 08:48:00 -07:00
molecule-ai[bot]
67c12b119e Merge pull request #1016 from Molecule-AI/fix/a11y-workspace-node
fix(a11y): WorkspaceNode font floor, contrast, focus rings
2026-04-20 08:47:53 -07:00
875d4ab0a5 fix(gate-1): remove unused fireEvent import (#1011)
Mechanical lint fix. github-code-quality[bot] flagged unused
import on line 18 — fireEvent is imported but never referenced in
the test file. Removing it clears the code quality gate without
changing any test behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 02:52:57 +00:00
Molecule AI Frontend Engineer
da2a2b795f fix(a11y): WorkspaceNode font floor, contrast, focus rings (Cycle 10)
C1: skills badge spans text-[7px]→text-[10px]; "+N more" overflow
    text-[7px] text-zinc-500→text-[10px] text-zinc-400
C2: Team section label text-[7px] text-zinc-600→text-[10px] text-zinc-400
H4: status label text-[9px]→text-[10px]; active-tasks count
    text-[9px] text-amber-300/80→text-[10px] text-amber-300 (remove opacity
    modifier per design-system contrast rule); current-task text
    text-[9px] text-amber-300/70→text-[10px] text-amber-300
L1: add focus-visible:ring-2 focus-visible:ring-blue-500/70 to the Restart
    button (independently Tab-focusable inside role="button" wrapper) and to
    the Extract-from-team button in TeamMemberChip; TeamMemberChip
    role="button" div already has the focus ring (COVERED, no change)

762/762 tests pass · build clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 22:01:58 +00:00
qa-agent
4db8d30ece test(canvas): cover /orgs 5s polling on in-flight orgs
The test docstring promised polling coverage but I'd only wired the
describe-block header, not the actual tests. Closing that gap — vitest
fake timers drive three cases:

- `provisioning` org → 2nd fetch fires after 5.1s advance
- all `running` → no 2nd fetch even after 10s advance
- `awaiting_payment` org, unmount before timer fires → no post-unmount
  fetch (cleanup correctly clears the pollTimer)

The unmount case is the meaningful one: without it a fast nav-away
leaves the 5s interval chasing the CP forever. page.tsx L97-99 does
clear the timer; the test pins the contract.

Local baseline on origin/staging tip ede6597 + this branch:
  canvas vitest: 50 files / 781 tests, all green (+3 vs prior commit)
  canvas build:  clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 19:18:30 +00:00
qa-agent
e70cd1c2aa test(canvas): pin AbortSignal timeout regression + cover /orgs landing page
Two independent test additions that harden the surface freshly landed on
staging via PRs #982 (canvas fetch timeout), #992 (/orgs landing), #994
(post-checkout redirect to /orgs).

canvas/src/lib/__tests__/api.test.ts (+74 lines, 7 new tests)
  - GET/POST/PATCH/PUT/DELETE each pass an AbortSignal to fetch
  - TimeoutError (DOMException name=TimeoutError) propagates to the caller
  - Each request installs its own signal — no shared module-level controller
    that would allow one slow request to cancel an unrelated fast one
  This is the hardening nit I flagged in my APPROVE-w/-nit review of
  fix/canvas-api-fetch-timeout. Landing as a follow-up now that #982 is in
  staging.

canvas/src/app/__tests__/orgs-page.test.tsx (+251 lines, new file, 10 tests)
  - Auth guard: signed-out → redirectToLogin and no /cp/orgs fetch
  - Error state: failed /cp/orgs → Error message + Retry button
  - Empty list: CreateOrgForm renders
  - CTA by status:
      running          → "Open" link targets {slug}.moleculesai.app
      awaiting_payment → "Complete payment" → /pricing?org=<slug>
      failed           → "Contact support" mailto
  - Post-checkout: ?checkout=success renders CheckoutBanner AND
    history.replaceState scrubs the query param
  - Fetch contract: /cp/orgs called with credentials:include + AbortSignal

Local baseline on origin/staging tip ede6597:
  canvas vitest: 50 files / 778 tests, all green
  canvas build:  clean, /orgs route present (2.83 kB / 105 kB first-load)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 19:14:54 +00:00
Hongming Wang
dd5654b803 feat(canvas): ToS gate modal + us-east-2 data residency notice
Wraps /orgs in a TermsGate that polls /cp/auth/terms-status on mount
and overlays a blocking modal when the current terms version hasn't
been accepted yet. "I agree" POSTs /cp/auth/accept-terms and dismisses
the modal; the backend records IP + UA as GDPR Art. 7 proof-of-consent.

Also adds a short data residency notice under the page header:
workspaces run in AWS us-east-2 (Ohio, US). An EU region selector is
a future lift once the infra is provisioned there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:44:47 -07:00
Hongming Wang
18894bebe8 feat(canvas): Phase 5 — credit balance pill + low-balance banner
Adds the UI surface for the credit system to /orgs:
- CreditsPill next to each org row. Tone shifts from zinc → amber at
  10% of plan to red at zero.
- LowCreditsBanner appears under the pill for running orgs when the
  balance crosses thresholds: overage_used > 0 → "overage active",
  balance <= 0 → "out of credits, upgrade", trial tail → "trial almost
  out".
- Pure helpers extracted to lib/credits.ts so formatCredits, pillTone,
  and bannerKind are unit-tested without jsdom.

Backend List query now returns credits_balance / plan_monthly_credits
/ overage_used_credits / overage_cap_credits so no second round-trip
is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:27:29 -07:00
Hongming Wang
26b89400ed test(canvas): bump billing test for /orgs success_url 2026-04-19 04:26:01 -07:00
Hongming Wang
d77378294b feat(canvas): post-checkout UX — Stripe success lands on /orgs with banner
Two small polish items that together close the signup-to-running-tenant
flow for real users:

1. Stripe success_url now points at /orgs?checkout=success instead of
   the current page (was pricing). The old behavior left people staring
   at plan cards with no indication payment went through — the new
   behavior drops them right onto their org list where they can watch
   the status flip.

2. /orgs shows a green "Payment confirmed, workspace spinning up"
   banner when it sees ?checkout=success, then clears the query
   param via replaceState so a reload doesn't show it again.

3. /orgs now polls every 5s while any org is awaiting_payment or
   provisioning. Users see the Stripe webhook's effect live — no
   manual refresh needed — and once every org settles the polling
   stops so idle tabs don't hammer /cp/orgs.

Paired with PR #992 (the /orgs page itself) this makes the end-to-end
flow on BILLING_REQUIRED=true deployments feel right:
  /pricing → Stripe → /orgs?checkout=success → banner → live poll →
  "Open" button when org.status transitions to running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 04:18:32 -07:00
Hongming Wang
b29ffb9546 feat(canvas): /orgs landing page for post-signup users
CP's Callback handler redirects every new WorkOS session to
APP_URL/orgs, but canvas had no such route — new users hit the canvas
Home component, which tries to call /workspaces on a tenant that
doesn't exist yet, and saw a confusing error. This PR plugs that gap
with a dedicated landing page that:

- Bounces anonymous visitors back to /cp/auth/login
- Zero-org users see a slug-picker (POST /cp/orgs, refresh)
- For each existing org, shows status + CTA:
  * awaiting_payment → amber "Complete payment" → /pricing?org=…
  * running          → emerald "Open" → https://<slug>.moleculesai.app
  * failed           → "Contact support" → mailto
  * provisioning     → read-only "provisioning…"
- Surfaces errors inline with a Retry button

Deliberately server-light: one GET /cp/orgs, no WebSocket, no canvas
store hydration. Goal is to move the user from signup to either
Stripe Checkout or their tenant URL with one click each.

Closes the last UX gap between the BILLING_REQUIRED gate landing on
the CP and real users being able to complete a signup today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 04:13:54 -07:00
Hongming Wang
956c1fb9b9 fix(canvas): add 15s fetch timeout on API calls
Pre-launch audit flagged api.ts as missing a timeout on every fetch.
A slow or hung CP response would leave the UI spinning indefinitely
with no way for the user to abort — effectively a client-side DoS.

15s is long enough for real CP queries (slowest observed is Stripe
portal redirect at ~3s) and short enough that a stalled backend
surfaces as a clear error with a retry affordance.

Uses AbortSignal.timeout (widely supported since 2023) so the
abort propagates through React Query / SWR consumers cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 02:12:47 -07:00
Hongming Wang
68f55c3ebc Merge pull request #974 from Molecule-AI/fix/canvas-a11y-degraded-badge
fix(canvas): degraded badge WCAG AA contrast (closes #885 p1)
2026-04-19 00:28:39 -07:00
Hongming Wang
50364817ac Merge pull request #967 from Molecule-AI/chore/shadcn-init
chore(canvas): initialize shadcn/ui CLI
2026-04-19 00:27:07 -07:00
Hongming Wang
89d96e8581 fix(canvas): degraded badge WCAG AA contrast — amber-400 → amber-300 (closes #885)
amber-400 on zinc-900 is 5.4:1 (AA pass). amber-300 is 6.9:1 (AA+AAA pass)
and matches the rest of the amber usage in WorkspaceNode (currentTask,
error detail, badge chip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 12:05:38 -07:00
Hongming Wang
16cb728461 chore(canvas): initialize shadcn/ui — components.json + cn utility
Sets up shadcn/ui CLI so new components can be added with
`npx shadcn add <component>`. Uses new-york style, zinc base color,
no CSS variables (matches existing Tailwind-only approach).

Adds clsx + tailwind-merge for the cn() utility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 07:57:17 -07:00
Hongming Wang
f6c1eb7482 chore(canvas): enable Turbopack for dev server — faster HMR
next dev --turbopack for significantly faster dev server startup
and hot module replacement. Build script unchanged (Turbopack for
next build is still experimental).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 07:39:03 -07:00
Hongming Wang
e416496d5f test: add BatchActionBar unit tests (7 tests)
Covers: render threshold, count badge, action buttons, clear selection,
ConfirmDialog trigger, ARIA toolbar role.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 02:21:31 -07:00
Hongming Wang
434e5747b2 fix: ChatTab comment path for workspace-server rename
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:48:59 -07:00
Hongming Wang
e3d22cf2ff test: update mock stores for batch selection in existing canvas tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:22:25 -07:00
Hongming Wang
6de705b2a1 feat(canvas): batch operations — multi-select + restart/pause/delete (Phase 20.3)
- Shift+click to toggle node selection (multi-select mode)
- BatchActionBar floating at bottom when >1 node selected
- Batch Restart All, Pause All, Delete All with ConfirmDialog
- Selected nodes get blue ring highlight
- Escape clears selection
- Pane click clears selection
- Dark theme, accessible (ARIA labels, focus rings)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:16:55 -07:00