Commit Graph

37 Commits

Author SHA1 Message Date
Hongming Wang
5a3dbb95e1 fix(api): probe /cp/auth/me before redirecting on 401
The actual cause-fix for the staging-tabs E2E saga (#2073/#2074/#2075).

Old behaviour: ANY 401 from any fetch on a SaaS tenant subdomain
called redirectToLogin → window.location.href = AuthKit. This is
wrong. Plenty of 401s don't mean "session is dead":

  - workspace-scoped endpoints (/workspaces/:id/peers, /plugins)
    require a workspace-scoped token, not the tenant admin bearer
  - resource-permission mismatches (user has tenant access but not
    this specific workspace)
  - misconfigured proxies returning 401 spuriously

A single transient one of those yanked authenticated users back to
AuthKit. Same bug yanked the staging-tabs E2E off the tenant origin
mid-test for 6+ hours tonight, leading to the cascade of test-side
mocks (#2073/#2074/#2075) that worked around the symptom without
fixing the cause.

This PR fixes it at the source. The new logic:

  - 401 on /cp/auth/* path → that IS the canonical session-dead
    signal → redirect (unchanged)
  - 401 on any other path with slug present → probe /cp/auth/me:
      probe 401 → session genuinely dead → redirect
      probe 200 → session fine, endpoint refused this token →
                  throw a real Error, caller renders error state
      probe network err → assume session-fine (conservative) →
                  throw real Error
  - slug empty (localhost / LAN / reserved subdomain) → throw
    without redirect (unchanged)

The probe adds one extra fetch on a 401, only when slug is set
and the path isn't already auth-scoped. That's rare and
worthwhile — a transient probe round-trip is cheap; an unwanted
auth redirect is a UX disaster.

Tests:
  - api-401.test.ts rewritten with the full matrix:
      * /cp/auth/me 401 → redirect (no probe, that IS the signal)
      * non-auth 401 + probe 401 → redirect
      * non-auth 401 + probe 200 → throw, no redirect  ← the fix
      * non-auth 401 + probe network err → throw, no redirect
      * empty slug paths (localhost/LAN/reserved) → throw, no probe
  - 43 tests in canvas/src/lib/__tests__/api*.test.ts all pass
  - tsc clean

The staging-tabs E2E spec's universal-401 route handler stays as
defense-in-depth (silences resource-load console noise + guards
against panels without try/catch), but the comment now describes
its role honestly: api.ts is the primary fix, the route is the
safety net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:49:28 -07:00
Hongming Wang
06c85bd185
Merge pull request #2045 from Molecule-AI/feat/flat-rate-pricing-1833
feat(canvas): flat-rate pricing — rename Starter→Team, Pro→Growth (Issue #1833)
2026-04-25 05:54:06 +00:00
Hongming Wang
0b237ed9dd refactor(canvas): extract runtime profiles to @/lib/runtimeProfiles
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.

Changes:

- New file `canvas/src/lib/runtimeProfiles.ts` owns:
  * `RuntimeProfile` type — structural shape, every field optional so
    new runtimes can partially-fill without breaking consumers.
  * `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
  * `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
  * `WorkspaceRuntimeOverrides` — interface for server-provided
    per-workspace overrides, so operators can tune via template
    manifest / workspace metadata without a canvas release.
  * `getRuntimeProfile()` — resolver with
    overrides → profile → default priority.
  * `provisionTimeoutForRuntime()` — convenience wrapper.

- `ProvisioningTimeout.tsx` now delegates to the profile module.
  `DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.

- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
  * overrides > profile > default priority chain
  * "every entry in RUNTIME_PROFILES resolves to a number" contract
  * backward-compat export

Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:48:39 -07:00
Molecule AI Marketing Lead
de19cf9bae fix(canvas): apply flat-rate pricing copy for Phase 34 launch (Issue #1833)
Rename "Starter" → "Team", update tagline + pricing page hero copy to
lead with flat-rate per-org positioning — deliberate wedge against
Cursor/Windsurf per-seat pricing ($40/seat vs $29/org).

PMM decision: Issue #1833. Approved by Marketing Lead 2026-04-24.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 17:54:23 +00:00
Hongming Wang
0ef5dad1b1
Merge pull request #1993 from Molecule-AI/fix/auth-redirect-loop-regression-tests
test(auth): add regression tests for redirect loop guards
2026-04-24 06:57:12 +00:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server paces + provision concurrency = 3 guarantee tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now fitView's with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
e9be12210f test(auth): add regression tests for redirect loop guards
AuthGate now skips session fetch for /cp/auth/* paths, and
redirectToLogin guards against re-setting window.location when
already on an auth path. Both guards had no test coverage —
a future refactor could silently reintroduce the redirect loop.

Added:
- AuthGate.test.tsx: 2 cases covering /cp/auth/login and
  /cp/auth/signup path skipping (no fetchSession call, no
  redirectToLogin call, children rendered)
- auth.test.ts: 2 cases covering redirectToLogin early return
  for /cp/auth/login and /cp/auth/signup paths

Fixes: Molecule-AI/molecule-core#1541

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:30:35 +00:00
Hongming Wang
f2a4b6e0d3 fix: dev-mode bypass for IP rate limiter + 429 retry on GET
The 600-req/min/IP bucket is sized for SaaS where each tenant has
a distinct client IP. On a local Docker setup every panel shares
one IP — hydration (/workspaces + /templates + /org/templates +
/approvals/pending) plus polling (A2A overlay + activity tabs +
approvals + schedule + channels + audit trail) can burst past the
bucket inside a minute, blanking the canvas with 429s. The user
reported it after dragging workspaces — dragging itself is
release-only (savePosition in onNodeDragStop), but the polling
that's always running added onto startup tripped the limit.

Two-layer fix:

Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen
is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching
the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and
discovery. SaaS production keeps the bucket.

Client: api.ts auto-retries a single 429 on idempotent GET requests,
waiting the server-provided Retry-After (capped at 20s). Mutations
(POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying.
Users on SaaS hitting a legitimate rate-limit spike get one
transparent recovery instead of an immediately-blank Canvas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:44:09 -07:00
Hongming Wang
dc50a1c775 refactor(canvas): data-drive provider picker from template config.yaml
The MissingKeysModal's provider list was hardcoded in deploy-preflight.ts
as RUNTIME_PROVIDERS — a per-runtime map that duplicated what each
template repo already declares in its config.yaml. That meant adding a
new provider required changes in two places, and the UI could drift out
of sync with the actual template (e.g. when a template adds a MiniMax or
Kimi model, the picker wouldn't know).

The single source of truth for "which env vars does this workspace need"
is each template's config.yaml:

  * `runtime_config.models[].required_env` — per-model key list
  * `runtime_config.required_env`          — runtime-level AND list

Go /templates already returned `models`. This change:

  * Adds `required_env` alongside `models` on templateSummary so the
    canvas receives the full picture.
  * Rewrites deploy-preflight.ts to derive ProviderChoice[] from a
    template object via `providersFromTemplate(template)`:
      - groups `models[]` by unique required_env tuple
      - falls back to runtime_config.required_env when models is empty
      - decorates labels with model counts (e.g. "OpenRouter (14 models)")
  * `checkDeploySecrets(template, workspaceId?)` now takes a template
    object instead of a runtime string. Any-provider satisfaction still
    short-circuits preflight to ok=true.
  * MissingKeysModal receives `providers` directly; no more lookups.
  * TemplatePalette threads `template.models` + `template.required_env`
    into the preflight.

Side effects:
  * Claude Code's dual-auth (OAuth token OR Anthropic API key) now
    surfaces as two picker options — its config.yaml already declared
    both, the UI just wasn't reading them.
  * Hermes picker now shows 8 provider options (Nous, OpenRouter,
    Anthropic, Gemini, DeepSeek, GLM, Kimi, Kilocode) instead of the
    hand-picked 3, matching its 35-model reality.

Removed the legacy RUNTIME_PROVIDERS / RUNTIME_REQUIRED_KEYS /
getRequiredKeys / findMissingKeys exports; MissingKeysModal.test.tsx
deleted (its coverage is subsumed by the new template-driven
deploy-preflight.test.ts). 58 modal-adjacent tests pass; full canvas
suite 919 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:07:15 -07:00
Hongming Wang
baa7e1531f feat(canvas): provider-picker MissingKeysModal for multi-provider runtimes
Runtimes like Hermes and LangGraph accept any one of several LLM
provider keys (OpenRouter OR OpenAI OR Anthropic OR Nous-native).
Before this change, the missing-keys modal treated all supported
providers as simultaneously required — a fresh user on Hermes was
asked for three parallel API keys when any one suffices.

Introduces RUNTIME_PROVIDERS in deploy-preflight.ts as the canonical
per-runtime provider list (label, envVar, note). checkDeploySecrets
now returns all alternatives as missingKeys when nothing is
configured, so the modal can offer a picker.

MissingKeysModal dispatches between two render paths:

  * ProviderPickerModal — radio list of supported providers, a single
    env input for the chosen one. Saving that one key satisfies the
    preflight. Activated whenever the runtime has ≥2 provider choices.

  * AllKeysModal — legacy parallel-inputs UX, all keys must be saved
    before deploy. Kept for single-provider runtimes (claude-code,
    gemini-cli) and callers that pass unrelated-key lists.

Dual-mode preserves the pre-existing contract for every caller while
fixing the multi-provider UX. All 930 canvas vitest tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:41:09 -07:00
Hongming Wang
b4719ad070 fix(canvas): Legend avoids TemplatePalette + silence WS handshake races
### Two unrelated but small UI fixes surfaced while testing the Canvas

**1. Legend hidden under the open TemplatePalette.**

Legend is `fixed bottom-6 left-4 z-30`. TemplatePalette's drawer (when
open) is `fixed top-0 left-0 w-[280px] z-30` — same z-index, same
left-edge column. The Legend overlapped the palette's bottom 180 px.

Published the palette-open state to the canvas store so the Legend
can shift right (to `left-[296px]` — 280 px palette + 16 px gap) while
the palette is open, animated via a 200 ms `transition-[left]` to
match the palette's slide. Closes cleanly back to `left-4` when the
palette is dismissed.

Files:
- `store/canvas.ts` — added `templatePaletteOpen` + `setTemplatePaletteOpen`.
- `TemplatePalette.tsx` — calls `setTemplatePaletteOpen(open)` on
  every open/close transition via a new useEffect.
- `Legend.tsx` — reads the flag and swaps `left-4` <-> `left-[296px]`.

**2. "WebSocket is closed before the connection is established" spam.**

Two components (`ChatTab`, `AgentCommsPanel`) open their own short-
lived WebSocket to tail the ACTIVITY_LOGGED stream. Their cleanup
path called `ws.close()` unconditionally, which trips a browser
console warning when React StrictMode re-runs the effect in dev and
the handshake hasn't completed yet. Confirmed via DevTools console
on the running canvas.

Added a `closeWebSocketGracefully(ws)` helper in `lib/ws-close.ts`:

  - OPEN / CLOSING → close immediately (normal path).
  - CONNECTING    → defer close to the 'open' listener so the
                    browser sees a full handshake. Also wires an
                    'error' listener that cancels the queued close
                    if the handshake fails (no double-close).
  - CLOSED        → no-op.

Both consumers now call the helper in their useEffect cleanup.
Silences the warning without changing observable behaviour.

### Tests

`canvas/src/lib/__tests__/ws-close.test.ts` — 5 cases with a fake
WebSocket covering each readyState branch plus the error-before-open
cancellation path. Full vitest suite: 927/927 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:03:01 -07:00
Hongming Wang
de99a22ffc fix(quickstart): hotfixes discovered during live testing session
Five additional breakages surfaced while testing the restored stack
end-to-end (spin up Hermes template → click node → open side panel →
configure secrets → send chat). Each fix is narrowly scoped and has
matching unit or e2e tests so they don't regress.

### 1. SSRF defence blocked loopback A2A on self-hosted Docker

handlers/ssrf.go was rejecting `http://127.0.0.1:<port>` workspace
URLs as loopback, so POST /workspaces/:id/a2a returned 502 on every
Canvas chat send in local-dev. The provisioner on self-hosted Docker
publishes each container's A2A port on 127.0.0.1:<ephemeral> — that's
the only reachable address for the platform-on-host path.

Added `devModeAllowsLoopback()` — allows loopback only when
MOLECULE_ENV ∈ {development, dev}. SaaS (MOLECULE_ENV=production)
continues to block loopback; every other blocked range (metadata
169.254/16, TEST-NET, CGNAT, link-local) stays blocked in dev mode.

Tests: 5 new tests in ssrf_test.go covering dev-mode loopback,
dev-mode short-alias ("dev"), production still blocks loopback,
dev-mode still blocks every other range, and a 9-case table test of
the predicate with case/whitespace/typo variants.

### 2. canvas/src/lib/api.ts: 401 → login redirect broke localhost

Every 401 called `redirectToLogin()` which navigates to
`/cp/auth/login`. That route exists only on SaaS (mounted by the
cp_proxy when CP_UPSTREAM_URL is set). On localhost it 404s — users
landed on a blank "404 page not found" instead of seeing the actual
error they should fix.

Gated the redirect on the SaaS-tenant slug check: on
<slug>.moleculesai.app, redirect unchanged; on any non-SaaS host
(localhost, LAN IP, reserved subdomains like app.moleculesai.app),
throw a real error so the calling component can render a retry
affordance.

Tests: 4 new vitest cases in a dedicated api-401.test.ts (needs
jsdom for window.location.hostname) — SaaS redirects, localhost
throws, LAN hostname throws, reserved apex throws.

### 3. SecretsSection rendered a hardcoded key list

config/secrets-section.tsx shipped a fixed COMMON_KEYS list
(Anthropic / OpenAI / Google / SERP / Model Override) regardless of
what the workspace's template actually needed. A Hermes workspace
declaring MINIMAX_API_KEY in required_env got five irrelevant slots
and nothing for the key it actually needed.

Made the slot list template-driven via a new `requiredEnv?: string[]`
prop passed down from ConfigTab. Added `KNOWN_LABELS` for well-known
names and `humanizeKeyName` to turn arbitrary SCREAMING_SNAKE_CASE
into a readable label (e.g. MINIMAX_API_KEY → "Minimax API Key").
Acronyms (API, URL, ID, SDK, MCP, LLM, AI) stay uppercase. Legacy
fallback preserved when required_env is empty.

Tests: 8 new vitest cases covering known-label lookup, humanise
fallback, acronym preservation, deduplication, and both fallback
paths.

### 4. Confusing placeholder in Required Env Vars field

The TagList in ConfigTab labelled "Required Env Vars (from template)"
is a DECLARATION field — stores variable names. The placeholder
"e.g. CLAUDE_CODE_OAUTH_TOKEN" suggested that, but users naturally
typed the value of their API key into the field instead. The actual
values go in the Secrets section further down the tab.

Relabelled to "Required Env Var Names (from template)", changed the
placeholder to "variable NAME (e.g. ANTHROPIC_API_KEY) — not the
value", and added a one-line helper below pointing to Secrets.

### 5. Agent chat replies rendered 2-3 times

Three delivery paths can fire for a single agent reply — HTTP
response to POST /a2a, A2A_RESPONSE WS event, and a
send_message_to_user WS push. Paths 2↔3 were already guarded by
`sendingFromAPIRef`; path 1 had no guard. Hermes emits both the
reply body AND a send_message_to_user with the same text, which
manifested as duplicate bubbles with identical timestamps.

Added `appendMessageDeduped(prev, msg, windowMs = 3000)` in
chat/types.ts — dedupes on (role, content) within a 3s window.
Threaded into all three setMessages call sites. The window is short
enough that legitimate repeat messages ("hi", "hi") from a real
user/agent a few seconds apart still render.

Tests: 8 new vitest cases covering empty history, different content,
duplicate within window, different roles, window elapsed, stale
match, malformed timestamps, and custom window.

### 6. New end-to-end regression test

tests/e2e/test_dev_mode.sh — 7 HTTP assertions that run against a
live platform with MOLECULE_ENV=development and catch regressions
on all the dev-mode escape hatches in a single pass: AdminAuth
(empty DB + after-token), WorkspaceAuth (/activity, /delegations),
AdminAuth on /approvals/pending, and the populated
/org/templates response. Shellcheck-clean.

### Test sweep

- `go test -race ./internal/handlers/ ./internal/middleware/
  ./internal/provisioner/` — all pass
- `npx vitest run` in canvas — 922/922 pass (up from 902)
- `shellcheck --severity=warning infra/scripts/setup.sh
  tests/e2e/test_dev_mode.sh` — clean
- `bash tests/e2e/test_dev_mode.sh` — 7/7 pass against a live
  platform + populated template registry

### SaaS parity

Every relaxation remains conditional on MOLECULE_ENV=development.
Production tenants run MOLECULE_ENV=production (enforced by the
secrets-encryption strict-init path) and always set ADMIN_TOKEN, so
none of these code paths fire on hosted SaaS. Behaviour on real
tenants is byte-for-byte unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:57:18 -07:00
Hongming Wang
2c3eccf9d6 test(auth): provide window.location.pathname in redirectToLogin mocks
The pathname.startsWith() loop-break added to redirectToLogin needs
pathname on the mock Location object; tests were supplying only href.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
b360a4353f fix(auth): redirect to app.moleculesai.app for login, not tenant subdomain
Tenant subdomains (hongmingwang.moleculesai.app) proxy to EC2 platform
which has no /cp/auth/* routes. Auth UI lives on app.moleculesai.app.

Added getAuthOrigin() that detects SaaS tenant hosts and redirects to
the app subdomain for login/signup. Non-SaaS hosts (localhost, dev)
fall back to PLATFORM_URL as before.

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
6730c7713d fix(auth): redirect to login on 401 from any API call
When session credentials expire mid-use, ALL API calls return 401.
Previously this threw a generic error that crashed the UI with no
recovery path. Now the API client intercepts 401 and redirects to
login once (via redirectToLogin which already guards against loops).

Combined with the AuthGate /cp/auth/* path guard, this gives the
correct behavior: credentials lost → redirect to login → user logs
in → return_to sends them back.

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
edc42b2893 fix(auth): break infinite redirect loop on /cp/auth/login
AuthGate redirected anonymous users to /cp/auth/login?return_to=<url>,
but the login page itself triggered AuthGate, which redirected again
with double-encoded return_to. Each redirect added another encoding
layer until the URL exceeded 431 (Request Header Fields Too Large).

Two guards:
1. redirectToLogin() returns early if already on /cp/auth/* path
2. AuthGate skips redirect check entirely for /cp/auth/* paths

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
Hongming Wang
66de81fbfa
Merge pull request #1689 from Molecule-AI/refactor/strip-secret-service-dropdown
refactor(secrets): strip Service dropdown from Add-Key form
2026-04-22 18:46:02 -07:00
Hongming Wang
8b1af9708c feat(canvas): default tier T3 and hide T1/T2 on SaaS
On SaaS every workspace gets its own EC2 VM — the Docker-sandbox
distinction between T1 (sandboxed), T2 (standard Docker), and T3
(full host access) doesn't apply. A SaaS workspace is always a
dedicated VM, which is "full access" by construction. Showing T1/T2
in that UI is a category error: users pick a sandbox level that has
no effect on the actual EC2 machine they get.

Changes:
- tenant.ts: export isSaaSTenant() — returns true when canvas is
  served at <slug>.moleculesai.app (SSR-safe: false on server)
- CreateWorkspaceDialog: when isSaaSTenant(), render only the T3
  option, default tier=3, grid collapses to a single column. Label
  gets a " — dedicated VM" hint so the user knows what they're
  getting. On self-hosted the full T1/T2/T3 picker is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:02:48 -07:00
Hongming Wang
d956164812 refactor(secrets): strip Service dropdown from Add-Key form
The Add-Key form used to open with a required Service dropdown
(GitHub / Anthropic / OpenRouter / Other) that gated everything
else. The dropdown did no persistent work — the secret store only
cares about (key_name, value); the Service label was never saved
anywhere. It also suffered registry drift: today we support ~22
hermes-dispatched providers (MiniMax, Gemini, DeepSeek, Kimi, Qwen,
NVIDIA, etc.); only 3 had entries. Everyone else landed in "Other"
with no downside beyond the mandatory click.

Replaces it with:

1. Key-name <datalist> autocomplete sourced from new
   KEY_NAME_SUGGESTIONS in lib/services.ts — 26 entries covering
   common infra keys + every hermes-supported provider.

2. inferGroup(keyName) derives classification at render time,
   matching what the store already does in getGrouped(). No
   behaviour change for list grouping.

3. Provider docs link renders inline only when inferGroup
   recognises the name. For 'custom' keys we stay quiet — no
   false-structure prompt.

4. Test-connection button still available when the inferred group
   supports it AND the value is format-valid. Same providers as
   before.

SERVICES registry preserved for LIST rendering + test routing.

Result: two fields instead of three. One fewer decision. Provider-
agnostic by design — new providers work the moment someone types
their canonical env var name; no UI code change per provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:41:43 -07:00
acf03cd057 fix(security): suppress raw response body from user-facing billing errors
billing.ts (startCheckout, openBillingPortal): replace raw res.text()
in thrown Error with a safe status-only message. The response body from
/cp/billing/* routes can contain Stripe API error detail (invalid key,
card decline message, raw Stripe envelope) that should not reach clients.

orgs/page.tsx (createOrg): same fix — raw body → safe message.

Full body is logged server-side for debugging.

Closes: #91 (CWE-209 — Stripe key echoed in error)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 23:23:25 +00:00
rabbitblood
35df23850e fix(canvas): CSP_DEV_MODE + admin token for local Docker (#1052 follow-up)
Three changes that keep getting lost on nuke+rebuild:
1. middleware.ts: read CSP_DEV_MODE env to relax CSP in local Docker
2. api.ts: send NEXT_PUBLIC_ADMIN_TOKEN header (AdminAuth on /workspaces)
3. Dockerfile: accept NEXT_PUBLIC_ADMIN_TOKEN as build arg

All three are required for the canvas to work in local Docker where
canvas (port 3000) fetches from platform (port 8080) cross-origin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:23:43 -07:00
molecule-ai[bot]
fc0e02d692 Merge pull request #1011 from Molecule-AI/test/qa-coverage-orgs-page-and-api-timeout
test(canvas): QA coverage — orgs page polling + API timeout
2026-04-20 08:48:00 -07:00
qa-agent
e70cd1c2aa test(canvas): pin AbortSignal timeout regression + cover /orgs landing page
Two independent test additions that harden the surface freshly landed on
staging via PRs #982 (canvas fetch timeout), #992 (/orgs landing), #994
(post-checkout redirect to /orgs).

canvas/src/lib/__tests__/api.test.ts (+74 lines, 7 new tests)
  - GET/POST/PATCH/PUT/DELETE each pass an AbortSignal to fetch
  - TimeoutError (DOMException name=TimeoutError) propagates to the caller
  - Each request installs its own signal — no shared module-level controller
    that would allow one slow request to cancel an unrelated fast one
  This is the hardening nit I flagged in my APPROVE-w/-nit review of
  fix/canvas-api-fetch-timeout. Landing as a follow-up now that #982 is in
  staging.

canvas/src/app/__tests__/orgs-page.test.tsx (+251 lines, new file, 10 tests)
  - Auth guard: signed-out → redirectToLogin and no /cp/orgs fetch
  - Error state: failed /cp/orgs → Error message + Retry button
  - Empty list: CreateOrgForm renders
  - CTA by status:
      running          → "Open" link targets {slug}.moleculesai.app
      awaiting_payment → "Complete payment" → /pricing?org=<slug>
      failed           → "Contact support" mailto
  - Post-checkout: ?checkout=success renders CheckoutBanner AND
    history.replaceState scrubs the query param
  - Fetch contract: /cp/orgs called with credentials:include + AbortSignal

Local baseline on origin/staging tip ede6597:
  canvas vitest: 50 files / 778 tests, all green
  canvas build:  clean, /orgs route present (2.83 kB / 105 kB first-load)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 19:14:54 +00:00
Hongming Wang
18894bebe8 feat(canvas): Phase 5 — credit balance pill + low-balance banner
Adds the UI surface for the credit system to /orgs:
- CreditsPill next to each org row. Tone shifts from zinc → amber at
  10% of plan to red at zero.
- LowCreditsBanner appears under the pill for running orgs when the
  balance crosses thresholds: overage_used > 0 → "overage active",
  balance <= 0 → "out of credits, upgrade", trial tail → "trial almost
  out".
- Pure helpers extracted to lib/credits.ts so formatCredits, pillTone,
  and bannerKind are unit-tested without jsdom.

Backend List query now returns credits_balance / plan_monthly_credits
/ overage_used_credits / overage_cap_credits so no second round-trip
is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 07:27:29 -07:00
Hongming Wang
26b89400ed test(canvas): bump billing test for /orgs success_url 2026-04-19 04:26:01 -07:00
Hongming Wang
d77378294b feat(canvas): post-checkout UX — Stripe success lands on /orgs with banner
Two small polish items that together close the signup-to-running-tenant
flow for real users:

1. Stripe success_url now points at /orgs?checkout=success instead of
   the current page (was pricing). The old behavior left people staring
   at plan cards with no indication payment went through — the new
   behavior drops them right onto their org list where they can watch
   the status flip.

2. /orgs shows a green "Payment confirmed, workspace spinning up"
   banner when it sees ?checkout=success, then clears the query
   param via replaceState so a reload doesn't show it again.

3. /orgs now polls every 5s while any org is awaiting_payment or
   provisioning. Users see the Stripe webhook's effect live — no
   manual refresh needed — and once every org settles the polling
   stops so idle tabs don't hammer /cp/orgs.

Paired with PR #992 (the /orgs page itself) this makes the end-to-end
flow on BILLING_REQUIRED=true deployments feel right:
  /pricing → Stripe → /orgs?checkout=success → banner → live poll →
  "Open" button when org.status transitions to running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 04:18:32 -07:00
Hongming Wang
956c1fb9b9 fix(canvas): add 15s fetch timeout on API calls
Pre-launch audit flagged api.ts as missing a timeout on every fetch.
A slow or hung CP response would leave the UI spinning indefinitely
with no way for the user to abort — effectively a client-side DoS.

15s is long enough for real CP queries (slowest observed is Stripe
portal redirect at ~3s) and short enough that a stalled backend
surfaces as a clear error with a retry affordance.

Uses AbortSignal.timeout (widely supported since 2023) so the
abort propagates through React Query / SWR consumers cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 02:12:47 -07:00
Hongming Wang
16cb728461 chore(canvas): initialize shadcn/ui — components.json + cn utility
Sets up shadcn/ui CLI so new components can be added with
`npx shadcn add <component>`. Uses new-york style, zinc base color,
no CSS variables (matches existing Tailwind-only approach).

Adds clsx + tailwind-merge for the cn() utility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 07:57:17 -07:00
Hongming Wang
6cecf44f45 fix(canvas): add hermes + gemini-cli to deploy preflight required keys
Hermes requires OPENROUTER_API_KEY (or any of its 15 providers).
Gemini CLI requires GOOGLE_API_KEY. Without these entries, the
MissingKeysModal doesn't fire and workspaces start without keys,
causing crash loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:45:54 -07:00
Hongming Wang
54bb543ff7 fix: code review findings — token UI, auth hardening, WS dedup
1. Settings panel: wire TokensTab into "API Tokens" tab (was imported
   but not rendered). Rename "API Keys" → "Secrets", add "API Tokens"
   tab. Fix docs link → doc.moleculesai.app/docs/tokens.

2. Referer match hardening: require exact host match or trailing slash
   to prevent evil.com subdomain bypass. Cache CANVAS_PROXY_URL at
   init time instead of per-request os.Getenv.

3. Extract shared deriveWsBaseUrl() to lib/ws-url.ts — eliminates
   duplicate 12-line derivation in socket.ts and TerminalTab.tsx.

4. Token list pagination: add ?limit= and ?offset= params (default
   50, max 200) to GET /workspaces/:id/tokens.

507/507 canvas tests pass, Go build + vet clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:42:26 -07:00
Hongming Wang
a4b1886c35 fix(canvas): address all code review findings on PR #482
- Reconcile TIER_CONFIG/TIER_COLORS into single TIER_CONFIG with both
  `color` (pill style) and `border` (bordered badge style) fields
- Remove TemplatePalette alias indirection (TIER_LABELS_SHARED → direct import)
- Extract inline spinner SVGs to shared Spinner component (3 copies → 1)
- Migrate status dot colors from 6 remaining files to shared tokens:
  SearchDialog, StatusDot, Legend, ContextMenu, Toolbar + add statusDotClass()
- Add COMM_TYPE_LABELS to design-tokens, used by CommunicationOverlay sr-only
- Update reduced-motion tests: components that delegate to design-tokens
  pass the guard check via import detection; add design-tokens.ts own test
- 507/507 tests pass, build clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:48:47 -07:00
Hongming Wang
ac6d6ce5bd fix(canvas): UX improvements — shared tokens, focus rings, loading spinners, a11y
- Extract STATUS_CONFIG, TIER_CONFIG, TIER_COLORS to shared design-tokens.ts
  (eliminates 3 duplicate definitions across WorkspaceNode, EmptyState, TemplatePalette)
- Add focus-visible:ring-2 ring-blue-500 to WorkspaceNode, SidePanel tabs,
  EmptyState buttons, TemplatePalette buttons (keyboard navigation now visible)
- Replace "Loading..." text with animated spinner SVG in EmptyState,
  TemplatePalette sidebar, and OrgTemplatesSection
- Add disabled:cursor-not-allowed + suppress hover styling when disabled
  on EmptyState template buttons and TemplatePalette deploy buttons
- Brighten SidePanel tab hover from bg-zinc-800/20 to bg-zinc-800/40
  and text from zinc-300 to zinc-200
- Add screen reader labels to CommunicationOverlay directional arrows
  and status icons (sr-only text for "sent", "received", "to", status)

Fixes #422, #424, #427

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:35:44 -07:00
Hongming Wang
a363b56f25 feat(tenant): combined platform + canvas Docker image with reverse proxy
Single-container tenant architecture: Go platform (:8080) + Canvas
Node.js (:3000) in one Fly machine, with Go's NoRoute handler reverse-
proxying non-API routes to the canvas. Browser only talks to :8080.

Changes:

platform/Dockerfile.tenant — multi-stage build (Go + Node + runtime).
  Bakes workspace-configs-templates/ + org-templates/ into the image.
  Build context: repo root.

platform/entrypoint-tenant.sh — starts both processes, kills both if
  either exits. Fly health check on :8080 covers the Go binary; canvas
  health is implicit (proxy returns 502 if canvas is down).

platform/internal/router/canvas_proxy.go — httputil.ReverseProxy that
  forwards unmatched routes to CANVAS_PROXY_URL (http://localhost:3000).
  Activated by NoRoute when CANVAS_PROXY_URL env is set.

platform/internal/router/router.go — wire NoRoute → canvasProxy when
  CANVAS_PROXY_URL is present; no-op otherwise (local dev unchanged).

platform/internal/middleware/securityheaders.go — relaxed CSP to allow
  Next.js inline scripts/styles/eval + WebSocket + data: URIs. The
  strict `default-src 'self'` was blocking all canvas rendering.

canvas/src/lib/api.ts — changed `||` to `??` for NEXT_PUBLIC_PLATFORM_URL
  so empty string means "same-origin" (combined image) instead of falling
  back to localhost:8080.

canvas/src/components/tabs/TerminalTab.tsx — same `??` fix for WS URL.

Verified: tenant machine boots, canvas renders, 8 runtime templates +
4 org templates visible, API routes work through the same port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:46:47 -07:00
Hongming Wang
4b865fa755 feat(canvas): /pricing route with plan selector + Stripe checkout
Adds a public /pricing route the apex + tenant canvas can both serve.
Three-tier plan cards (Free, Starter, Pro) with per-plan CTA buttons
that dispatch correctly regardless of the user's state:

  Free              → redirect to signup
  Anonymous + paid  → redirect to signup (Stripe opens post-auth)
  Authed + paid     → POST /cp/billing/checkout, redirect to Stripe URL
  No tenant slug    → inline error ("pick an org first")
  Network failures  → surfaced in an ARIA alert banner

Files:
- src/lib/billing.ts — plan metadata + startCheckout + openBillingPortal
  wrappers over /cp/billing/{checkout,portal}
- src/components/PricingTable.tsx — client component, lazy session
  probe on first CTA click (no probe for anonymous browsers)
- src/app/pricing/page.tsx — server-rendered shell with SEO metadata,
  links to legal pages in the footer
- Tests: 10 billing helper tests + 9 PricingTable tests (17 total,
  additional ones cover the plan-list canonical order)

Design notes:
- The pricing data (features + prices) is a static const in billing.ts,
  not fetched from the API. Changing prices requires a deploy — which
  we'd need to do anyway for tier definition changes.
- PLAN_ID 'starter' is flagged highlighted=true so the middle card gets
  the 'Most popular' visual treatment. One source of truth; test locks it.
- Session probe is lazy (first CTA click, not mount) so anonymous
  visitors don't generate a /cp/auth/me request just to read the page.

AuthGate interaction:
- On apex (no tenant slug), AuthGate passthrough — /pricing renders freely
- On tenant subdomain, AuthGate still bounces anonymous users to login
  before reaching /pricing — this is the correct UX for the "I'm already
  logged in and want to upgrade my own org" flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:41:44 -07:00
Hongming Wang
043b8fe159 feat(canvas): AuthGate — redirect anonymous users to cp login (Phase F close)
Wraps the canvas root so every tenant-subdomain request checks for a
valid session and bounces to app.moleculesai.app/cp/auth/login with a
return_to pointing back at the current URL. Local dev + vercel preview
URLs + apex pass through unchanged.

Files:
- canvas/src/lib/auth.ts: fetchSession() probes /cp/auth/me
  (credentials:include for cross-origin cookie); returns Session on 200,
  null on 401 (anonymous, no throw), throws on 5xx so transient
  outages don't leak the UI.
- canvas/src/lib/auth.ts: redirectToLogin() builds the cp login URL
  with window.location.href as return_to; CP's isSafeReturnTo check
  rejects cross-domain bounces.
- canvas/src/components/AuthGate.tsx: client component wrapping
  children. State machine: loading → authenticated | anonymous. In
  non-SaaS mode (no tenant slug) skips the gate entirely.
- canvas/src/app/layout.tsx: wraps the root body in <AuthGate>.

Tests: +6 auth.ts (200 / 401 null / 5xx throw / credentials:include /
redirectToLogin href + signup variant). Full suite 453 green (was 447).

Pairs with molecule-controlplane PR #16 (return_to cookie handshake
on the cp side).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:37:26 -07:00
Hongming Wang
3abcee11b3 feat(canvas): SaaS cross-origin — slug header + cookie credentials (Phase F)
Canvas will be served at <slug>.moleculesai.app (Vercel). API calls go
cross-origin to https://app.moleculesai.app. This commit wires the
client side:

- canvas/src/lib/tenant.ts: getTenantSlug() derives the slug from
  window.location.hostname, case-insensitive, matching the control
  plane's reservedSubdomains list (app/www/api/admin/…). Server-side
  + localhost + vercel preview URLs + apex all return "" so local dev
  keeps working.

- canvas/src/lib/api.ts: adds X-Molecule-Org-Slug header + sets
  credentials:"include" on every fetch. The control plane's CORS
  middleware allows the origin + credentials; the session cookie has
  Domain=.moleculesai.app so the browser ships it.

- canvas/src/lib/api/secrets.ts: same treatment (secrets API uses its
  own fetch helper — shared slug+credentials logic applied).

Tests: +6 (tenant.test.ts covers slug / reserved / case / non-SaaS /
preview URL / apex). Full canvas suite 447/447 green.

Not in this PR:
- WS URL derivation for terminal/socket.ts (separate follow-up; WS
  needs its own slug-aware URL and the canvas terminal isn't used in
  SaaS launch day-one).
- Next.js rewrites (decided against; cross-origin with credentials
  is cleaner than path-level rewrites for session cookies).

Deploys to Vercel once merged — no manual config needed (env already set).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:08:39 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00