molecule-core

Author	SHA1	Message	Date
Hongming Wang	286dcbfd1e	fix(canvas,org): collapse org-imported parents on first paint Importing a 15-workspace org template dropped every child as a freely-positioned card into its parent's coordinate space. Parents with 5-10 kids had the kids spill below the parent's initial min size, producing the "ugly default" layout the user just flagged — a mess of overlapping cards the moment the import completed. Fix: every workspace in an org-template import that HAS children is inserted with `collapsed = true`. Leaf workspaces stay expanded (nothing to hide). The canvas renders a collapsed parent as a compact header-only card with its "N sub" badge — visually identical to the pre-refactor default the user asked for. Double-click on a collapsed parent now EXPANDS it (flipping `collapsed` locally + persisting via PATCH) so the user can drill in to see the subtree. Only once expanded does a second double-click zoom-to-team, matching the prior behaviour. Leaf-first creation order stays the same; the collapsed flag just means "render compact" not "hide from API". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:36:55 -07:00
Hongming Wang	507696d88a	fix(canvas,server): address review findings on `3f11df03` Five review findings from the `3f11df03` six-bug commit: 1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet, ClosedInProduction} covering all three gating states for the security-sensitive dev-mode hatch the prior commit added to /registry/:id/peers. Previously shipped untested — a future refactor could have silently inverted polarity or removed the gate. New tests pin the contract: * MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless * MOLECULE_ENV=development + ADMIN_TOKEN set → require token * MOLECULE_ENV=production → require token 2. ConfigTab handleSave diffs against the RAW parsed YAML / form config instead of the DEFAULT_CONFIG-merged shape. The previous code would silently PATCH tier=1 to the DB when a user deleted the `tier:` line in raw mode (the default-merge substituted 1). Now: only fields the user actually typed participate in the diff. Type guards (typeof === "number" / "string") prevent coercion surprises on malformed YAML. 3. ConfigTab model-save failure no longer lies "Saved". The /workspaces/:id/model PATCH can reject when the runtime doesn't support the chosen model; previously we caught + console.warn'd + showed green Saved, and the user watched the model revert on next reload with no explanation. Now the save path collects a `modelSaveError` and surfaces it via setError with a partial- success message ("Other fields saved, but model update failed: …") so the user sees why. 4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch failures, distinguishing them in the error text ("Failed to load connected channels and platforms — try refreshing"). Previously only an adapters failure was visible; a channels failure left the user with an apparently-empty list and no indication the API was unreachable. 5. ChatTab panels drop the redundant aria-hidden attribute. The `hidden`/`flex` Tailwind class already sets display:none, which removes the node from the accessibility tree on its own; the extra aria-hidden invited WAI-ARIA lint warnings if a focusable descendant ever landed inside an inactive panel. Tests: 923 canvas + full Go handler suite pass. 3 new Go tests. No behaviour change on the five prior fixes — this commit tightens their edges per the independent review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:29:44 -07:00
Hongming Wang	3f11df031c	fix: six UX bugs (peers auth, scroll, chat tabs, config persist, + visibility) Six bugs reported from a live session — all shippable in one commit: 1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint demands a workspace-scoped bearer token (validateDiscoveryCaller) which the canvas session doesn't hold. Added the same Tier-1b dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so SaaS production stays strict. Exported IsDevModeFailOpen from the middleware package for the handler layer to reuse. 2. Org Templates list unscrollable. OrgTemplatesSection was rendered in the TemplatePalette footer — a div without overflow — so when it expanded to 15+ entries the list extended past the viewport with no scroll. Moved it to the top of the flex-1 overflow-y-auto container. Tall lists now scroll naturally. 3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead of switching. HTML `hidden` attribute was being overridden by Tailwind's `flex` class (display: flex beats the attribute), so both tabpanels rendered concurrently. Swapped to a conditional Tailwind `hidden`/`flex` class so the inactive panel is display:none with proper CSS specificity. 4. Hermes Config form never persists. handleSave wrote config.yaml but name / tier / runtime / model all live on the workspace row (or the dedicated /workspaces/:id/model endpoint) — the form edited in-memory, the request returned 200, the next reload wiped everything back. Hermes + external runtimes manage their own config inside the container anyway, so writing config.yaml is a no-op for them; skip it. Always diff and PATCH the DB-backed fields that actually changed. 5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's load() used Promise.all with a silent catch — if EITHER the channels or adapters fetch failed, both setters were skipped with no error visible. Switched to Promise.allSettled so each endpoint settles independently, and the adapters failure now surfaces via the top-level error state. 6. Plugin registry always "No plugins in registry". Same silent catch pattern in SkillsTab.tsx — load errors for /plugins, /plugins/sources, and /workspaces/:id/plugins swallowed without logging. Replaced the empty catches with console.warn so future failures are at least visible in devtools. Tests: 923 passing (unchanged). Go handler tests pass. Server rebuilt and running with the peers-auth + collapsed-persistence fixes (pid 15875). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:18:30 -07:00
Hongming Wang	4fd7f1e84c	fix(canvas): tighten rescue + cap toast + cover paths with tests Three follow-up review findings from the `c2b2e13a` review: 1. Rescue heuristic uses pure bbox-non-overlap. The previous `position.x < 0` branch rescued any child whose parent was later dragged past it, even when the layout was clearly recoverable (e.g. relative -40, child still overlaps parent). New rule: rescue iff the child's bbox has zero overlap with the parent's bbox — self-calibrating, scales with user-resized parents, catches screenshot-case and legacy huge-positive data. 2. Toast caps failed-name list at 3 and appends "and N more". Stops a 50-node partial failure from overflowing the toast container. 3. Cycle guard on selection-roots walk in batchNest. Corrupt parentId data can't send the loop infinite now. Cheap defensive guard — one Set per selected node. Tests added (923 total, up from 918): * canvas-topology.test: 4 rescue scenarios — screenshot case (zero-overlap rescue), negative drift kept, huge-positive rescued, user-resized layout kept. * canvas.test: selection-roots filter on a 3-level chain. * workspace_crud test: PATCH {collapsed:true} runs the UPDATE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:08:14 -07:00
Hongming Wang	c2b2e13abe	fix(canvas): address code-review findings on the Canvas refactor Five issues surfaced in the review of `50b53784`. Each was either a real bug waiting to hit users or a silent failure mode. 1. Topology rescue no longer teleports user-resized children. Rescue was comparing against parentMinSize(childCount), so any child the user had placed in space the parent was resized into got snapped to the default grid on reload — undoing the layout. Now rescue fires only on obviously corrupt data: negative relative coords (legacy pre-nesting absolute positions that landed above/left of their assigned parent) or values past an MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial minimum are left alone. 2. batchNest now filters to selection-roots before planning. Previously selecting both A and A's descendant B and dragging into T yanked B out of A to become a sibling under T. Users reasonably expect the A subtree to move intact. The new pass drops any selected node whose ancestor is also selected — those follow their ancestor via React Flow's parent binding. 3. batchNest surfaces partial failure via showToast. Previously silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2 snapped back with no explanation. Now names the failed cards. 4. confirmNest closes the nest dialog BEFORE dispatching the async store action, so a second drag can't kick off a competing batch while the first is still in flight. 5. collapsed is now persisted. The Go workspace_crud.go Update handler ignored the `collapsed` field, so user-initiated collapse round-tripped to an expanded state on next hydrate. Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`) so the state survives reload. Nits cleaned: * Removed dead dragStartParentRef in useDragHandlers. * Swapped redundant `node.data as WorkspaceNodeData` casts for a named WorkspaceNode type alias. * Canvas.tsx SR-live region now reads n.parentId (matches MiniMap + RF's native field) instead of the mirror n.data.parentId. Tests added (918 total, up from 915): * batchNest happy path — 2-root selection fires 2 combined PATCHes carrying parent_id + x + y, not 2×N sequential round-trips. * batchNest ancestor+descendant selection — subtree stays intact. * batchNest partial failure rollback — only the rejected nodes revert; successful ones stay committed. Backend change is single-line (collapsed PATCH branch); all workspace_crud Go tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:58:44 -07:00
Hongming Wang	50b537849a	refactor(canvas): split Canvas.tsx into hooks; parallelize batchNest Two concerns in one commit (separate files, each self-contained): ## Canvas.tsx split (from ~680 to ~250 lines) Canvas.tsx was holding drag gesture state + keyboard shortcuts + viewport wiring + JSX. Each concern now lives in its own unit under canvas/src/components/canvas/: - dragUtils.ts — pure: shouldDetach, clampChildIntoParent, DETACH_FRACTION - DropTargetBadge.tsx — the floating "Drop into: <name>" label + the dashed ghost preview at the target slot - useDragHandlers.ts — encapsulates onNodeDragStart / Drag / Stop, findDropTarget hit-test, pendingNest state, and confirmNest/cancelNest. Routes multi- select drags through batchNest automatically. - useKeyboardShortcuts — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one window listener, one source of truth. - useCanvasViewport — pan-to-node + zoom-to-team CustomEvent listeners and the debounced viewport save. Canvas.tsx becomes a thin composition + JSX file. No behavioural change; the refactor is covered by the existing 915 canvas tests. ## batchNest parallelization (2N round-trips → N, all in flight) Previously nestNode fired two sequential PATCHes (parent_id then x/y) and batchNest looped nestNode sequentially. For a 5-node selection on a typical ~200ms link this was ~2s of serialized RPCs. - nestNode now combines parent_id + x + y into ONE PATCH. The Go handler (workspace_crud.go Update) already reads all three from the same body — no backend change. - batchNest rewritten: compute every re-parent plan against one snapshot, commit a single set(), then fire N PATCHes via Promise.allSettled in parallel. Per-node failures roll back only that node (others stay committed) — same semantics as the single- node path, just concurrent. - The state math in the batch path also correctly shifts descendant zIndex by depthDelta when any re-parented node has a subtree. ## Also - canvas-topology.ts: reverted P3.12's opt-in rescue to the auto- rescue default. When a child's stored relative position would render it outside the parent bbox (the visual regression the user saw after collapse → reload — Hermes child drawn outside Claude Code Agent on first paint), the child is placed in the next default grid slot. The "Arrange Children" context command stays for bigger teams. All 915 canvas tests pass. No backend changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:43:18 -07:00
Hongming Wang	c5abed988e	fix(canvas): address review findings on playability pass Five Critical issues caught in code review of `f3423a51`. Each one broke an invariant the original commit claimed to uphold. 1. nestNode: descendants kept their old-depth zIndex after a re-parent. Now walks the dragged subtree and shifts every descendant's zIndex by the same depthDelta so "children above ancestors" survives moves between levels of the hierarchy. 2. bumpZOrder: siblings all share zIndex = depth in fresh topology, so a single +1 bump was identical for every sibling and subsequent bumps drifted zIndex unboundedly. Rewritten to sort siblings by current zIndex and swap the target with its neighbour in the bump direction — Figma-style reorder, stays within the sibling tier. 3. findDropTarget: depth-first tiebreaker lost to bumped siblings. The visually-frontmost card after Cmd+] is a shallow sibling, but the hit test picked the deepest nested card regardless. Swapped order so zIndex wins first, depth second, area third. Also pre-computes the depth map once per call (was O(n²) via repeated .find walks — will matter past ~30 workspaces). 4. arrangeChildren: saved absolute position using `slot + parent.position`, but parent.position is RELATIVE to its own parent when nested. Grandchildren's stored x/y were in the parent's local frame and reload placed them in the wrong spot. Now walks the full ancestor chain via absOf() to get the true canvas-absolute origin before PATCHing. 5. setCollapsed: naive flip of every descendant's hidden flag diverged from the topology rebuild on hydrate. Collapse A, collapse B, then expand A — C should stay hidden because B is still collapsed, but before this fix C was unhidden. Rewritten to recompute every descendant's hidden from the full ancestry chain, matching the topology pass byte-for-byte. New round-trip test asserts the two code paths produce identical node.hidden across a full lifecycle. Also: - Removed dead cascadeMessage constant (never rendered). - Replaced hardcoded 260/120 in zoom-to-team with exported constants. - arrangeChildren PATCH catch now logs instead of silently swallowing. - Added 70→76 tests: setCollapsed 3-chain scenarios, bumpZOrder swap semantics, edge-of-list no-op. All 915 canvas tests green. Backend untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:16:48 -07:00
Hongming Wang	f3423a513d	feat(canvas): industry-pattern playability pass (P1+P2+P3) Ships the full prioritized improvement list from the canvas research report — aligns our nesting/resize UX with Miro / FigJam / tldraw / Figma conventions. Organized by priority below. ## P1 — baseline playability * Hysteresis on drag-out detach (Miro): a child only un-nests when >=20% of its bbox is outside the parent on release. Prevents accidental un-nesting from twitchy drags. * Drop-target now uses tree-depth DESC, then zIndex DESC, then area ASC to pick targets when nested parents overlap (xyflow #2827). * Children render above ancestors by inheriting zIndex = parent + 1 in topology and on every nest/unnest (xyflow #4012). * Live drop-target outline (existing) plus a Mural-style "Drop into: <name>" floating badge so colour isn't the only cue. * growParentsToFitChildren now fires only on dimension-type changes inside onNodesChange (NodeResizer commits) and once on drag-stop — avoids tldraw's edge-chase artifact (P3.11 commit-on-release). ## P2 — polish * Whimsical-style ghost preview: dashed outline at the next default grid slot inside the drop-target parent during drag. * Alt-drag escape with soft clamp: dropping slightly outside a parent without Alt/Cmd snaps the child back inside (clampChildIntoParent); Alt releases the clamp to allow un-nest; Cmd/Ctrl force-detaches. * Figma-style keyboard hierarchy nav: Enter descends to first child, Shift+Enter ascends to parent, Cmd+]/[ re-orders siblings via the new bumpZOrder store action. * Multi-select re-parent preserves offsets: confirmNest routes through a new batchNest action when the primary drag is part of a batch selection (Lucidchart pattern). ## P3 — long-tail * Minimap now shows parent cards as filled regions with a blue stroke, so hierarchy reads at a glance without zooming. * Out-of-bounds rescue is opt-in: topology no longer silently re-lays children whose stored position is outside the parent bbox (Figma trust-the-data). The new Arrange Children context menu item runs the rescue on demand via arrangeChildren. * Cmd-drag force-detach regardless of hysteresis. * Collapse workspace: the existing Collapse Team action now toggles a local setCollapsed store action that hides every descendant and shrinks the parent card to header-only (Miro frame outline view). Growth pass skips collapsed parents so they don't push back out. All 910 canvas tests green. Backend untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:03:02 -07:00
Hongming Wang	d359390f83	fix(canvas): parent auto-fit sizing + rescue out-of-bounds children Two playability bugs in the new flat-cards layout: 1. On first load or fresh org import a parent had no explicit width or height, so children whose stored position sat inside their (eventual) parent's rectangle rendered visually outside the smaller default parent box. Compute a parent starting size in canvas-topology: • 2-column grid of child-default footprints + header/side padding • Grows per child count (2→1 row, 3-4→2 rows, etc.) and stamp it onto the Node's width/height so the first paint already contains every child. 2. If a child's stored relative position actually falls outside the parent's computed bounds (legacy org-imports at 0,0, pre-refactor absolute coordinates, manually-nudged rows), assign that child a deterministic default grid slot inside the parent instead. Runtime cascade: added growParentsToFitChildren to onNodesChange so when the user drags or resizes a child past the parent's current bounds, the parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit — grow-only, never shrinks under the user's manual resize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:29:04 -07:00
Hongming Wang	cc194f0b7e	refactor(canvas): flat workspace cards with React Flow native parenting Every workspace now renders as a first-class card on the canvas regardless of parent_id. The old "parent card contains mini TeamMember chips" layout is gone — if B is parented to A, B renders as a full card inside A's coordinate space using React Flow's `parentId` binding, so moving A carries B along and children have the same detail + actions as root cards. Details: - canvas-topology.ts: topologically sort parents before children (React Flow ordering requirement), compute each child's RF-native parentId + relative position on load. DB keeps absolute x/y; the abs→rel conversion happens here, reverse translation in Canvas.onNodeDragStop before savePosition PATCHes the DB. - WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks, simplify the size classes, and add NodeResizer (visible when selected) so users can drag any edge/corner to grow or shrink. Parent cards default to a larger min size so nested children have breathing room. - Canvas.tsx drop targeting rewritten: bounds-based hit test against each node's measured absolute bbox, deepest match wins. Fixes two prior bugs at once — dropping onto Claude Code with a nested same- named Hermes no longer picks the wrong node, and the target can now be a nested workspace when that's where the pointer actually released. - canvas.ts nestNode + removeNode: translate position between old and new parent's absolute origin on nest/unnest so the card doesn't jump, and re-point the RF `parentId` alongside `data.parentId` on reparent. - Tests: hidden-flag assertions replaced with parentId checks; obsolete TeamMemberChip a11y/eject tests deleted (the UI component no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:44 -07:00
Hongming Wang	8a07cf4035	fix(canvas): skip already-nested workspaces as drop targets Dragging one workspace onto another could pick a nested child as the "nearest" drop target instead of the visible parent card the user actually hovered. The effect: dropping a free-floating Hermes Agent onto a Claude Code Agent that already had a Hermes Agent nested inside showed "Move 'Hermes Agent' inside 'Hermes Agent'?" — the confirmation referenced the nested same-named child, not Claude Code. Why: getIntersectingNodes returns every overlapping node, including hidden=true children that render inside their parent's card. The parent and child share bounding boxes, so the child often "won" the nearest-distance check. Filter them out at the source: a node that's already got a parentId (or is hidden) is never a valid top-level drop target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:49:01 -07:00
Hongming Wang	7356cf8d3a	fix(chat): clear sending spinner when any path delivers the reply Two latent bugs kept the "Processing with Claude Code..." timer ticking after the agent had already answered: 1. The A2A_RESPONSE store handler wrote into agentMessages[workspaceId] (no prefix) but ChatTab's "clear sending" effect subscribed to agentMessages["a2a:" + workspaceId]. Keys never matched — the effect was dead code from day one. Removed the dead subscription and moved the setSending(false) into the pendingAgentMsgs effect so any reply delivered via a WS push (Claude Code SDK, Hermes's send_message_to_user) also closes the spinner. 2. Added an activity-log fallback: when the platform emits a successful a2a_receive ACTIVITY_LOGGED for this workspace, clear sending and stop the timer. That covers the "runtime answered but we never saw the store message" case Claude Code exhibited tonight — the HTTP request can stay in flight while the SDK already pushed its reply. Symmetric a2a_receive error path also clears sending and surfaces the error message, so a runtime-side failure no longer hangs the UI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:43:30 -07:00
Hongming Wang	e337efe974	fix(canvas): propagate runtime through WORKSPACE_PROVISIONING event The side-panel runtime pill read "unknown" for newly-deployed workspaces because canvas-events.ts created the node from WORKSPACE_PROVISIONING payload — and the payload only carried name + tier. No refetch filled the gap during provisioning, so the user saw "RUNTIME unknown" on the card even though the DB row had the real runtime set. Includes runtime in every WORKSPACE_PROVISIONING emitter: * handlers/workspace.go — initial create * handlers/workspace_restart.go — explicit restart, auto-restart, and crash-recovery resume loop * handlers/org_import.go — multi-workspace org imports Canvas-side: canvas-events.ts reads payload.runtime when creating the node; the provisioning test asserts the pill value is populated before any refetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:17:49 -07:00
Hongming Wang	dc50a1c775	refactor(canvas): data-drive provider picker from template config.yaml The MissingKeysModal's provider list was hardcoded in deploy-preflight.ts as RUNTIME_PROVIDERS — a per-runtime map that duplicated what each template repo already declares in its config.yaml. That meant adding a new provider required changes in two places, and the UI could drift out of sync with the actual template (e.g. when a template adds a MiniMax or Kimi model, the picker wouldn't know). The single source of truth for "which env vars does this workspace need" is each template's config.yaml: * `runtime_config.models[].required_env` — per-model key list * `runtime_config.required_env` — runtime-level AND list Go /templates already returned `models`. This change: * Adds `required_env` alongside `models` on templateSummary so the canvas receives the full picture. * Rewrites deploy-preflight.ts to derive ProviderChoice[] from a template object via `providersFromTemplate(template)`: - groups `models[]` by unique required_env tuple - falls back to runtime_config.required_env when models is empty - decorates labels with model counts (e.g. "OpenRouter (14 models)") * `checkDeploySecrets(template, workspaceId?)` now takes a template object instead of a runtime string. Any-provider satisfaction still short-circuits preflight to ok=true. * MissingKeysModal receives `providers` directly; no more lookups. * TemplatePalette threads `template.models` + `template.required_env` into the preflight. Side effects: * Claude Code's dual-auth (OAuth token OR Anthropic API key) now surfaces as two picker options — its config.yaml already declared both, the UI just wasn't reading them. * Hermes picker now shows 8 provider options (Nous, OpenRouter, Anthropic, Gemini, DeepSeek, GLM, Kimi, Kilocode) instead of the hand-picked 3, matching its 35-model reality. Removed the legacy RUNTIME_PROVIDERS / RUNTIME_REQUIRED_KEYS / getRequiredKeys / findMissingKeys exports; MissingKeysModal.test.tsx deleted (its coverage is subsumed by the new template-driven deploy-preflight.test.ts). 58 modal-adjacent tests pass; full canvas suite 919 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:07:15 -07:00
Hongming Wang	c5bcd7298c	Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes # Conflicts: # workspace-server/internal/handlers/ssrf.go	2026-04-23 16:42:41 -07:00
Hongming Wang	baa7e1531f	feat(canvas): provider-picker MissingKeysModal for multi-provider runtimes Runtimes like Hermes and LangGraph accept any one of several LLM provider keys (OpenRouter OR OpenAI OR Anthropic OR Nous-native). Before this change, the missing-keys modal treated all supported providers as simultaneously required — a fresh user on Hermes was asked for three parallel API keys when any one suffices. Introduces RUNTIME_PROVIDERS in deploy-preflight.ts as the canonical per-runtime provider list (label, envVar, note). checkDeploySecrets now returns all alternatives as missingKeys when nothing is configured, so the modal can offer a picker. MissingKeysModal dispatches between two render paths: * ProviderPickerModal — radio list of supported providers, a single env input for the chosen one. Saving that one key satisfies the preflight. Activated whenever the runtime has ≥2 provider choices. * AllKeysModal — legacy parallel-inputs UX, all keys must be saved before deploy. Kept for single-provider runtimes (claude-code, gemini-cli) and callers that pass unrelated-key lists. Dual-mode preserves the pre-existing contract for every caller while fixing the multi-provider UX. All 930 canvas vitest tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:41:09 -07:00
Hongming Wang	03b56fa5af	fix(canvas): collapse Org Templates section by default in palette The TemplatePalette's Org Templates section rendered all cards inline, each ~120 px tall (name + description + "Import org" button). With 4 org templates on disk that's ~500 px of drawer height — the individual workspace templates at the top (AutoGen / LangGraph / Hermes / …) got pushed off-screen, which is the exact complaint from the test session ("templates still 90% org, cant even see normal workspace template"). Collapsed the Org Templates section by default. The header now toggles with an ▶ caret and shows the count ("Org Templates (4)"). Clicking expands to reveal the full card list; clicking again collapses. Persists only within a session — fresh mounts start collapsed so the primary deploy path stays visible. Individual workspace templates are the usual starting point (pick a runtime, deploy one agent), while org templates are a heavier "deploy this whole pre-built team" action. Making the second expandable matches the relative frequency. - `TemplatePalette.tsx::OrgTemplatesSection` — added `expanded` state (default false), wrapped the cards in `{expanded && …}`, turned the header into a toggle button with `aria-expanded` + `aria-controls`. - `__tests__/OrgTemplatesSection.test.tsx` — 3 new rendering tests: collapsed-by-default (cards absent), click expands (cards appear), click again collapses (cards gone). Mocks /org/templates with a 2-entry response so the count assertion is stable. Full canvas vitest: 930/930 pass (up from 927). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:24:49 -07:00
Hongming Wang	b4719ad070	fix(canvas): Legend avoids TemplatePalette + silence WS handshake races ### Two unrelated but small UI fixes surfaced while testing the Canvas 1. Legend hidden under the open TemplatePalette. Legend is `fixed bottom-6 left-4 z-30`. TemplatePalette's drawer (when open) is `fixed top-0 left-0 w-[280px] z-30` — same z-index, same left-edge column. The Legend overlapped the palette's bottom 180 px. Published the palette-open state to the canvas store so the Legend can shift right (to `left-[296px]` — 280 px palette + 16 px gap) while the palette is open, animated via a 200 ms `transition-[left]` to match the palette's slide. Closes cleanly back to `left-4` when the palette is dismissed. Files: - `store/canvas.ts` — added `templatePaletteOpen` + `setTemplatePaletteOpen`. - `TemplatePalette.tsx` — calls `setTemplatePaletteOpen(open)` on every open/close transition via a new useEffect. - `Legend.tsx` — reads the flag and swaps `left-4` <-> `left-[296px]`. 2. "WebSocket is closed before the connection is established" spam. Two components (`ChatTab`, `AgentCommsPanel`) open their own short- lived WebSocket to tail the ACTIVITY_LOGGED stream. Their cleanup path called `ws.close()` unconditionally, which trips a browser console warning when React StrictMode re-runs the effect in dev and the handshake hasn't completed yet. Confirmed via DevTools console on the running canvas. Added a `closeWebSocketGracefully(ws)` helper in `lib/ws-close.ts`: - OPEN / CLOSING → close immediately (normal path). - CONNECTING → defer close to the 'open' listener so the browser sees a full handshake. Also wires an 'error' listener that cancels the queued close if the handshake fails (no double-close). - CLOSED → no-op. Both consumers now call the helper in their useEffect cleanup. Silences the warning without changing observable behaviour. ### Tests `canvas/src/lib/__tests__/ws-close.test.ts` — 5 cases with a fake WebSocket covering each readyState branch plus the error-before-open cancellation path. Full vitest suite: 927/927 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:03:01 -07:00
Hongming Wang	5eb5e38c59	fix(canvas): re-centre Toolbar on canvas area when SidePanel is open When a workspace is selected the SidePanel (fixed, right-0, z-50) opens from the right edge and covers the right third of the viewport. The Toolbar at the top was positioned `fixed top-3 left-1/2 -translate-x-1/2 z-20` — centred on the full viewport, not the remaining canvas area. Consequence: the right half of the Toolbar (Audit / Search / Help / Settings) was hidden behind the panel as soon as the user clicked any workspace. Fix: publish the live SidePanel width to the canvas store and read it in Toolbar. When a node is selected, shift the Toolbar LEFT by `sidePanelWidth / 2` so its centre lines up with the middle of the remaining canvas area. Animated via a 200 ms `transition-[margin-left]` to match the SidePanel's own slide-in easing. - `store/canvas.ts` — added `sidePanelWidth` + `setSidePanelWidth`. Default 480 (matches SIDEPANEL_DEFAULT_WIDTH). - `SidePanel.tsx` — calls `setSidePanelWidth(width)` on every width change so the store stays in sync with localStorage. - `Toolbar.tsx` — reads `sidePanelWidth`, applies a negative `marginLeft` style when `selectedNodeId` is non-null. - `SidePanel.tabs.test.tsx` — added `setSidePanelWidth: vi.fn()` to the mocked store state so SidePanel's new useEffect has a callable to invoke. 18 previously-passing tests now pass again. No visual regression when no workspace is selected — the toolbar stays in its original centred position. SaaS canvas unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:57:12 -07:00
Hongming Wang	a0ac72f725	test(canvas): update a11y tests for T3 default tier CreateWorkspaceDialog.a11y.test.tsx's two tier-button tests assumed T1 was the default selection. After the previous commit flipped the non-SaaS default to T3, the radio group's default-selected button changed accordingly. Updated: - "tier buttons have role=radio and aria-checked reflects selection" — T3 is now `aria-checked="true"`, T1 is the "unselected" foil we click to verify the flip. - "selected radio has tabIndex=0, others have tabIndex=-1" — T3 is the tabindex=0 member now. The roving-tabIndex and ArrowDown / ArrowRight tests further down the file start by explicitly clicking/focusing T1 or T2, so they're unaffected by the default change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:37:23 -07:00
Hongming Wang	2baaa977c7	feat(quickstart): default new agents to T3 (Privileged) Default tier for a newly-created workspace was T1 (Sandboxed) on self-hosted and T4 (Full Access) on SaaS. Real work needs at minimum a read_write workspace mount + Docker daemon access — that's T3 ("Privileged") per the tier ladder in CreateWorkspaceDialog. The user-visible consequence was that clicking "Deploy" on almost any template landed in a sandbox that couldn't actually run the agent's tooling until the user knew to bump the tier manually. ### Changes Platform (Go) — default tier flipped from 1→3 in two places so API callers (Canvas, molecli, org import) all get the same default: - `handlers/workspace.go`: `POST /workspaces` default when `tier` is omitted from the request body. - `handlers/template_import.go`: `generateDefaultConfig` writes `tier: 3` into the auto-generated `config.yaml` for bundle imports that don't declare one. Canvas — `CreateWorkspaceDialog.tsx` self-hosted form default flipped from T1→T3. SaaS stays at T4 (each SaaS workspace runs on its own sibling EC2, so the shared-blast-radius reasoning doesn't apply and we can safely go a tier higher). ### Tests Updated every sqlmock assertion that anchored on the old `tier=1` default: - `handlers_test.go::TestWorkspaceCreate` — default-path INSERT now expects `3`. - `handlers_additional_test.go::TestWorkspaceCreate_WithParentID` — same. - `workspace_test.go::TestWorkspaceCreate_DBInsertError` / `TestWorkspaceCreate_WithSecrets_Persists` — same. - `workspace_test.go::TestWorkspaceCreate_TemplateDefaults*` — same (current handler semantics ignore the template's `tier:` field and fall through to the default; kept tests faithful to the implementation, left a comment flagging the latent inconsistency). - `workspace_budget_test.go::TestWorkspaceBudget_Create_WithLimit` — same. - `template_import_test.go::TestGenerateDefaultConfig` — asserts `tier: 3` now. All `go test -race ./internal/handlers/` pass. Canvas `CreateWorkspaceDialog` tests don't assert the default tier (they only reference `tier` as prop data on stub workspaces) so no test update needed on that side. ### SaaS parity Zero behaviour change on hosted SaaS. The Go-side default only fires when the Canvas (or any caller) omits `tier` from the request body. The SaaS Canvas explicitly passes `tier: 4` from the CreateWorkspaceDialog `isSaaS ? 4 : 3` branch, so the Go default never runs on a SaaS request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:34:22 -07:00
molecule-ai[bot]	70ff4252a8	Merge branch 'staging' into fix/config-tab-runtime-model-hermes	2026-04-23 22:11:06 +00:00
Hongming Wang	06273b11ef	fix(canvas/config): load runtime+model from workspace metadata + hide misleading config.yaml error for hermes Canvas Config tab had 3 bugs visible on hermes workspaces (#1894): 1. Runtime dropdown showed "LangGraph (default)" even when the workspace's actual runtime was hermes — because the form only loaded runtime from config.yaml, and hermes doesn't use the platform's config.yaml template. 2. Model field was empty for the same reason. 3. "No config.yaml found" error appeared on hermes workspaces despite everything being fine — hermes manages its own config at ~/.hermes/config.yaml on the workspace host. Worse, clicking Save with the empty form would silently flip `runtime` back from `hermes` to `LangGraph (default)`. ## Fix - loadConfig now always fetches workspace metadata (runtime + model) via GET /workspaces/:id and GET /workspaces/:id/model BEFORE attempting the config.yaml fetch. These act as the source of truth for runtime and model when config.yaml doesn't set them. - RUNTIMES_WITH_OWN_CONFIG set lists runtimes that manage their own config outside the platform template (hermes, external). For these: - Missing config.yaml is NOT an error — no red banner shown. - An informational gray banner tells the user where to edit the runtime's config (e.g. "edit ~/.hermes/config.yaml via Terminal tab or the hermes CLI" for hermes). Closes #1894. Verified 2026-04-23 on user's hongmingwang tenant which runs hermes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:58:36 -07:00
Hongming Wang	de99a22ffc	fix(quickstart): hotfixes discovered during live testing session Five additional breakages surfaced while testing the restored stack end-to-end (spin up Hermes template → click node → open side panel → configure secrets → send chat). Each fix is narrowly scoped and has matching unit or e2e tests so they don't regress. ### 1. SSRF defence blocked loopback A2A on self-hosted Docker handlers/ssrf.go was rejecting `http://127.0.0.1:<port>` workspace URLs as loopback, so POST /workspaces/:id/a2a returned 502 on every Canvas chat send in local-dev. The provisioner on self-hosted Docker publishes each container's A2A port on 127.0.0.1:<ephemeral> — that's the only reachable address for the platform-on-host path. Added `devModeAllowsLoopback()` — allows loopback only when MOLECULE_ENV ∈ {development, dev}. SaaS (MOLECULE_ENV=production) continues to block loopback; every other blocked range (metadata 169.254/16, TEST-NET, CGNAT, link-local) stays blocked in dev mode. Tests: 5 new tests in ssrf_test.go covering dev-mode loopback, dev-mode short-alias ("dev"), production still blocks loopback, dev-mode still blocks every other range, and a 9-case table test of the predicate with case/whitespace/typo variants. ### 2. canvas/src/lib/api.ts: 401 → login redirect broke localhost Every 401 called `redirectToLogin()` which navigates to `/cp/auth/login`. That route exists only on SaaS (mounted by the cp_proxy when CP_UPSTREAM_URL is set). On localhost it 404s — users landed on a blank "404 page not found" instead of seeing the actual error they should fix. Gated the redirect on the SaaS-tenant slug check: on <slug>.moleculesai.app, redirect unchanged; on any non-SaaS host (localhost, LAN IP, reserved subdomains like app.moleculesai.app), throw a real error so the calling component can render a retry affordance. Tests: 4 new vitest cases in a dedicated api-401.test.ts (needs jsdom for window.location.hostname) — SaaS redirects, localhost throws, LAN hostname throws, reserved apex throws. ### 3. SecretsSection rendered a hardcoded key list config/secrets-section.tsx shipped a fixed COMMON_KEYS list (Anthropic / OpenAI / Google / SERP / Model Override) regardless of what the workspace's template actually needed. A Hermes workspace declaring MINIMAX_API_KEY in required_env got five irrelevant slots and nothing for the key it actually needed. Made the slot list template-driven via a new `requiredEnv?: string[]` prop passed down from ConfigTab. Added `KNOWN_LABELS` for well-known names and `humanizeKeyName` to turn arbitrary SCREAMING_SNAKE_CASE into a readable label (e.g. MINIMAX_API_KEY → "Minimax API Key"). Acronyms (API, URL, ID, SDK, MCP, LLM, AI) stay uppercase. Legacy fallback preserved when required_env is empty. Tests: 8 new vitest cases covering known-label lookup, humanise fallback, acronym preservation, deduplication, and both fallback paths. ### 4. Confusing placeholder in Required Env Vars field The TagList in ConfigTab labelled "Required Env Vars (from template)" is a DECLARATION field — stores variable names. The placeholder "e.g. CLAUDE_CODE_OAUTH_TOKEN" suggested that, but users naturally typed the value of their API key into the field instead. The actual values go in the Secrets section further down the tab. Relabelled to "Required Env Var Names (from template)", changed the placeholder to "variable NAME (e.g. ANTHROPIC_API_KEY) — not the value", and added a one-line helper below pointing to Secrets. ### 5. Agent chat replies rendered 2-3 times Three delivery paths can fire for a single agent reply — HTTP response to POST /a2a, A2A_RESPONSE WS event, and a send_message_to_user WS push. Paths 2↔3 were already guarded by `sendingFromAPIRef`; path 1 had no guard. Hermes emits both the reply body AND a send_message_to_user with the same text, which manifested as duplicate bubbles with identical timestamps. Added `appendMessageDeduped(prev, msg, windowMs = 3000)` in chat/types.ts — dedupes on (role, content) within a 3s window. Threaded into all three setMessages call sites. The window is short enough that legitimate repeat messages ("hi", "hi") from a real user/agent a few seconds apart still render. Tests: 8 new vitest cases covering empty history, different content, duplicate within window, different roles, window elapsed, stale match, malformed timestamps, and custom window. ### 6. New end-to-end regression test tests/e2e/test_dev_mode.sh — 7 HTTP assertions that run against a live platform with MOLECULE_ENV=development and catch regressions on all the dev-mode escape hatches in a single pass: AdminAuth (empty DB + after-token), WorkspaceAuth (/activity, /delegations), AdminAuth on /approvals/pending, and the populated /org/templates response. Shellcheck-clean. ### Test sweep - `go test -race ./internal/handlers/ ./internal/middleware/ ./internal/provisioner/` — all pass - `npx vitest run` in canvas — 922/922 pass (up from 902) - `shellcheck --severity=warning infra/scripts/setup.sh tests/e2e/test_dev_mode.sh` — clean - `bash tests/e2e/test_dev_mode.sh` — 7/7 pass against a live platform + populated template registry ### SaaS parity Every relaxation remains conditional on MOLECULE_ENV=development. Production tenants run MOLECULE_ENV=production (enforced by the secrets-encryption strict-init path) and always set ADMIN_TOKEN, so none of these code paths fire on hosted SaaS. Behaviour on real tenants is byte-for-byte unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:57:18 -07:00
Hongming Wang	a93bd58b59	fix(quickstart): keep Canvas working post first workspace + hide SaaS cookie banner on localhost Follow-up to the previous commit on this branch. Two additional fresh-clone regressions surfaced during end-to-end verification, both affecting local dev only and both landing inside the same SaaS-vs-local-dev seam: ### 1. Canvas 401-loops after first workspace creation `GET /workspaces` is behind `AdminAuth` (router.go:121 — "C1: unauthenticated workspace topology exposure"). The middleware has a Tier-1 fail-open branch that only fires when no workspace tokens exist anywhere in the DB. The moment a user creates their first workspace — via either the Canvas UI, the API, or the e2e-api test suite — a token lands in the DB, Tier-1 closes, and the Canvas (which has no bearer token in local dev: no WorkOS session, no NEXT_PUBLIC_ADMIN_TOKEN baked in at build time) gets 401 on every list call. The UI renders a stuck "API GET /workspaces: 401 admin auth required" placeholder forever. SaaS is unaffected because hosted provisioning always sets both `ADMIN_TOKEN` and `MOLECULE_ENV=production`, and the Canvas there either carries a WorkOS session cookie or `NEXT_PUBLIC_ADMIN_TOKEN` baked into the JS bundle. Fix (`workspace-server/internal/middleware/wsauth_middleware.go`): add a narrow Tier-1b escape hatch that stays fail-open when both `ADMIN_TOKEN` is unset and `MOLECULE_ENV` is explicitly a dev mode ("development" / "dev"). Production never hits it (SaaS sets `MOLECULE_ENV=production`). Mirrors the existing convention in `handlers/admin_test_token.go` which gates the e2e test-token endpoint on `MOLECULE_ENV != "production"`. Three new regression tests in `wsauth_middleware_test.go`: - `TestAdminAuth_DevModeEscapeHatch_FailsOpenWithHasLiveTokens` — the happy path (dev mode, no admin token, tokens exist → 200) - `TestAdminAuth_DevModeEscapeHatch_IgnoredWhenAdminTokenSet` — explicit `ADMIN_TOKEN` wins; dev mode does not silently re-open the gate - `TestAdminAuth_DevModeEscapeHatch_IgnoredInProduction` — the SaaS-safety guarantee (production + no admin token + tokens exist → 401) `.env.example` flipped to set `MOLECULE_ENV=development` by default so new users get the dev-mode hatch automatically via `cp .env.example .env`. SaaS provisioning overrides to `production`, consistent with the existing convention used by the secrets-encryption strict-init path. ### 2. SaaS cookie/privacy banner rendered on localhost `CookieConsent` mounted unconditionally in the root layout, so `npm run dev` on localhost showed a "Cookies & your privacy" banner pointing at `moleculesai.app/legal/privacy`. That banner is a GDPR/ePrivacy compliance UI that only applies to the hosted SaaS offering; self-hosted / local-dev / Vercel-preview hosts must not see it. Fix (`canvas/src/components/CookieConsent.tsx`): gate render on `isSaaSTenant()`. Matches the convention used by `AuthGate` and the workspace tier picker elsewhere in the codebase. Tests (`canvas/src/components/__tests__/CookieConsent.test.tsx`): existing tests now stub `window.location.hostname` to a SaaS subdomain before rendering (required since `isSaaSTenant()` on jsdom's default "localhost" would suppress the banner). Added two new tests for the local-dev hide path: - `does NOT render on local dev (non-SaaS hostname)` - `does NOT render on a LAN hostname (192.168., .local)` ### Verification On a fresh-nuked DB with the updated branch: 1. `bash infra/scripts/setup.sh` — clean 2. `go run ./cmd/server` — "Applied 41 migrations", :8080 healthy, dev-mode hatch armed (`MOLECULE_ENV=development`) 3. `npm run dev` in canvas — :3000 renders, no cookie banner 4. `bash tests/e2e/test_api.sh` — 61 passed, 0 failed (test suite creates tokens; GET /workspaces stays 200 under the hatch) 5. Browser at http://localhost:3000 AFTER the e2e run: - Canvas renders the workspace list (no 401 placeholder) - No cookie banner 6. `npx vitest run` — 902 tests passed (900 prior + 2 new hide tests) 7. `go test -race ./internal/middleware/` — all passing (3 new dev-mode tests + existing Issue-180 / Issue-120 / Issue-684 suite), coverage 81.8% ### SaaS parity audit Same principle as the rest of this branch: local must work without weakening SaaS. - Dev-mode hatch: conditional on `MOLECULE_ENV=development`. Production tenants always run `MOLECULE_ENV=production` (already enforced by the secrets-encryption `InitStrict` path in `internal/crypto/aes.go`). Branch is unreachable there. - Cookie banner: gated on `isSaaSTenant()` which checks `NEXT_PUBLIC_SAAS_HOST_SUFFIX` (default `.moleculesai.app`). SaaS hosts still get the banner; every other host doesn't. No change to SaaS behaviour. #1822 backend-parity tracker untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:55:33 -07:00
molecule-ai[bot]	f18e261353	Merge branch 'staging' into fix/auth-redirect-loop	2026-04-23 20:38:18 +00:00
Hongming Wang	9ad803a802	fix(quickstart): make README cp-paste flow bugless end-to-end (#1871 ) Reproducing the README's quickstart on a clean clone surfaced seven independent bugs between `git clone` and seeing the Canvas in a browser. Each fix is minimal and local-dev-only — the SaaS/EC2 provisioner path (issue #1822) is untouched. Bugs fixed: 1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing the platform's `schema_migrations` tracker. The platform then re-ran every migration on first boot and crashed on non-idempotent ALTER TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped the migration block — `workspace-server/internal/db/postgres.go:53` already tracks and skips applied files. 2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...` with literal `USER:PASS` placeholders and the Docker-internal hostname `postgres`. A `cp .env.example .env` followed by `go run ./cmd/server` on the host failed with `dial tcp: lookup postgres: no such host`. Replaced with working `dev:dev@localhost:5432` defaults that match `docker-compose.infra.yml`. 3. `docker-compose.infra.yml` and `docker-compose.yml` set `CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects anything other than `http://` or `https://`, so the container crash-looped and returned HTTP 500. Switched to `http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL` for the migration-time native-protocol connection. Also removed `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually run. 4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080` when `.env` was sourced before `npm run dev` — Next.js reads `PORT` from env and the platform owns 8080. Pinned `dev` to `-p 3000` so sourced env can't hijack it. `start` left as-is because production `node server.js` (Dockerfile CMD) must respect `PORT` from the orchestrator. 5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo` — that repo 404s; the actual name is `molecule-core`. The Railway and Render deploy buttons had the same broken URL. Replaced in both English and Chinese READMEs and in CONTRIBUTING. Internal identifiers (Go module path, Docker network `molecule-monorepo-net`, Python helper `molecule-monorepo-status`) deliberately left alone — renaming those is an invasive refactor orthogonal to this fix. 6. README quickstart was missing `cp .env.example .env`. Users who went straight from `git clone` to `./infra/scripts/setup.sh` got a script that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't run the platform without figuring out the env setup on their own. Added the step in both READMEs and CONTRIBUTING. Deliberately NOT generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here — the e2e-api suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode (no server-side `ADMIN_TOKEN`), which is how CI runs it. 7. CI shellcheck only covered `tests/e2e/.sh` — `infra/scripts/setup.sh` is in the critical path of every new-user onboarding but was never linted. Extended the `shellcheck` job and the `changes` filter to cover `infra/scripts/`. `scripts/` deliberately excluded until its pre-existing SC3040/SC3043 warnings are cleaned up separately. Verification (fresh nuke-and-rebuild following the updated README): - `docker compose -f docker-compose.infra.yml down -v` + `rm .env` - `cp .env.example .env` → defaults work as-is - `bash infra/scripts/setup.sh` — clean, no migration errors, all 6 infra containers healthy - `cd workspace-server && go run ./cmd/server` — "Applied 41 migrations (0 already applied)", platform on :8080/health 200 - `cd canvas && npm install && npm run dev` — Canvas on :3000/ 200 even with `.env` sourced (PORT=8080 in env) - `bash tests/e2e/test_api.sh` — 61 passed, 0 failed* - `cd canvas && npx vitest run` — 900 tests passed - `cd canvas && npm run build` — production build clean - `shellcheck --severity=warning infra/scripts/*.sh` — clean - Langfuse `/api/public/health` 200 (was 500) Scope notes: - SaaS/EC2 parity (issue #1822): all files touched here are local-dev surface. Canvas container uses `node server.js` with `ENV PORT=3000` in `canvas/Dockerfile` — the `-p 3000` pin in `package.json` dev script only affects `npm run dev`, not the production CMD. - Test coverage (issue #1821): project policy is tiered coverage floors, not a blanket 100% target. Files touched here are shell scripts, YAML, Markdown, and one package.json script — not classes covered by the coverage matrix. - No overlap with open PRs — searched `setup.sh`, `quickstart`, `langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-23 19:53:43 +00:00
Hongming Wang	2c3eccf9d6	test(auth): provide window.location.pathname in redirectToLogin mocks The pathname.startsWith() loop-break added to redirectToLogin needs pathname on the mock Location object; tests were supplying only href. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 11:16:22 -07:00
rabbitblood	b360a4353f	fix(auth): redirect to app.moleculesai.app for login, not tenant subdomain Tenant subdomains (hongmingwang.moleculesai.app) proxy to EC2 platform which has no /cp/auth/* routes. Auth UI lives on app.moleculesai.app. Added getAuthOrigin() that detects SaaS tenant hosts and redirects to the app subdomain for login/signup. Non-SaaS hosts (localhost, dev) fall back to PLATFORM_URL as before. [Molecule-Platform-Evolvement-Manager] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 11:16:22 -07:00
rabbitblood	6730c7713d	fix(auth): redirect to login on 401 from any API call When session credentials expire mid-use, ALL API calls return 401. Previously this threw a generic error that crashed the UI with no recovery path. Now the API client intercepts 401 and redirects to login once (via redirectToLogin which already guards against loops). Combined with the AuthGate /cp/auth/* path guard, this gives the correct behavior: credentials lost → redirect to login → user logs in → return_to sends them back. [Molecule-Platform-Evolvement-Manager] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 11:16:22 -07:00
rabbitblood	edc42b2893	fix(auth): break infinite redirect loop on /cp/auth/login AuthGate redirected anonymous users to /cp/auth/login?return_to=<url>, but the login page itself triggered AuthGate, which redirected again with double-encoded return_to. Each redirect added another encoding layer until the URL exceeded 431 (Request Header Fields Too Large). Two guards: 1. redirectToLogin() returns early if already on /cp/auth/* path 2. AuthGate skips redirect check entirely for /cp/auth/* paths [Molecule-Platform-Evolvement-Manager] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 11:16:22 -07:00
Hongming Wang	dc476153c1	Merge remote-tracking branch 'origin/staging' into promote/main-to-staging-2026-04-23 # Conflicts: # canvas/src/components/__tests__/ContextMenu.keyboard.test.tsx	2026-04-23 09:50:16 -07:00
Molecule AI App-FE	8f7808642a	fix(test): add getState to useCanvasStore mock in ContextMenu keyboard test PR #1781 introduced useCanvasStore.getState() call in ContextMenu.tsx (line 169) but the existing Vitest mock for useCanvasStore in the keyboard test file lacked a getState method, causing: TypeError: useCanvasStore.getState is not a function Fix: attach getState: () => mockStore to the mock using Object.assign so the static method is available alongside the selector fn. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 16:43:08 +00:00
Hongming Wang	47dc72c6b3	chore: promote main → staging (52 commits, 2 conflicts resolved) Brings the staging branch up to date with main's feature-fix stream so every staging-targeted PR stops tripping on pre-existing rot. Before this merge, staging had 30+ compile + test failures from fix PRs that landed on main but never reached staging — primarily #1755's panic- cascade + schema-drift alignments. After this merge the handlers package goes from 30+ fails → 2 pre- existing nil-docker test panics (TestCopyFilesToContainer_CWE22_ RejectsTraversal + TestDeleteViaEphemeral_F1085_RejectsTraversal), both authored on staging and broken before this promotion. Tracked separately; not a merge regression. ## Conflicts resolved 1. docs/marketing/campaigns/discord-adapter-announcement/announcement.md — deleted on main (`9d0d213`: "move sensitive strategy + research to internal repo"), modified on staging. Deletion wins: marketing content moved out of the public monorepo per that commit's intent. The content lives in the internal repo. 2. workspace-server/internal/handlers/container_files.go — staging's rmTarget version kept. Main's version had `Cmd: []string{"rm", "-rf", "/configs/" + filePath}` which concatenates raw filePath AFTER the prefix-check on rmTarget, defeating the path-traversal guard (a "../etc/passwd" input passes validation but the rm cmd then traverses). Staging's `Cmd: []string{"rm", "-rf", rmTarget}` uses the validated path. Keeping staging's more-secure variant. ## Includes build unblockers from #1769 / #1782 - terminal.go: malformed handleLocalConnect repaired - terminal_test.go: missing braces in TestHandleConnect_RoutesToLocal - workspace_crud.go: unused imports + duplicate strField block - container_files_test.go: duplicate contains() removed (uses the one in workspace_provision_test.go, same package) ## Verification - go build ./... ✅ clean - go vet ./... ✅ clean - go test -race ./... — 18/20 packages green; 2 test panics in internal/handlers are pre-existing on staging (documented above)	2026-04-23 08:51:01 -07:00
Hongming Wang	68ee76c6b7	fix(canvas): add getState to useCanvasStore mock in ContextMenu keyboard test ContextMenu.tsx reads parent-workspace children via useCanvasStore.getState().nodes.filter(...) — a direct .getState() call, not the selector-calling form. The existing vi.mock exposed only the selector form, so rendering crashed with "TypeError: useCanvasStore.getState is not a function". Restructure the vi.mock factory to return Object.assign(fn, { getState: () => mockStore }) so both call shapes resolve. Factory body builds the function locally because vi.mock hoists above outer-scope variable declarations and can't reference `mockStore` via closure. Verified: all 15 tests in the file pass after the change. Unblocks the Canvas (Next.js) CI check on PR #1743 (staging→main sync). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 01:49:34 -07:00
Hongming Wang	d4cead5002	chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint Three small, non-overlapping fixes extracted from closed PR #1664: 1. canvas/src/components/ContextMenu.tsx — Replace the useMemo-over-nodes pattern with a hashed-boolean selector (s.nodes.some(...)) so Zustand's useSyncExternalStore snapshot comparison is stable. Resolves React error #185 (infinite render loop). Moves the child-node list derivation into the delete handler via getState() so the render path no longer allocates a fresh array. 2. workspace-server/internal/handlers/a2a_proxy.go — Allow the Docker-bridge hostname path (ws-<id>:8000) to skip the SSRF guard in local-docker mode. Gated on !saasMode() so SaaS deployments keep the full private-IP blocklist (a remote workspace registration can't claim a ws-* hostname and reach a sensitive VPC IP). 3. workspace-server/Dockerfile — Add entrypoint.sh that discovers the docker.sock GID at boot and adds the platform user to that group, then exec's su-exec to drop privileges. Lets the platform container reach the host docker socket without running as root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 20:00:16 -07:00
molecule-ai[bot]	9d076b9c4d	Merge pull request #1684 from Molecule-AI/fix/missing-keys-modal-a11y-v2 fix(canvas/a11y): MissingKeysModal — backdrop aria-hidden, decorative SVGs, form labels	2026-04-23 02:54:46 +00:00
Molecule AI Core-FE	5157f80d19	fix(canvas): add type=button to ApprovalBanner action buttons (bug #1669 ) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 02:15:52 +00:00
Hongming Wang	e08ea7b5ba	fix(canvas): require hermes model at create + send to CP (fixes silent Anthropic 401) Root cause of the hermes 401 "Invalid API key" on SaaS workspaces: 1. CreateWorkspaceDialog never sent `model` in the /workspaces POST 2. Tenant/CP plumbed through a valid (provider, API key) but empty MODEL 3. Workspace install.sh ran with HERMES_DEFAULT_MODEL unset 4. derive-provider.sh saw no slug → PROVIDER="auto" 5. Hermes fell back to its compiled-in default (Anthropic via OpenAI-compat adapter) 6. User's MINIMAX_API_KEY was present but irrelevant — hermes tried Anthropic with it → 401 Fix: - Extend HERMES_PROVIDERS with `defaultModel` + `models` (suggestion list). Each provider ships with a known-good default so the trap is physically impossible to hit with the new form. - Add a required Model input to the Hermes panel, auto-populated from the provider's defaultModel when the provider changes (only if the user hasn't typed their own slug yet). - Datalist surfaces additional model suggestions per provider so users can pick a different size (e.g. M2.7-highspeed) without typing the whole slug. - handleCreate validates hermesModel is non-empty, sends as `model` in the POST body alongside the secrets block. - useEffect guard avoids clobbering a user-typed custom slug when they toggle providers back and forth. Existing 19 a11y tests still pass (non-SaaS path unchanged, four-tier picker still renders, arrow-key nav still wraps). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:59:49 -07:00
Hongming Wang	66de81fbfa	Merge pull request #1689 from Molecule-AI/refactor/strip-secret-service-dropdown refactor(secrets): strip Service dropdown from Add-Key form	2026-04-22 18:46:02 -07:00
Hongming Wang	0574e7c1d0	feat(canvas): add T4 tier (full-host access); SaaS default T4 Following feedback that T4 — not T3 — is the full-access tier: - Non-SaaS picker now shows all four tiers: T1 Sandboxed, T2 Standard, T3 Privileged, T4 Full Access. Four-column grid. - SaaS picker stays single-option but now locks to T4 (was T3). Every SaaS workspace gets a dedicated EC2 VM, which is unambiguously the "full host" case — T3 (privileged container) was a category mismatch. - Default tier on SaaS is 4 (was 3). CP provisioner already supports tier 4 (t3.large / 80 GB). TIER_CONFIG already has T4's amber color. Tests updated for the four-tier picker: wrap tests now go T4 ↔ T1, and the selection/tabIndex tests cover the fourth button. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:17:13 -07:00
Molecule AI Core-FE	382238daa3	test(canvas): relax setPendingDelete assertion to use expect.objectContaining Staging added hasChildren/children fields to workspace store shape. Test assertion updated to use objectContaining to avoid false negatives. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 00:59:38 +00:00
Molecule AI Core-FE	66c6b83ab2	test(canvas): add ActivityTab and MissingKeysModal component tests - ActivityTab.test.tsx: 27 tests covering filter bar (aria-pressed states, API reload), loading/error/empty states, ActivityRow content (type badges, method, duration_ms, summary, error styling), A2A flow indicators, auto-refresh Live/Paused toggle, refresh button, activity count - MissingKeysModal.component.test.tsx: 25 tests covering visibility, ARIA semantics (role=dialog, aria-modal, aria-labelledby), content, keyboard (Escape, Enter), save flow (disabled/.../Saved/error), Add Keys & Deploy gate, Cancel + backdrop click, Open Settings button - MissingKeysModal.test.tsx: refactored to preflight logic only (7 tests); component rendering now covered in component test file 863 tests passing (+3 net). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 00:58:56 +00:00
Hongming Wang	8b1af9708c	feat(canvas): default tier T3 and hide T1/T2 on SaaS On SaaS every workspace gets its own EC2 VM — the Docker-sandbox distinction between T1 (sandboxed), T2 (standard Docker), and T3 (full host access) doesn't apply. A SaaS workspace is always a dedicated VM, which is "full access" by construction. Showing T1/T2 in that UI is a category error: users pick a sandbox level that has no effect on the actual EC2 machine they get. Changes: - tenant.ts: export isSaaSTenant() — returns true when canvas is served at <slug>.moleculesai.app (SSR-safe: false on server) - CreateWorkspaceDialog: when isSaaSTenant(), render only the T3 option, default tier=3, grid collapses to a single column. Label gets a " — dedicated VM" hint so the user knows what they're getting. On self-hosted the full T1/T2/T3 picker is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:02:48 -07:00
Hongming Wang	d956164812	refactor(secrets): strip Service dropdown from Add-Key form The Add-Key form used to open with a required Service dropdown (GitHub / Anthropic / OpenRouter / Other) that gated everything else. The dropdown did no persistent work — the secret store only cares about (key_name, value); the Service label was never saved anywhere. It also suffered registry drift: today we support ~22 hermes-dispatched providers (MiniMax, Gemini, DeepSeek, Kimi, Qwen, NVIDIA, etc.); only 3 had entries. Everyone else landed in "Other" with no downside beyond the mandatory click. Replaces it with: 1. Key-name <datalist> autocomplete sourced from new KEY_NAME_SUGGESTIONS in lib/services.ts — 26 entries covering common infra keys + every hermes-supported provider. 2. inferGroup(keyName) derives classification at render time, matching what the store already does in getGrouped(). No behaviour change for list grouping. 3. Provider docs link renders inline only when inferGroup recognises the name. For 'custom' keys we stay quiet — no false-structure prompt. 4. Test-connection button still available when the inferred group supports it AND the value is format-valid. Same providers as before. SERVICES registry preserved for LIST rendering + test routing. Result: two fields instead of three. One fewer decision. Provider- agnostic by design — new providers work the moment someone types their canonical env var name; no UI code change per provider. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 16:41:43 -07:00
Hongming Wang	f6e6a64ba9	fix(canvas): forward-port dynamic runtime dropdown from staging (PR #1526 ) PR #1526 shipped the /templates registry + canvas dynamic Runtime / Model / Required-Env fields on 2026-04-22 — but merged into the staging branch, not main. The staging→main promotion PR #1496 has been open unmerged for a while with 1172 commits divergence, so prod (which builds from main) still carries the old hardcoded dropdown. Symptom seen on hongmingwang.moleculesai.app today: - New Hermes Agent workspace (template declares runtime: hermes) loads Config tab → Runtime dropdown shows "LangGraph (default)" because there's no <option value="hermes"> in the hardcoded list; it falls back to empty-value silently. - Model field is a plain TextInput with static placeholder "e.g. anthropic:claude-sonnet-4-6" — should be a combobox populated from the selected runtime's models[]. - Required Env Vars is a TagList with static placeholder "e.g. CLAUDE_CODE_OAUTH_TOKEN" — should auto-populate from the selected model's required_env. - Net effect: "Save & Deploy" sends empty model + empty env to the provisioner → workspace instant-fails. This PR cherry-picks the exact three files from PR #1526 (#359dc61 on staging) forward to main, without pulling the other 1171 commits: - canvas/src/components/tabs/ConfigTab.tsx - RuntimeOption interface + FALLBACK_RUNTIME_OPTIONS (hermes, gemini-cli included) - useEffect fetches /templates and populates runtimeOptions dynamically - dropdown renders from runtimeOptions (no hardcoded list) - Model becomes a combobox with datalist of available models per selected runtime - Required Env Vars auto-populates from the selected model's required_env on model change - workspace-server/internal/handlers/templates.go - /templates endpoint returns [{id, name, runtime, models}] with per-template models registry (id, name, required_env) - workspace-server/internal/handlers/templates_test.go - Tests for runtime+models parsing and legacy top-level model fallback The canvas Runtime dropdown now resolves "hermes" correctly; Model dropdown shows the models[] from the hermes template; Env auto-populates with HERMES_API_KEY (or whichever model selected). Verified locally: - workspace-server builds clean - Template handler tests pass: TestTemplatesList_RuntimeAndModelsRegistry, TestTemplatesList_LegacyTopLevelModel, TestTemplatesList_NonexistentDir Follow-up: the staging→main promotion gap (#1496) is the underlying process issue. Either merge that PR or adopt a policy of landing fixes directly on main (as several PRs have today). Files here were chosen minimally to avoid pulling unrelated staging changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 14:28:38 -07:00
airenostars	7a89704b6e	fix(build): add missing fmt import + fix canvas Dockerfile GID (#1487 ) * docs(canary-release): flag as aspirational; link to current state The canary-release.md doc describes the pipeline as if the fleet is running — referring to AWS account 004947743811 and a configured MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary tenants are provisioned, the 3 GH Actions secrets are empty, and canary-verify.yml has failed 7/7 times in a row. Added a top-of-doc ⚠️ state note that: 1. Clarifies this is intended design, not deployed reality. 2. Notes the AWS account ID is historical / unverified. 3. Explains that merges currently rely on manual promote-latest. 4. Cross-links to molecule-controlplane/docs/canary-tenants.md for the Phase 1 work that's shipped, the Phase 2 stand-up plan, and the "should we even do this now?" decision framework. 5. Asks whoever lands Phase 2 to reconcile the two docs. No behaviour change — doc-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID - a2a_proxy.go: missing "fmt" import caused build failure (8 undefined references at lines 743-775). Likely dropped during a recent merge. - canvas/Dockerfile: GID 1000 already in use in node base image. Changed to dynamic group/user creation with fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>	2026-04-22 21:10:58 +00:00
Molecule AI Core-UIUX	116526bff3	fix(canvas/a11y): orgs/page.tsx — form labels, error announcements, checkout banner - CreateOrgForm: replace bare <span> labels with <label htmlFor> + input id (WCAG 1.3.1 — programmatic label association); add aria-describedby hint for slug field - Error state: add role=alert on error <p> (WCAG 4.1.3 — Status Messages) - CheckoutBanner: add role=status + aria-live=polite (WCAG 4.1.3); restore decorative ✓ with aria-hidden=true Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:06:20 +00:00
Molecule AI Core-UIUX	d6dbf23172	test(canvas/a11y): add WCAG 2.1 accessibility tests for ConsoleModal and DeleteCascadeConfirmDialog ConsoleModal: role=dialog, aria-modal, aria-labelledby, backdrop aria-hidden, error role=alert, accessible button names DeleteCascadeConfirmDialog: role=dialog, aria-modal, aria-labelledby, backdrop aria-hidden, SVG aria-hidden, disabled state, keyboard interactions (Escape, Enter), accessible names Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 20:39:48 +00:00
Molecule AI Core-UIUX	8bb0fe70ff	fix(canvas/a11y): DeleteCascadeConfirmDialog backdrop aria-hidden (WCAG 4.1.2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 20:36:05 +00:00

1 2 3 4 5

245 Commits