Importing a 15-workspace org template dropped every child as a
freely-positioned card into its parent's coordinate space. Parents
with 5-10 kids had the kids spill below the parent's initial min
size, producing the "ugly default" layout the user just flagged —
a mess of overlapping cards the moment the import completed.
Fix: every workspace in an org-template import that HAS children
is inserted with `collapsed = true`. Leaf workspaces stay
expanded (nothing to hide). The canvas renders a collapsed
parent as a compact header-only card with its "N sub" badge —
visually identical to the pre-refactor default the user asked for.
Double-click on a collapsed parent now EXPANDS it (flipping
`collapsed` locally + persisting via PATCH) so the user can drill
in to see the subtree. Only once expanded does a second
double-click zoom-to-team, matching the prior behaviour.
Leaf-first creation order stays the same; the collapsed flag
just means "render compact" not "hide from API".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five review findings from the 3f11df03 six-bug commit:
1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,
ClosedInProduction} covering all three gating states for the
security-sensitive dev-mode hatch the prior commit added to
/registry/:id/peers. Previously shipped untested — a future
refactor could have silently inverted polarity or removed the
gate. New tests pin the contract:
* MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless
* MOLECULE_ENV=development + ADMIN_TOKEN set → require token
* MOLECULE_ENV=production → require token
2. ConfigTab handleSave diffs against the RAW parsed YAML / form
config instead of the DEFAULT_CONFIG-merged shape. The previous
code would silently PATCH tier=1 to the DB when a user deleted
the `tier:` line in raw mode (the default-merge substituted 1).
Now: only fields the user actually typed participate in the
diff. Type guards (typeof === "number" / "string") prevent
coercion surprises on malformed YAML.
3. ConfigTab model-save failure no longer lies "Saved". The
/workspaces/:id/model PATCH can reject when the runtime doesn't
support the chosen model; previously we caught + console.warn'd
+ showed green Saved, and the user watched the model revert on
next reload with no explanation. Now the save path collects a
`modelSaveError` and surfaces it via setError with a partial-
success message ("Other fields saved, but model update failed:
…") so the user sees why.
4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch
failures, distinguishing them in the error text ("Failed to
load connected channels and platforms — try refreshing").
Previously only an adapters failure was visible; a channels
failure left the user with an apparently-empty list and no
indication the API was unreachable.
5. ChatTab panels drop the redundant aria-hidden attribute. The
`hidden`/`flex` Tailwind class already sets display:none, which
removes the node from the accessibility tree on its own; the
extra aria-hidden invited WAI-ARIA lint warnings if a focusable
descendant ever landed inside an inactive panel.
Tests: 923 canvas + full Go handler suite pass. 3 new Go tests.
No behaviour change on the five prior fixes — this commit tightens
their edges per the independent review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six bugs reported from a live session — all shippable in one commit:
1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint
demands a workspace-scoped bearer token (validateDiscoveryCaller)
which the canvas session doesn't hold. Added the same Tier-1b
dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already
use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so
SaaS production stays strict. Exported IsDevModeFailOpen from the
middleware package for the handler layer to reuse.
2. Org Templates list unscrollable. OrgTemplatesSection was rendered
in the TemplatePalette footer — a div without overflow — so when
it expanded to 15+ entries the list extended past the viewport
with no scroll. Moved it to the top of the flex-1 overflow-y-auto
container. Tall lists now scroll naturally.
3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead
of switching. HTML `hidden` attribute was being overridden by
Tailwind's `flex` class (display: flex beats the attribute),
so both tabpanels rendered concurrently. Swapped to a conditional
Tailwind `hidden`/`flex` class so the inactive panel is
display:none with proper CSS specificity.
4. Hermes Config form never persists. handleSave wrote config.yaml
but name / tier / runtime / model all live on the workspace row
(or the dedicated /workspaces/:id/model endpoint) — the form
edited in-memory, the request returned 200, the next reload
wiped everything back. Hermes + external runtimes manage their
own config inside the container anyway, so writing config.yaml
is a no-op for them; skip it. Always diff and PATCH the DB-backed
fields that actually changed.
5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's
load() used Promise.all with a silent catch — if EITHER the
channels or adapters fetch failed, both setters were skipped
with no error visible. Switched to Promise.allSettled so each
endpoint settles independently, and the adapters failure now
surfaces via the top-level error state.
6. Plugin registry always "No plugins in registry". Same silent
catch pattern in SkillsTab.tsx — load errors for /plugins,
/plugins/sources, and /workspaces/:id/plugins swallowed without
logging. Replaced the empty catches with console.warn so future
failures are at least visible in devtools.
Tests: 923 passing (unchanged). Go handler tests pass. Server
rebuilt and running with the peers-auth + collapsed-persistence
fixes (pid 15875).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-up review findings from the c2b2e13a review:
1. Rescue heuristic uses pure bbox-non-overlap. The previous
`position.x < 0` branch rescued any child whose parent was
later dragged past it, even when the layout was clearly
recoverable (e.g. relative -40, child still overlaps parent).
New rule: rescue iff the child's bbox has zero overlap with
the parent's bbox — self-calibrating, scales with user-resized
parents, catches screenshot-case and legacy huge-positive data.
2. Toast caps failed-name list at 3 and appends "and N more".
Stops a 50-node partial failure from overflowing the toast
container.
3. Cycle guard on selection-roots walk in batchNest. Corrupt
parentId data can't send the loop infinite now. Cheap
defensive guard — one Set per selected node.
Tests added (923 total, up from 918):
* canvas-topology.test: 4 rescue scenarios — screenshot case
(zero-overlap rescue), negative drift kept, huge-positive
rescued, user-resized layout kept.
* canvas.test: selection-roots filter on a 3-level chain.
* workspace_crud test: PATCH {collapsed:true} runs the UPDATE.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five issues surfaced in the review of 50b53784. Each was either a real
bug waiting to hit users or a silent failure mode.
1. Topology rescue no longer teleports user-resized children.
Rescue was comparing against parentMinSize(childCount), so any
child the user had placed in space the parent was resized into
got snapped to the default grid on reload — undoing the layout.
Now rescue fires only on obviously corrupt data: negative
relative coords (legacy pre-nesting absolute positions that
landed above/left of their assigned parent) or values past an
MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial
minimum are left alone.
2. batchNest now filters to selection-roots before planning.
Previously selecting both A and A's descendant B and dragging
into T yanked B out of A to become a sibling under T. Users
reasonably expect the A subtree to move intact. The new pass
drops any selected node whose ancestor is also selected —
those follow their ancestor via React Flow's parent binding.
3. batchNest surfaces partial failure via showToast. Previously
silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2
snapped back with no explanation. Now names the failed cards.
4. confirmNest closes the nest dialog BEFORE dispatching the async
store action, so a second drag can't kick off a competing batch
while the first is still in flight.
5. collapsed is now persisted. The Go workspace_crud.go Update
handler ignored the `collapsed` field, so user-initiated
collapse round-tripped to an expanded state on next hydrate.
Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`)
so the state survives reload.
Nits cleaned:
* Removed dead dragStartParentRef in useDragHandlers.
* Swapped redundant `node.data as WorkspaceNodeData` casts for a
named WorkspaceNode type alias.
* Canvas.tsx SR-live region now reads n.parentId (matches
MiniMap + RF's native field) instead of the mirror n.data.parentId.
Tests added (918 total, up from 915):
* batchNest happy path — 2-root selection fires 2 combined PATCHes
carrying parent_id + x + y, not 2×N sequential round-trips.
* batchNest ancestor+descendant selection — subtree stays intact.
* batchNest partial failure rollback — only the rejected nodes
revert; successful ones stay committed.
Backend change is single-line (collapsed PATCH branch); all
workspace_crud Go tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concerns in one commit (separate files, each self-contained):
## Canvas.tsx split (from ~680 to ~250 lines)
Canvas.tsx was holding drag gesture state + keyboard shortcuts +
viewport wiring + JSX. Each concern now lives in its own unit under
canvas/src/components/canvas/:
- dragUtils.ts — pure: shouldDetach, clampChildIntoParent,
DETACH_FRACTION
- DropTargetBadge.tsx — the floating "Drop into: <name>" label + the
dashed ghost preview at the target slot
- useDragHandlers.ts — encapsulates onNodeDragStart / Drag / Stop,
findDropTarget hit-test, pendingNest state,
and confirmNest/cancelNest. Routes multi-
select drags through batchNest automatically.
- useKeyboardShortcuts — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one
window listener, one source of truth.
- useCanvasViewport — pan-to-node + zoom-to-team CustomEvent
listeners and the debounced viewport save.
Canvas.tsx becomes a thin composition + JSX file. No behavioural
change; the refactor is covered by the existing 915 canvas tests.
## batchNest parallelization (2N round-trips → N, all in flight)
Previously nestNode fired two sequential PATCHes (parent_id then x/y)
and batchNest looped nestNode sequentially. For a 5-node selection on
a typical ~200ms link this was ~2s of serialized RPCs.
- nestNode now combines parent_id + x + y into ONE PATCH. The Go
handler (workspace_crud.go Update) already reads all three from the
same body — no backend change.
- batchNest rewritten: compute every re-parent plan against one
snapshot, commit a single set(), then fire N PATCHes via
Promise.allSettled in parallel. Per-node failures roll back only
that node (others stay committed) — same semantics as the single-
node path, just concurrent.
- The state math in the batch path also correctly shifts descendant
zIndex by depthDelta when any re-parented node has a subtree.
## Also
- canvas-topology.ts: reverted P3.12's opt-in rescue to the auto-
rescue default. When a child's stored relative position would render
it outside the parent bbox (the visual regression the user saw after
collapse → reload — Hermes child drawn outside Claude Code Agent on
first paint), the child is placed in the next default grid slot.
The "Arrange Children" context command stays for bigger teams.
All 915 canvas tests pass. No backend changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five Critical issues caught in code review of f3423a51. Each one broke
an invariant the original commit claimed to uphold.
1. nestNode: descendants kept their old-depth zIndex after a re-parent.
Now walks the dragged subtree and shifts every descendant's zIndex
by the same depthDelta so "children above ancestors" survives moves
between levels of the hierarchy.
2. bumpZOrder: siblings all share zIndex = depth in fresh topology, so
a single +1 bump was identical for every sibling and subsequent
bumps drifted zIndex unboundedly. Rewritten to sort siblings by
current zIndex and swap the target with its neighbour in the bump
direction — Figma-style reorder, stays within the sibling tier.
3. findDropTarget: depth-first tiebreaker lost to bumped siblings. The
visually-frontmost card after Cmd+] is a shallow sibling, but the
hit test picked the deepest nested card regardless. Swapped order
so zIndex wins first, depth second, area third. Also pre-computes
the depth map once per call (was O(n²) via repeated .find walks —
will matter past ~30 workspaces).
4. arrangeChildren: saved absolute position using `slot + parent.position`,
but parent.position is RELATIVE to its own parent when nested.
Grandchildren's stored x/y were in the parent's local frame and
reload placed them in the wrong spot. Now walks the full ancestor
chain via absOf() to get the true canvas-absolute origin before
PATCHing.
5. setCollapsed: naive flip of every descendant's hidden flag diverged
from the topology rebuild on hydrate. Collapse A, collapse B, then
expand A — C should stay hidden because B is still collapsed, but
before this fix C was unhidden. Rewritten to recompute every
descendant's hidden from the full ancestry chain, matching the
topology pass byte-for-byte. New round-trip test asserts the two
code paths produce identical node.hidden across a full lifecycle.
Also:
- Removed dead cascadeMessage constant (never rendered).
- Replaced hardcoded 260/120 in zoom-to-team with exported constants.
- arrangeChildren PATCH catch now logs instead of silently swallowing.
- Added 70→76 tests: setCollapsed 3-chain scenarios, bumpZOrder swap
semantics, edge-of-list no-op.
All 915 canvas tests green. Backend untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ships the full prioritized improvement list from the canvas research
report — aligns our nesting/resize UX with Miro / FigJam / tldraw / Figma
conventions. Organized by priority below.
## P1 — baseline playability
* Hysteresis on drag-out detach (Miro): a child only un-nests when >=20%
of its bbox is outside the parent on release. Prevents accidental
un-nesting from twitchy drags.
* Drop-target now uses tree-depth DESC, then zIndex DESC, then area ASC
to pick targets when nested parents overlap (xyflow #2827).
* Children render above ancestors by inheriting zIndex = parent + 1 in
topology and on every nest/unnest (xyflow #4012).
* Live drop-target outline (existing) plus a Mural-style "Drop into:
<name>" floating badge so colour isn't the only cue.
* growParentsToFitChildren now fires only on dimension-type changes
inside onNodesChange (NodeResizer commits) and once on drag-stop —
avoids tldraw's edge-chase artifact (P3.11 commit-on-release).
## P2 — polish
* Whimsical-style ghost preview: dashed outline at the next default
grid slot inside the drop-target parent during drag.
* Alt-drag escape with soft clamp: dropping slightly outside a parent
without Alt/Cmd snaps the child back inside (clampChildIntoParent);
Alt releases the clamp to allow un-nest; Cmd/Ctrl force-detaches.
* Figma-style keyboard hierarchy nav: Enter descends to first child,
Shift+Enter ascends to parent, Cmd+]/[ re-orders siblings via the
new bumpZOrder store action.
* Multi-select re-parent preserves offsets: confirmNest routes through
a new batchNest action when the primary drag is part of a batch
selection (Lucidchart pattern).
## P3 — long-tail
* Minimap now shows parent cards as filled regions with a blue stroke,
so hierarchy reads at a glance without zooming.
* Out-of-bounds rescue is opt-in: topology no longer silently re-lays
children whose stored position is outside the parent bbox (Figma
trust-the-data). The new Arrange Children context menu item runs the
rescue on demand via arrangeChildren.
* Cmd-drag force-detach regardless of hysteresis.
* Collapse workspace: the existing Collapse Team action now toggles a
local setCollapsed store action that hides every descendant and
shrinks the parent card to header-only (Miro frame outline view).
Growth pass skips collapsed parents so they don't push back out.
All 910 canvas tests green. Backend untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two playability bugs in the new flat-cards layout:
1. On first load or fresh org import a parent had no explicit width or
height, so children whose stored position sat inside their (eventual)
parent's rectangle rendered visually outside the smaller default
parent box. Compute a parent starting size in canvas-topology:
• 2-column grid of child-default footprints + header/side padding
• Grows per child count (2→1 row, 3-4→2 rows, etc.)
and stamp it onto the Node's width/height so the first paint already
contains every child.
2. If a child's stored relative position actually falls outside the
parent's computed bounds (legacy org-imports at 0,0, pre-refactor
absolute coordinates, manually-nudged rows), assign that child a
deterministic default grid slot inside the parent instead.
Runtime cascade: added growParentsToFitChildren to onNodesChange so when
the user drags or resizes a child past the parent's current bounds, the
parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit
— grow-only, never shrinks under the user's manual resize.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every workspace now renders as a first-class card on the canvas
regardless of parent_id. The old "parent card contains mini TeamMember
chips" layout is gone — if B is parented to A, B renders as a full card
inside A's coordinate space using React Flow's `parentId` binding, so
moving A carries B along and children have the same detail + actions as
root cards.
Details:
- canvas-topology.ts: topologically sort parents before children
(React Flow ordering requirement), compute each child's RF-native
parentId + relative position on load. DB keeps absolute x/y; the
abs→rel conversion happens here, reverse translation in
Canvas.onNodeDragStop before savePosition PATCHes the DB.
- WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks,
simplify the size classes, and add NodeResizer (visible when selected)
so users can drag any edge/corner to grow or shrink. Parent cards
default to a larger min size so nested children have breathing room.
- Canvas.tsx drop targeting rewritten: bounds-based hit test against
each node's measured absolute bbox, deepest match wins. Fixes two
prior bugs at once — dropping onto Claude Code with a nested same-
named Hermes no longer picks the wrong node, and the target can now
be a nested workspace when that's where the pointer actually released.
- canvas.ts nestNode + removeNode: translate position between old and
new parent's absolute origin on nest/unnest so the card doesn't jump,
and re-point the RF `parentId` alongside `data.parentId` on reparent.
- Tests: hidden-flag assertions replaced with parentId checks; obsolete
TeamMemberChip a11y/eject tests deleted (the UI component no longer
exists).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dragging one workspace onto another could pick a nested child as the
"nearest" drop target instead of the visible parent card the user
actually hovered. The effect: dropping a free-floating Hermes Agent
onto a Claude Code Agent that already had a Hermes Agent nested inside
showed "Move 'Hermes Agent' inside 'Hermes Agent'?" — the confirmation
referenced the nested same-named child, not Claude Code.
Why: getIntersectingNodes returns every overlapping node, including
hidden=true children that render inside their parent's card. The
parent and child share bounding boxes, so the child often "won" the
nearest-distance check. Filter them out at the source: a node that's
already got a parentId (or is hidden) is never a valid top-level drop
target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two latent bugs kept the "Processing with Claude Code..." timer ticking
after the agent had already answered:
1. The A2A_RESPONSE store handler wrote into agentMessages[workspaceId]
(no prefix) but ChatTab's "clear sending" effect subscribed to
agentMessages["a2a:" + workspaceId]. Keys never matched — the effect
was dead code from day one. Removed the dead subscription and moved
the setSending(false) into the pendingAgentMsgs effect so any reply
delivered via a WS push (Claude Code SDK, Hermes's
send_message_to_user) also closes the spinner.
2. Added an activity-log fallback: when the platform emits a successful
a2a_receive ACTIVITY_LOGGED for this workspace, clear sending and
stop the timer. That covers the "runtime answered but we never saw
the store message" case Claude Code exhibited tonight — the HTTP
request can stay in flight while the SDK already pushed its reply.
Symmetric a2a_receive error path also clears sending and surfaces the
error message, so a runtime-side failure no longer hangs the UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The side-panel runtime pill read "unknown" for newly-deployed workspaces
because canvas-events.ts created the node from WORKSPACE_PROVISIONING
payload — and the payload only carried name + tier. No refetch filled
the gap during provisioning, so the user saw "RUNTIME unknown" on the
card even though the DB row had the real runtime set.
Includes runtime in every WORKSPACE_PROVISIONING emitter:
* handlers/workspace.go — initial create
* handlers/workspace_restart.go — explicit restart, auto-restart, and
crash-recovery resume loop
* handlers/org_import.go — multi-workspace org imports
Canvas-side: canvas-events.ts reads payload.runtime when creating the
node; the provisioning test asserts the pill value is populated before
any refetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The MissingKeysModal's provider list was hardcoded in deploy-preflight.ts
as RUNTIME_PROVIDERS — a per-runtime map that duplicated what each
template repo already declares in its config.yaml. That meant adding a
new provider required changes in two places, and the UI could drift out
of sync with the actual template (e.g. when a template adds a MiniMax or
Kimi model, the picker wouldn't know).
The single source of truth for "which env vars does this workspace need"
is each template's config.yaml:
* `runtime_config.models[].required_env` — per-model key list
* `runtime_config.required_env` — runtime-level AND list
Go /templates already returned `models`. This change:
* Adds `required_env` alongside `models` on templateSummary so the
canvas receives the full picture.
* Rewrites deploy-preflight.ts to derive ProviderChoice[] from a
template object via `providersFromTemplate(template)`:
- groups `models[]` by unique required_env tuple
- falls back to runtime_config.required_env when models is empty
- decorates labels with model counts (e.g. "OpenRouter (14 models)")
* `checkDeploySecrets(template, workspaceId?)` now takes a template
object instead of a runtime string. Any-provider satisfaction still
short-circuits preflight to ok=true.
* MissingKeysModal receives `providers` directly; no more lookups.
* TemplatePalette threads `template.models` + `template.required_env`
into the preflight.
Side effects:
* Claude Code's dual-auth (OAuth token OR Anthropic API key) now
surfaces as two picker options — its config.yaml already declared
both, the UI just wasn't reading them.
* Hermes picker now shows 8 provider options (Nous, OpenRouter,
Anthropic, Gemini, DeepSeek, GLM, Kimi, Kilocode) instead of the
hand-picked 3, matching its 35-model reality.
Removed the legacy RUNTIME_PROVIDERS / RUNTIME_REQUIRED_KEYS /
getRequiredKeys / findMissingKeys exports; MissingKeysModal.test.tsx
deleted (its coverage is subsumed by the new template-driven
deploy-preflight.test.ts). 58 modal-adjacent tests pass; full canvas
suite 919 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runtimes like Hermes and LangGraph accept any one of several LLM
provider keys (OpenRouter OR OpenAI OR Anthropic OR Nous-native).
Before this change, the missing-keys modal treated all supported
providers as simultaneously required — a fresh user on Hermes was
asked for three parallel API keys when any one suffices.
Introduces RUNTIME_PROVIDERS in deploy-preflight.ts as the canonical
per-runtime provider list (label, envVar, note). checkDeploySecrets
now returns all alternatives as missingKeys when nothing is
configured, so the modal can offer a picker.
MissingKeysModal dispatches between two render paths:
* ProviderPickerModal — radio list of supported providers, a single
env input for the chosen one. Saving that one key satisfies the
preflight. Activated whenever the runtime has ≥2 provider choices.
* AllKeysModal — legacy parallel-inputs UX, all keys must be saved
before deploy. Kept for single-provider runtimes (claude-code,
gemini-cli) and callers that pass unrelated-key lists.
Dual-mode preserves the pre-existing contract for every caller while
fixing the multi-provider UX. All 930 canvas vitest tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TemplatePalette's Org Templates section rendered all cards
inline, each ~120 px tall (name + description + "Import org" button).
With 4 org templates on disk that's ~500 px of drawer height — the
individual workspace templates at the top (AutoGen / LangGraph /
Hermes / …) got pushed off-screen, which is the exact complaint from
the test session ("templates still 90% org, cant even see normal
workspace template").
Collapsed the Org Templates section by default. The header now
toggles with an ▶ caret and shows the count ("Org Templates (4)").
Clicking expands to reveal the full card list; clicking again
collapses. Persists only within a session — fresh mounts start
collapsed so the primary deploy path stays visible.
Individual workspace templates are the usual starting point (pick a
runtime, deploy one agent), while org templates are a heavier
"deploy this whole pre-built team" action. Making the second
expandable matches the relative frequency.
- `TemplatePalette.tsx::OrgTemplatesSection` — added `expanded`
state (default false), wrapped the cards in `{expanded && …}`,
turned the header into a toggle button with `aria-expanded` +
`aria-controls`.
- `__tests__/OrgTemplatesSection.test.tsx` — 3 new rendering tests:
collapsed-by-default (cards absent), click expands (cards appear),
click again collapses (cards gone). Mocks /org/templates with a
2-entry response so the count assertion is stable.
Full canvas vitest: 930/930 pass (up from 927).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
### Two unrelated but small UI fixes surfaced while testing the Canvas
**1. Legend hidden under the open TemplatePalette.**
Legend is `fixed bottom-6 left-4 z-30`. TemplatePalette's drawer (when
open) is `fixed top-0 left-0 w-[280px] z-30` — same z-index, same
left-edge column. The Legend overlapped the palette's bottom 180 px.
Published the palette-open state to the canvas store so the Legend
can shift right (to `left-[296px]` — 280 px palette + 16 px gap) while
the palette is open, animated via a 200 ms `transition-[left]` to
match the palette's slide. Closes cleanly back to `left-4` when the
palette is dismissed.
Files:
- `store/canvas.ts` — added `templatePaletteOpen` + `setTemplatePaletteOpen`.
- `TemplatePalette.tsx` — calls `setTemplatePaletteOpen(open)` on
every open/close transition via a new useEffect.
- `Legend.tsx` — reads the flag and swaps `left-4` <-> `left-[296px]`.
**2. "WebSocket is closed before the connection is established" spam.**
Two components (`ChatTab`, `AgentCommsPanel`) open their own short-
lived WebSocket to tail the ACTIVITY_LOGGED stream. Their cleanup
path called `ws.close()` unconditionally, which trips a browser
console warning when React StrictMode re-runs the effect in dev and
the handshake hasn't completed yet. Confirmed via DevTools console
on the running canvas.
Added a `closeWebSocketGracefully(ws)` helper in `lib/ws-close.ts`:
- OPEN / CLOSING → close immediately (normal path).
- CONNECTING → defer close to the 'open' listener so the
browser sees a full handshake. Also wires an
'error' listener that cancels the queued close
if the handshake fails (no double-close).
- CLOSED → no-op.
Both consumers now call the helper in their useEffect cleanup.
Silences the warning without changing observable behaviour.
### Tests
`canvas/src/lib/__tests__/ws-close.test.ts` — 5 cases with a fake
WebSocket covering each readyState branch plus the error-before-open
cancellation path. Full vitest suite: 927/927 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a workspace is selected the SidePanel (fixed, right-0, z-50)
opens from the right edge and covers the right third of the
viewport. The Toolbar at the top was positioned
`fixed top-3 left-1/2 -translate-x-1/2 z-20` — centred on the full
viewport, not the remaining canvas area. Consequence: the right half
of the Toolbar (Audit / Search / Help / Settings) was hidden behind
the panel as soon as the user clicked any workspace.
Fix: publish the live SidePanel width to the canvas store and read
it in Toolbar. When a node is selected, shift the Toolbar LEFT by
`sidePanelWidth / 2` so its centre lines up with the middle of the
remaining canvas area. Animated via a 200 ms `transition-[margin-left]`
to match the SidePanel's own slide-in easing.
- `store/canvas.ts` — added `sidePanelWidth` + `setSidePanelWidth`.
Default 480 (matches SIDEPANEL_DEFAULT_WIDTH).
- `SidePanel.tsx` — calls `setSidePanelWidth(width)` on every width
change so the store stays in sync with localStorage.
- `Toolbar.tsx` — reads `sidePanelWidth`, applies a negative
`marginLeft` style when `selectedNodeId` is non-null.
- `SidePanel.tabs.test.tsx` — added `setSidePanelWidth: vi.fn()` to
the mocked store state so SidePanel's new useEffect has a callable
to invoke. 18 previously-passing tests now pass again.
No visual regression when no workspace is selected — the toolbar
stays in its original centred position. SaaS canvas unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CreateWorkspaceDialog.a11y.test.tsx's two tier-button tests assumed
T1 was the default selection. After the previous commit flipped the
non-SaaS default to T3, the radio group's default-selected button
changed accordingly.
Updated:
- "tier buttons have role=radio and aria-checked reflects selection"
— T3 is now `aria-checked="true"`, T1 is the "unselected" foil we
click to verify the flip.
- "selected radio has tabIndex=0, others have tabIndex=-1" — T3 is
the tabindex=0 member now.
The roving-tabIndex and ArrowDown / ArrowRight tests further down the
file start by explicitly clicking/focusing T1 or T2, so they're
unaffected by the default change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Default tier for a newly-created workspace was T1 (Sandboxed) on
self-hosted and T4 (Full Access) on SaaS. Real work needs at minimum
a read_write workspace mount + Docker daemon access — that's T3
("Privileged") per the tier ladder in CreateWorkspaceDialog. The
user-visible consequence was that clicking "Deploy" on almost any
template landed in a sandbox that couldn't actually run the agent's
tooling until the user knew to bump the tier manually.
### Changes
**Platform (Go)** — default tier flipped from 1→3 in two places so
API callers (Canvas, molecli, org import) all get the same default:
- `handlers/workspace.go`: `POST /workspaces` default when `tier` is
omitted from the request body.
- `handlers/template_import.go`: `generateDefaultConfig` writes
`tier: 3` into the auto-generated `config.yaml` for bundle imports
that don't declare one.
**Canvas** — `CreateWorkspaceDialog.tsx` self-hosted form default
flipped from T1→T3. SaaS stays at T4 (each SaaS workspace runs on
its own sibling EC2, so the shared-blast-radius reasoning doesn't
apply and we can safely go a tier higher).
### Tests
Updated every sqlmock assertion that anchored on the old `tier=1`
default:
- `handlers_test.go::TestWorkspaceCreate` — default-path INSERT now
expects `3`.
- `handlers_additional_test.go::TestWorkspaceCreate_WithParentID` —
same.
- `workspace_test.go::TestWorkspaceCreate_DBInsertError` /
`TestWorkspaceCreate_WithSecrets_Persists` — same.
- `workspace_test.go::TestWorkspaceCreate_TemplateDefaults*` — same
(current handler semantics ignore the template's `tier:` field and
fall through to the default; kept tests faithful to the
implementation, left a comment flagging the latent inconsistency).
- `workspace_budget_test.go::TestWorkspaceBudget_Create_WithLimit` —
same.
- `template_import_test.go::TestGenerateDefaultConfig` — asserts
`tier: 3` now.
All `go test -race ./internal/handlers/` pass.
Canvas `CreateWorkspaceDialog` tests don't assert the default tier
(they only reference `tier` as prop data on stub workspaces) so no
test update needed on that side.
### SaaS parity
Zero behaviour change on hosted SaaS. The Go-side default only fires
when the Canvas (or any caller) omits `tier` from the request body.
The SaaS Canvas explicitly passes `tier: 4` from the
CreateWorkspaceDialog `isSaaS ? 4 : 3` branch, so the Go default
never runs on a SaaS request.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Canvas Config tab had 3 bugs visible on hermes workspaces (#1894):
1. Runtime dropdown showed "LangGraph (default)" even when the workspace's
actual runtime was hermes — because the form only loaded runtime from
config.yaml, and hermes doesn't use the platform's config.yaml template.
2. Model field was empty for the same reason.
3. "No config.yaml found" error appeared on hermes workspaces despite
everything being fine — hermes manages its own config at
~/.hermes/config.yaml on the workspace host.
Worse, clicking Save with the empty form would silently flip `runtime`
back from `hermes` to `LangGraph (default)`.
## Fix
- loadConfig now always fetches workspace metadata (runtime + model)
via GET /workspaces/:id and GET /workspaces/:id/model BEFORE attempting
the config.yaml fetch. These act as the source of truth for runtime
and model when config.yaml doesn't set them.
- RUNTIMES_WITH_OWN_CONFIG set lists runtimes that manage their own
config outside the platform template (hermes, external). For these:
- Missing config.yaml is NOT an error — no red banner shown.
- An informational gray banner tells the user where to edit the
runtime's config (e.g. "edit ~/.hermes/config.yaml via Terminal tab
or the hermes CLI" for hermes).
Closes#1894.
Verified 2026-04-23 on user's hongmingwang tenant which runs hermes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five additional breakages surfaced while testing the restored stack
end-to-end (spin up Hermes template → click node → open side panel →
configure secrets → send chat). Each fix is narrowly scoped and has
matching unit or e2e tests so they don't regress.
### 1. SSRF defence blocked loopback A2A on self-hosted Docker
handlers/ssrf.go was rejecting `http://127.0.0.1:<port>` workspace
URLs as loopback, so POST /workspaces/:id/a2a returned 502 on every
Canvas chat send in local-dev. The provisioner on self-hosted Docker
publishes each container's A2A port on 127.0.0.1:<ephemeral> — that's
the only reachable address for the platform-on-host path.
Added `devModeAllowsLoopback()` — allows loopback only when
MOLECULE_ENV ∈ {development, dev}. SaaS (MOLECULE_ENV=production)
continues to block loopback; every other blocked range (metadata
169.254/16, TEST-NET, CGNAT, link-local) stays blocked in dev mode.
Tests: 5 new tests in ssrf_test.go covering dev-mode loopback,
dev-mode short-alias ("dev"), production still blocks loopback,
dev-mode still blocks every other range, and a 9-case table test of
the predicate with case/whitespace/typo variants.
### 2. canvas/src/lib/api.ts: 401 → login redirect broke localhost
Every 401 called `redirectToLogin()` which navigates to
`/cp/auth/login`. That route exists only on SaaS (mounted by the
cp_proxy when CP_UPSTREAM_URL is set). On localhost it 404s — users
landed on a blank "404 page not found" instead of seeing the actual
error they should fix.
Gated the redirect on the SaaS-tenant slug check: on
<slug>.moleculesai.app, redirect unchanged; on any non-SaaS host
(localhost, LAN IP, reserved subdomains like app.moleculesai.app),
throw a real error so the calling component can render a retry
affordance.
Tests: 4 new vitest cases in a dedicated api-401.test.ts (needs
jsdom for window.location.hostname) — SaaS redirects, localhost
throws, LAN hostname throws, reserved apex throws.
### 3. SecretsSection rendered a hardcoded key list
config/secrets-section.tsx shipped a fixed COMMON_KEYS list
(Anthropic / OpenAI / Google / SERP / Model Override) regardless of
what the workspace's template actually needed. A Hermes workspace
declaring MINIMAX_API_KEY in required_env got five irrelevant slots
and nothing for the key it actually needed.
Made the slot list template-driven via a new `requiredEnv?: string[]`
prop passed down from ConfigTab. Added `KNOWN_LABELS` for well-known
names and `humanizeKeyName` to turn arbitrary SCREAMING_SNAKE_CASE
into a readable label (e.g. MINIMAX_API_KEY → "Minimax API Key").
Acronyms (API, URL, ID, SDK, MCP, LLM, AI) stay uppercase. Legacy
fallback preserved when required_env is empty.
Tests: 8 new vitest cases covering known-label lookup, humanise
fallback, acronym preservation, deduplication, and both fallback
paths.
### 4. Confusing placeholder in Required Env Vars field
The TagList in ConfigTab labelled "Required Env Vars (from template)"
is a DECLARATION field — stores variable names. The placeholder
"e.g. CLAUDE_CODE_OAUTH_TOKEN" suggested that, but users naturally
typed the value of their API key into the field instead. The actual
values go in the Secrets section further down the tab.
Relabelled to "Required Env Var Names (from template)", changed the
placeholder to "variable NAME (e.g. ANTHROPIC_API_KEY) — not the
value", and added a one-line helper below pointing to Secrets.
### 5. Agent chat replies rendered 2-3 times
Three delivery paths can fire for a single agent reply — HTTP
response to POST /a2a, A2A_RESPONSE WS event, and a
send_message_to_user WS push. Paths 2↔3 were already guarded by
`sendingFromAPIRef`; path 1 had no guard. Hermes emits both the
reply body AND a send_message_to_user with the same text, which
manifested as duplicate bubbles with identical timestamps.
Added `appendMessageDeduped(prev, msg, windowMs = 3000)` in
chat/types.ts — dedupes on (role, content) within a 3s window.
Threaded into all three setMessages call sites. The window is short
enough that legitimate repeat messages ("hi", "hi") from a real
user/agent a few seconds apart still render.
Tests: 8 new vitest cases covering empty history, different content,
duplicate within window, different roles, window elapsed, stale
match, malformed timestamps, and custom window.
### 6. New end-to-end regression test
tests/e2e/test_dev_mode.sh — 7 HTTP assertions that run against a
live platform with MOLECULE_ENV=development and catch regressions
on all the dev-mode escape hatches in a single pass: AdminAuth
(empty DB + after-token), WorkspaceAuth (/activity, /delegations),
AdminAuth on /approvals/pending, and the populated
/org/templates response. Shellcheck-clean.
### Test sweep
- `go test -race ./internal/handlers/ ./internal/middleware/
./internal/provisioner/` — all pass
- `npx vitest run` in canvas — 922/922 pass (up from 902)
- `shellcheck --severity=warning infra/scripts/setup.sh
tests/e2e/test_dev_mode.sh` — clean
- `bash tests/e2e/test_dev_mode.sh` — 7/7 pass against a live
platform + populated template registry
### SaaS parity
Every relaxation remains conditional on MOLECULE_ENV=development.
Production tenants run MOLECULE_ENV=production (enforced by the
secrets-encryption strict-init path) and always set ADMIN_TOKEN, so
none of these code paths fire on hosted SaaS. Behaviour on real
tenants is byte-for-byte unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the previous commit on this branch. Two additional fresh-clone
regressions surfaced during end-to-end verification, both affecting local
dev only and both landing inside the same SaaS-vs-local-dev seam:
### 1. Canvas 401-loops after first workspace creation
`GET /workspaces` is behind `AdminAuth` (router.go:121 — "C1: unauthenticated
workspace topology exposure"). The middleware has a Tier-1 fail-open branch
that only fires when *no* workspace tokens exist anywhere in the DB. The
moment a user creates their first workspace — via either the Canvas UI, the
API, or the e2e-api test suite — a token lands in the DB, Tier-1 closes, and
the Canvas (which has no bearer token in local dev: no WorkOS session, no
NEXT_PUBLIC_ADMIN_TOKEN baked in at build time) gets 401 on every list
call. The UI renders a stuck "API GET /workspaces: 401 admin auth required"
placeholder forever.
SaaS is unaffected because hosted provisioning always sets both
`ADMIN_TOKEN` and `MOLECULE_ENV=production`, and the Canvas there either
carries a WorkOS session cookie or `NEXT_PUBLIC_ADMIN_TOKEN` baked into
the JS bundle.
**Fix** (`workspace-server/internal/middleware/wsauth_middleware.go`): add
a narrow Tier-1b escape hatch that stays fail-open when *both*
`ADMIN_TOKEN` is unset *and* `MOLECULE_ENV` is explicitly a dev mode
("development" / "dev"). Production never hits it (SaaS sets
`MOLECULE_ENV=production`). Mirrors the existing convention in
`handlers/admin_test_token.go` which gates the e2e test-token endpoint on
`MOLECULE_ENV != "production"`.
Three new regression tests in `wsauth_middleware_test.go`:
- `TestAdminAuth_DevModeEscapeHatch_FailsOpenWithHasLiveTokens` — the
happy path (dev mode, no admin token, tokens exist → 200)
- `TestAdminAuth_DevModeEscapeHatch_IgnoredWhenAdminTokenSet` — explicit
`ADMIN_TOKEN` wins; dev mode does not silently re-open the gate
- `TestAdminAuth_DevModeEscapeHatch_IgnoredInProduction` — the
SaaS-safety guarantee (production + no admin token + tokens exist → 401)
`.env.example` flipped to set `MOLECULE_ENV=development` by default so
new users get the dev-mode hatch automatically via `cp .env.example .env`.
SaaS provisioning overrides to `production`, consistent with the existing
convention used by the secrets-encryption strict-init path.
### 2. SaaS cookie/privacy banner rendered on localhost
`CookieConsent` mounted unconditionally in the root layout, so
`npm run dev` on localhost showed a "Cookies & your privacy" banner
pointing at `moleculesai.app/legal/privacy`. That banner is a
GDPR/ePrivacy compliance UI that only applies to the hosted SaaS
offering; self-hosted / local-dev / Vercel-preview hosts must not
see it.
**Fix** (`canvas/src/components/CookieConsent.tsx`): gate render on
`isSaaSTenant()`. Matches the convention used by `AuthGate` and the
workspace tier picker elsewhere in the codebase.
Tests (`canvas/src/components/__tests__/CookieConsent.test.tsx`):
existing tests now stub `window.location.hostname` to a SaaS
subdomain before rendering (required since `isSaaSTenant()` on jsdom's
default "localhost" would suppress the banner). Added two new tests
for the local-dev hide path:
- `does NOT render on local dev (non-SaaS hostname)`
- `does NOT render on a LAN hostname (192.168.*, *.local)`
### Verification
On a fresh-nuked DB with the updated branch:
1. `bash infra/scripts/setup.sh` — clean
2. `go run ./cmd/server` — "Applied 41 migrations", :8080 healthy,
dev-mode hatch armed (`MOLECULE_ENV=development`)
3. `npm run dev` in canvas — :3000 renders, no cookie banner
4. `bash tests/e2e/test_api.sh` — **61 passed, 0 failed**
(test suite creates tokens; GET /workspaces stays 200 under the hatch)
5. Browser at http://localhost:3000 AFTER the e2e run:
- Canvas renders the workspace list (no 401 placeholder)
- No cookie banner
6. `npx vitest run` — **902 tests passed** (900 prior + 2 new hide tests)
7. `go test -race ./internal/middleware/` — all passing (3 new
dev-mode tests + existing Issue-180 / Issue-120 / Issue-684 suite),
coverage 81.8%
### SaaS parity audit
Same principle as the rest of this branch: local must work without
weakening SaaS.
- Dev-mode hatch: conditional on `MOLECULE_ENV=development`.
Production tenants always run `MOLECULE_ENV=production` (already
enforced by the secrets-encryption `InitStrict` path in
`internal/crypto/aes.go`). Branch is unreachable there.
- Cookie banner: gated on `isSaaSTenant()` which checks
`NEXT_PUBLIC_SAAS_HOST_SUFFIX` (default `.moleculesai.app`). SaaS
hosts still get the banner; every other host doesn't.
No change to SaaS behaviour. #1822 backend-parity tracker untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproducing the README's quickstart on a clean clone surfaced seven
independent bugs between `git clone` and seeing the Canvas in a browser.
Each fix is minimal and local-dev-only — the SaaS/EC2 provisioner path
(issue #1822) is untouched.
Bugs fixed:
1. `infra/scripts/setup.sh` applied migrations via raw psql, bypassing
the platform's `schema_migrations` tracker. The platform then re-ran
every migration on first boot and crashed on non-idempotent ALTER
TABLE statements (e.g. `036_org_api_tokens_org_id.up.sql`). Dropped
the migration block — `workspace-server/internal/db/postgres.go:53`
already tracks and skips applied files.
2. `.env.example` shipped `DATABASE_URL=postgres://USER:PASS@postgres:...`
with literal `USER:PASS` placeholders and the Docker-internal hostname
`postgres`. A `cp .env.example .env` followed by `go run ./cmd/server`
on the host failed with `dial tcp: lookup postgres: no such host`.
Replaced with working `dev:dev@localhost:5432` defaults that match
`docker-compose.infra.yml`.
3. `docker-compose.infra.yml` and `docker-compose.yml` set
`CLICKHOUSE_URL: clickhouse://...:9000/...`. Langfuse v2 rejects
anything other than `http://` or `https://`, so the container
crash-looped and returned HTTP 500. Switched to
`http://...:8123` (HTTP interface) and added `CLICKHOUSE_MIGRATION_URL`
for the migration-time native-protocol connection. Also removed
`LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` so migrations actually
run.
4. `canvas/package.json` dev script crashed with `EADDRINUSE :::8080`
when `.env` was sourced before `npm run dev` — Next.js reads `PORT`
from env and the platform owns 8080. Pinned `dev` to
`-p 3000` so sourced env can't hijack it. `start` left as-is because
production `node server.js` (Dockerfile CMD) must respect `PORT`
from the orchestrator.
5. README/CONTRIBUTING told users to clone `Molecule-AI/molecule-monorepo`
— that repo 404s; the actual name is `molecule-core`. The Railway
and Render deploy buttons had the same broken URL. Replaced in both
English and Chinese READMEs and in CONTRIBUTING. Internal identifiers
(Go module path, Docker network `molecule-monorepo-net`, Python helper
`molecule-monorepo-status`) deliberately left alone — renaming those
is an invasive refactor orthogonal to this fix.
6. README quickstart was missing `cp .env.example .env`. Users who went
straight from `git clone` to `./infra/scripts/setup.sh` got a script
that warned about an unset `ADMIN_TOKEN` (harmless) but then couldn't
run the platform without figuring out the env setup on their own.
Added the step in both READMEs and CONTRIBUTING. Deliberately NOT
generating `ADMIN_TOKEN`/`SECRETS_ENCRYPTION_KEY` here — the e2e-api
suite (`tests/e2e/test_api.sh`) assumes AdminAuth fallback mode
(no server-side `ADMIN_TOKEN`), which is how CI runs it.
7. CI shellcheck only covered `tests/e2e/*.sh` — `infra/scripts/setup.sh`
is in the critical path of every new-user onboarding but was never
linted. Extended the `shellcheck` job and the `changes` filter to
cover `infra/scripts/`. `scripts/` deliberately excluded until its
pre-existing SC3040/SC3043 warnings are cleaned up separately.
Verification (fresh nuke-and-rebuild following the updated README):
- `docker compose -f docker-compose.infra.yml down -v` + `rm .env`
- `cp .env.example .env` → defaults work as-is
- `bash infra/scripts/setup.sh` — clean, no migration errors, all 6
infra containers healthy
- `cd workspace-server && go run ./cmd/server` — "Applied 41 migrations
(0 already applied)", platform on :8080/health 200
- `cd canvas && npm install && npm run dev` — Canvas on :3000/ 200
even with `.env` sourced (PORT=8080 in env)
- `bash tests/e2e/test_api.sh` — **61 passed, 0 failed**
- `cd canvas && npx vitest run` — **900 tests passed**
- `cd canvas && npm run build` — production build clean
- `shellcheck --severity=warning infra/scripts/*.sh` — clean
- Langfuse `/api/public/health` 200 (was 500)
Scope notes:
- SaaS/EC2 parity (issue #1822): all files touched here are local-dev
surface. Canvas container uses `node server.js` with `ENV PORT=3000`
in `canvas/Dockerfile` — the `-p 3000` pin in `package.json` dev
script only affects `npm run dev`, not the production CMD.
- Test coverage (issue #1821): project policy is tiered coverage floors,
not a blanket 100% target. Files touched here are shell scripts,
YAML, Markdown, and one package.json script — not classes covered
by the coverage matrix.
- No overlap with open PRs — searched `setup.sh`, `quickstart`,
`langfuse`, `clickhouse`, `migration`, `README`; nothing conflicts.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
The pathname.startsWith() loop-break added to redirectToLogin needs
pathname on the mock Location object; tests were supplying only href.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenant subdomains (hongmingwang.moleculesai.app) proxy to EC2 platform
which has no /cp/auth/* routes. Auth UI lives on app.moleculesai.app.
Added getAuthOrigin() that detects SaaS tenant hosts and redirects to
the app subdomain for login/signup. Non-SaaS hosts (localhost, dev)
fall back to PLATFORM_URL as before.
[Molecule-Platform-Evolvement-Manager]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When session credentials expire mid-use, ALL API calls return 401.
Previously this threw a generic error that crashed the UI with no
recovery path. Now the API client intercepts 401 and redirects to
login once (via redirectToLogin which already guards against loops).
Combined with the AuthGate /cp/auth/* path guard, this gives the
correct behavior: credentials lost → redirect to login → user logs
in → return_to sends them back.
[Molecule-Platform-Evolvement-Manager]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AuthGate redirected anonymous users to /cp/auth/login?return_to=<url>,
but the login page itself triggered AuthGate, which redirected again
with double-encoded return_to. Each redirect added another encoding
layer until the URL exceeded 431 (Request Header Fields Too Large).
Two guards:
1. redirectToLogin() returns early if already on /cp/auth/* path
2. AuthGate skips redirect check entirely for /cp/auth/* paths
[Molecule-Platform-Evolvement-Manager]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #1781 introduced useCanvasStore.getState() call in ContextMenu.tsx
(line 169) but the existing Vitest mock for useCanvasStore in the keyboard
test file lacked a getState method, causing:
TypeError: useCanvasStore.getState is not a function
Fix: attach getState: () => mockStore to the mock using Object.assign
so the static method is available alongside the selector fn.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Brings the staging branch up to date with main's feature-fix stream so
every staging-targeted PR stops tripping on pre-existing rot. Before
this merge, staging had 30+ compile + test failures from fix PRs that
landed on main but never reached staging — primarily #1755's panic-
cascade + schema-drift alignments.
After this merge the handlers package goes from 30+ fails → 2 pre-
existing nil-docker test panics (TestCopyFilesToContainer_CWE22_
RejectsTraversal + TestDeleteViaEphemeral_F1085_RejectsTraversal),
both authored on staging and broken before this promotion. Tracked
separately; not a merge regression.
## Conflicts resolved
1. docs/marketing/campaigns/discord-adapter-announcement/announcement.md
— deleted on main (9d0d213: "move sensitive strategy + research to
internal repo"), modified on staging. Deletion wins: marketing
content moved out of the public monorepo per that commit's intent.
The content lives in the internal repo.
2. workspace-server/internal/handlers/container_files.go — staging's
rmTarget version kept. Main's version had `Cmd: []string{"rm",
"-rf", "/configs/" + filePath}` which concatenates raw filePath
AFTER the prefix-check on rmTarget, defeating the path-traversal
guard (a "../etc/passwd" input passes validation but the rm cmd
then traverses). Staging's `Cmd: []string{"rm", "-rf", rmTarget}`
uses the validated path. Keeping staging's more-secure variant.
## Includes build unblockers from #1769 / #1782
- terminal.go: malformed handleLocalConnect repaired
- terminal_test.go: missing braces in TestHandleConnect_RoutesToLocal
- workspace_crud.go: unused imports + duplicate strField block
- container_files_test.go: duplicate contains() removed (uses the one
in workspace_provision_test.go, same package)
## Verification
- go build ./... ✅ clean
- go vet ./... ✅ clean
- go test -race ./... — 18/20 packages green; 2 test panics in
internal/handlers are pre-existing on staging (documented above)
ContextMenu.tsx reads parent-workspace children via
useCanvasStore.getState().nodes.filter(...) — a direct .getState()
call, not the selector-calling form. The existing vi.mock exposed
only the selector form, so rendering crashed with
"TypeError: useCanvasStore.getState is not a function".
Restructure the vi.mock factory to return Object.assign(fn, {
getState: () => mockStore }) so both call shapes resolve. Factory body
builds the function locally because vi.mock hoists above outer-scope
variable declarations and can't reference `mockStore` via closure.
Verified: all 15 tests in the file pass after the change.
Unblocks the Canvas (Next.js) CI check on PR #1743 (staging→main sync).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small, non-overlapping fixes extracted from closed PR #1664:
1. canvas/src/components/ContextMenu.tsx — Replace the useMemo-over-nodes
pattern with a hashed-boolean selector (s.nodes.some(...)) so Zustand's
useSyncExternalStore snapshot comparison is stable. Resolves React
error #185 (infinite render loop). Moves the child-node list derivation
into the delete handler via getState() so the render path no longer
allocates a fresh array.
2. workspace-server/internal/handlers/a2a_proxy.go — Allow the
Docker-bridge hostname path (ws-<id>:8000) to skip the SSRF guard in
local-docker mode. Gated on !saasMode() so SaaS deployments keep the
full private-IP blocklist (a remote workspace registration can't claim
a ws-* hostname and reach a sensitive VPC IP).
3. workspace-server/Dockerfile — Add entrypoint.sh that discovers the
docker.sock GID at boot and adds the platform user to that group, then
exec's su-exec to drop privileges. Lets the platform container reach
the host docker socket without running as root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the hermes 401 "Invalid API key" on SaaS workspaces:
1. CreateWorkspaceDialog never sent `model` in the /workspaces POST
2. Tenant/CP plumbed through a valid (provider, API key) but empty MODEL
3. Workspace install.sh ran with HERMES_DEFAULT_MODEL unset
4. derive-provider.sh saw no slug → PROVIDER="auto"
5. Hermes fell back to its compiled-in default (Anthropic via
OpenAI-compat adapter)
6. User's MINIMAX_API_KEY was present but irrelevant — hermes tried
Anthropic with it → 401
Fix:
- Extend HERMES_PROVIDERS with `defaultModel` + `models` (suggestion
list). Each provider ships with a known-good default so the trap
is physically impossible to hit with the new form.
- Add a required Model input to the Hermes panel, auto-populated
from the provider's defaultModel when the provider changes (only
if the user hasn't typed their own slug yet).
- Datalist surfaces additional model suggestions per provider so
users can pick a different size (e.g. M2.7-highspeed) without
typing the whole slug.
- handleCreate validates hermesModel is non-empty, sends as `model`
in the POST body alongside the secrets block.
- useEffect guard avoids clobbering a user-typed custom slug when
they toggle providers back and forth.
Existing 19 a11y tests still pass (non-SaaS path unchanged, four-tier
picker still renders, arrow-key nav still wraps).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Following feedback that T4 — not T3 — is the full-access tier:
- Non-SaaS picker now shows all four tiers: T1 Sandboxed, T2 Standard,
T3 Privileged, T4 Full Access. Four-column grid.
- SaaS picker stays single-option but now locks to T4 (was T3). Every
SaaS workspace gets a dedicated EC2 VM, which is unambiguously the
"full host" case — T3 (privileged container) was a category mismatch.
- Default tier on SaaS is 4 (was 3). CP provisioner already supports
tier 4 (t3.large / 80 GB). TIER_CONFIG already has T4's amber color.
Tests updated for the four-tier picker: wrap tests now go T4 ↔ T1, and
the selection/tabIndex tests cover the fourth button.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Staging added hasChildren/children fields to workspace store shape.
Test assertion updated to use objectContaining to avoid false negatives.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On SaaS every workspace gets its own EC2 VM — the Docker-sandbox
distinction between T1 (sandboxed), T2 (standard Docker), and T3
(full host access) doesn't apply. A SaaS workspace is always a
dedicated VM, which is "full access" by construction. Showing T1/T2
in that UI is a category error: users pick a sandbox level that has
no effect on the actual EC2 machine they get.
Changes:
- tenant.ts: export isSaaSTenant() — returns true when canvas is
served at <slug>.moleculesai.app (SSR-safe: false on server)
- CreateWorkspaceDialog: when isSaaSTenant(), render only the T3
option, default tier=3, grid collapses to a single column. Label
gets a " — dedicated VM" hint so the user knows what they're
getting. On self-hosted the full T1/T2/T3 picker is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Add-Key form used to open with a required Service dropdown
(GitHub / Anthropic / OpenRouter / Other) that gated everything
else. The dropdown did no persistent work — the secret store only
cares about (key_name, value); the Service label was never saved
anywhere. It also suffered registry drift: today we support ~22
hermes-dispatched providers (MiniMax, Gemini, DeepSeek, Kimi, Qwen,
NVIDIA, etc.); only 3 had entries. Everyone else landed in "Other"
with no downside beyond the mandatory click.
Replaces it with:
1. Key-name <datalist> autocomplete sourced from new
KEY_NAME_SUGGESTIONS in lib/services.ts — 26 entries covering
common infra keys + every hermes-supported provider.
2. inferGroup(keyName) derives classification at render time,
matching what the store already does in getGrouped(). No
behaviour change for list grouping.
3. Provider docs link renders inline only when inferGroup
recognises the name. For 'custom' keys we stay quiet — no
false-structure prompt.
4. Test-connection button still available when the inferred group
supports it AND the value is format-valid. Same providers as
before.
SERVICES registry preserved for LIST rendering + test routing.
Result: two fields instead of three. One fewer decision. Provider-
agnostic by design — new providers work the moment someone types
their canonical env var name; no UI code change per provider.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1526 shipped the /templates registry + canvas dynamic Runtime /
Model / Required-Env fields on 2026-04-22 — but merged into the
staging branch, not main. The staging→main promotion PR #1496 has
been open unmerged for a while with 1172 commits divergence, so
prod (which builds from main) still carries the old hardcoded
dropdown.
Symptom seen on hongmingwang.moleculesai.app today:
- New Hermes Agent workspace (template declares runtime: hermes) loads
Config tab → Runtime dropdown shows "LangGraph (default)" because
there's no <option value="hermes"> in the hardcoded list; it falls
back to empty-value silently.
- Model field is a plain TextInput with static placeholder
"e.g. anthropic:claude-sonnet-4-6" — should be a combobox populated
from the selected runtime's models[].
- Required Env Vars is a TagList with static placeholder
"e.g. CLAUDE_CODE_OAUTH_TOKEN" — should auto-populate from the
selected model's required_env.
- Net effect: "Save & Deploy" sends empty model + empty env to the
provisioner → workspace instant-fails.
This PR cherry-picks the exact three files from PR #1526 (#359dc61
on staging) forward to main, without pulling the other 1171
commits:
- canvas/src/components/tabs/ConfigTab.tsx
- RuntimeOption interface + FALLBACK_RUNTIME_OPTIONS (hermes,
gemini-cli included)
- useEffect fetches /templates and populates runtimeOptions
dynamically
- dropdown renders from runtimeOptions (no hardcoded list)
- Model becomes a combobox with datalist of available models
per selected runtime
- Required Env Vars auto-populates from the selected model's
required_env on model change
- workspace-server/internal/handlers/templates.go
- /templates endpoint returns [{id, name, runtime, models}] with
per-template models registry (id, name, required_env)
- workspace-server/internal/handlers/templates_test.go
- Tests for runtime+models parsing and legacy top-level model
fallback
The canvas Runtime dropdown now resolves "hermes" correctly;
Model dropdown shows the models[] from the hermes template; Env
auto-populates with HERMES_API_KEY (or whichever model selected).
Verified locally:
- workspace-server builds clean
- Template handler tests pass: TestTemplatesList_RuntimeAndModelsRegistry,
TestTemplatesList_LegacyTopLevelModel, TestTemplatesList_NonexistentDir
Follow-up: the staging→main promotion gap (#1496) is the
underlying process issue. Either merge that PR or adopt a policy
of landing fixes directly on main (as several PRs have today).
Files here were chosen minimally to avoid pulling unrelated staging
changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(canary-release): flag as aspirational; link to current state
The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.
Added a top-of-doc ⚠️ state note that:
1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.
No behaviour change — doc-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID
- a2a_proxy.go: missing "fmt" import caused build failure (8 undefined
references at lines 743-775). Likely dropped during a recent merge.
- canvas/Dockerfile: GID 1000 already in use in node base image.
Changed to dynamic group/user creation with fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>