Independent code review surfaced three required fixes and one cheap
optional one. All addressed here.
dotenv parser:
- `export FOO=bar` was parsed as key `"export FOO"` (with embedded
space) and silently os.Setenv'd, so a developer pasting from a
direnv `.envrc` would get junk vars. Now strips the prefix.
- Quoted values weren't unwrapped: `FOO="hello world"` produced value
`"hello world"` with literal quotes. Now strips one matched pair of
surrounding `"` or `'`. Inside a quoted value `#` is part of the
value, not a comment marker (matches godotenv convention).
- UTF-8 BOM at file start (Windows editors) would have produced a
first key like U+FEFF + "FOO". Now stripped via TrimPrefix.
dotenv loader:
- findDotEnv()'s upward walk would happily pick up `~/.env` or a
sibling-repo `.env` if the binary was run from `~/Documents/other-
project/`. Real foot-gun on shared dev boxes. Now gated on a
monorepo sentinel: the candidate directory must contain
`workspace-server/go.mod`. Falls through to "no .env found" (=
pre-fix behavior) when the sentinel is absent.
socket fallback poll:
- startFallbackPoll() previously fired only on onclose, so the very
first connect attempt — when onclose hasn't fired yet because we
never had a successful onopen — left the canvas with no HTTP poll
for the duration of the failing handshake (Chrome can hold a
SYN-SENT WebSocket open ~75s before giving up). Now also called at
the top of connect(); the timer-already-running guard makes it a
no-op when one cycle later onclose calls it again.
Test coverage added: export prefix, single+double quoted values, hash
inside quotes preserved, unterminated quote falls back to bare value,
CRLF stripping locked in, BOM stripping, and a sentinel-rejection
regression test that creates a temp .env with no workspace-server
sibling and asserts findDotEnv refuses to load it.
Verified: 985 canvas tests + 30 dotenv subtests + 4 dotenv integration
tests all pass; tsc clean; rebuilt platform from monorepo root with
stripped env still loads .env (49 vars) and /workspaces returns 200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deleting a parent on a wedged WS used to leave the child cards on
the canvas as orphaned roots until the user manually refreshed.
Why: Canvas.tsx and DetailsTab.tsx both called `removeNode(parentId)`
after `DELETE /workspaces/:id?confirm=true` returned 200. `removeNode`
deliberately re-parents children rather than cascading — it relies on
the per-descendant WORKSPACE_REMOVED WS events the platform emits as
part of the cascade to drop each child individually. When the WS is
unhealthy those events never arrive, so the local store keeps the
children alive (now re-parented to root since their actual parent is
gone).
Fix: new `removeSubtree(rootId)` action on the canvas store mirrors
the server-side cascade — drops the root + every descendant + every
incident edge in one atomic set(). Both delete call sites now use it.
The WS events still arrive when WS is healthy and become idempotent
no-ops because the nodes are already gone.
Why a new action instead of changing removeNode: removeNode's
re-parenting behavior is correct for non-cascading flows (drag-out,
manual node detach in the future). Adding a sibling action keeps
both call shapes available rather than forcing every caller to opt
out of cascade.
6 new unit tests cover root cascade, mid-level cascade, leaf
no-op-cascade, selection clearing across the subtree, selection
preservation outside the subtree, and edge cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the case where the canvas thinks workspaces are
stuck provisioning when they're actually online:
1. ProvisioningTimeout banners now gate on wsStatus === "connected".
While the WS is in connecting/disconnected state, the local
"provisioning" status reflects the last event received before the
drop — workspaces may have transitioned to online minutes ago. The
8m timeout was firing against frozen state and showing a wall of
yellow warnings on already-online workspaces.
2. Socket layer now starts a 10s rehydrate poll when the WS goes
unhealthy (onclose) and stops it on onopen/disconnect. The
reconnect attempts continue in parallel; whichever recovers first
wins. rehydrate()'s existing dedup gate prevents the open-time
rehydrate from racing with a fallback poll. Without this the
store could stay frozen for minutes while WS exponential backoff
chewed through retries.
Plus the previously-uncommitted TemplatePalette flushSync change so
the import modal unmounts synchronously before doImport runs (otherwise
React batches the close with the import's setState prefix and the
modal backdrop hides the spawn animation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-themed bundle of fixes accumulated while polishing the canvas
chat / agent-comms / plugins / position flows. Each piece is small;
the connective tissue is "things observable from the canvas right
panel and the org-deploy flow that surprised real users".
UI / composer
- Legend: add close X + persisted-localStorage state + reopener
pill; default open for first-time users.
- SidePanel: rename "Skills" tab label → "Plugins" (single-line;
internal panelTab enum value, component name, and store keys
unchanged).
- SkillsTab: registry tri-state UI (loading / error / empty) with
actionable Retry button + 10s explicit fetch timeout. Handle
AbortSignal.timeout's DOMException by name (TimeoutError /
AbortError) — Chromium's "signal timed out" message wouldn't
match the prior naive /timeout/ regex. Reset mountedRef on every
mount: pre-existing StrictMode dev-mode bug where cleanup-only
`current = false` was never re-set, permanently wedging every
`if (mountedRef.current) setX(...)` guard and producing a
"Loading…" panel that never resolved on hard refresh.
- ChatTab: paste-image-from-clipboard via onPaste handler; unique
monotonic-counter filenames so same-second pastes don't collide
on name+size dedup. mime→ext map avoids `image/svg+xml`-style
raw extensions on synthesised filenames. Bypasses the
DataTransfer constructor so Safari < 14.1 / older Edge work.
- ChatTab: drop stuck error toast when the WS path already
delivered the agent reply but the HTTP path errored late
(sendingFromAPIRef gate now covers the .catch() handler).
- ChatTab: filter heartbeat-style internal self-messages from the
My Chat tab so historical rows with source_id=NULL don't
surface as user-typed input.
- Modal portals: OrgImportPreflightModal + MissingKeysModal
(ProviderPickerModal + AllKeysModal) now createPortal to
document.body and clamp max-h to 80vh. Escapes the ancestor
containing block (TemplatePalette's fixed+filtered sidebar
re-anchored descendants' position:fixed to itself, hiding
modals behind workspace cards). MissingKeysModal bumped to
z-[60] for stack ordering when both modals are open.
- OrgImportPreflightModal saveOne: ref-based microtask-safe
in-flight gate replaces the brittle "set startValue inside a
setState updater and read on the next line" pattern (React 18
doesn't guarantee functional updaters run synchronously; that
path strands `saving:true` and never calls createSecret). Same
useRef pattern guards SkillsTab.loadRegistry against concurrent
fires and Fast-Refresh-stranded promises; force=true parameter
on retry click bypasses the gate.
Agent comms
- AgentCommsPanel: derive UI-facing `flow` field instead of using
activity_type-derived direction. Self-logged a2a_receive rows
(source_id == workspace_id, what the agent runtime writes to log
its own outbound delegation replies) now correctly render as
OUTBOUND with → arrow + right-justified bubble. Previously they
rendered "← From Self" with Restart pointing at THIS workspace.
- AgentCommsPanel: error rows replace the unactionable
"X failed [A2A_ERROR]" body with banner + underlying-error
code-block + cause-hint (matched on Claude Code SDK init wedge,
deadline-exceeded, agent-thrown exception, empty-error) +
Restart [peer] / Open [peer] action buttons.
- AgentCommsPanel: render text bodies through ReactMarkdown +
remark-gfm so multi-part replies (tables, code) render properly.
Multi-part text extractor
- extractReplyText (live A2A response in ChatTab) and
extractResponseText (chat history loader in message-parser):
now COLLECT from every source — top-level parts, parts.root.text,
and artifacts — joined with "\n". Previous "first source wins"
silently dropped multi-part replies (Hermes summary+detail,
Claude Code long-form table). Tests cover joined-from-parts,
joined-from-artifacts, joined-from-both.
Position stability
- canvas-topology.buildNodesAndEdges: auto-rescue heuristic now
accepts currentParentSizes map; uses max(initial min, currently
grown) for the bbox check. Fixes "child jumps to weird location
after 30s" — the periodic socket health-check rehydrate
(silenceSec > 30) was rebuilding nodes from scratch, and the
rescue's reliance on grid-derived initial size false-flagged
children the user dragged into the user-grown area.
- canvas.hydrate: pass live measured dimensions from the existing
store into buildNodesAndEdges.
- socket.RehydrateDedup: pure exported helper class that gates
rehydrate calls. Two states — in-flight (in-flight Promise reused
by concurrent callers) + post-completion window (1.5s, returns
Promise.resolve()). Initialised with -Infinity so first call
always passes the gate. Wired into ReconnectingSocket.rehydrate.
A2A edges
- New A2AEdge custom React Flow edge component portals its label
out of the SVG layer via EdgeLabelRenderer so labels (a) render
above workspace cards instead of being hidden behind them and
(b) accept clicks. Click selects source + switches panel to
Activity, but only on a NEW selection (preserves current tab on
re-click of an already-selected source).
- buildA2AEdges output tagged type:"a2a"; edgeTypes wired in
Canvas.tsx.
Tests
- 14 new vitest cases across 4 files (964 → 978 passing):
OrgImportPreflightModal saveOne single-fire / double-click,
any-of rendering; AgentCommsPanel toCommMessage flow derivation
in all four shapes; canvas-topology rescue respects-grown /
rescues-genuine-drift / fallback-without-live-size; socket
RehydrateDedup gate behaviour; message-parser multi-part
response extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the org-import env preflight so a template can declare an
alternative: satisfy ANY one member to pass. Motivated by the
Claude-family node case where either ANTHROPIC_API_KEY or
CLAUDE_CODE_OAUTH_TOKEN unlocks the agent — forcing both was wrong.
Server (workspace-server):
- New EnvRequirement union type with custom YAML + JSON
(un)marshaling. Accepts scalar (strict) or {any_of: [...]} in
both on-disk org.yaml and inline POST /org/import bodies.
- collectOrgEnv now returns []EnvRequirement. Dedups groups by
sorted-member signature. "Strict wins" pruning drops any-of
groups that mention a name already declared strictly (same
tier and cross-tier).
- Import preflight uses EnvRequirement.IsSatisfied — scalar =
exact match, group = any member present.
- Empty any_of: [] rejected at parse time (never-satisfiable).
- 14 handler tests (6 updated for the union shape, 8 new
covering any-of satisfaction, dedup, strict-dominates-group,
cross-tier pruning, invalid-member filtering, YAML round-trip,
and empty-any-of rejection).
Canvas:
- EnvRequirement = string | {any_of: string[]} with envReqMembers,
envReqSatisfied, envReqKey helpers.
- OrgImportPreflightModal renders strict rows and any-of groups
via a new AnyOfEnvGroup sub-component: "Configure any one"
banner, per-member input, ✓-satisfied indicator, and dimmed
siblings once any member is configured so the user can still
switch providers.
- TemplatePalette.OrgTemplate.required_env / recommended_env
retyped to EnvRequirement[]; passthrough to the modal
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 5adc8a74 (part of this PR) intentionally made
molecule:fit-deploying-org fire for root-level workspaces too — it
used to only fire for children, which meant a standalone create
didn't center the viewport until the first child arrived ~2s later.
The existing regression test still expected ONLY the
molecule:pan-to-node event for a new root, so it started failing
with "expected length 1, got 2". The product behavior is correct
(centering on the root immediately is better UX); the test was
pinning the old single-dispatch shape.
Fix: assert BOTH events fire, each with the right detail payload,
so a future regression that drops either one (or duplicates) trips
the test. Single-test update, no production code change. 953/953
canvas tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on #2061. Three internally-cohesive sub-features; easiest to
read in order.
## 1. Org-level env preflight
Server
- `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and
`recommended_env: string[]` YAML fields.
- `GET /org/templates` walks the tree and returns the tree-union
(deduped, sorted) of both. `collectOrgEnv` dedup prefers required
when the same key is declared at both tiers.
- `POST /org/import` preflights against `global_secrets` WHERE
`octet_length(encrypted_value) > 0` (empty-value rows used to be
counted as "configured" and the per-container preflight still
failed at start time). 412 Precondition Failed + `missing_env`
list when required keys are absent. `force=true` bypasses with
an audit log line. DB lookup failure now returns 500 (was:
silent fall-through that defeated the guard). Env-var NAMES
validated against `^[A-Z][A-Z0-9_]{0,127}$` so a malicious
template can't ship pathological names into the UI or DB.
Canvas
- New `OrgImportPreflightModal`: red "Required" section (blocking)
and yellow "Recommended" section (non-blocking, import stays
enabled, shows live missing-count next to the Import button).
- Per-key password input → `PUT /settings/secrets` → strike-through
on save. Functional `setDrafts` throughout (no stale-closure
clobbers on rapid successive saves). `useEffect` seed keyed on a
sorted-join string signature so a parent re-render with a new
array identity doesn't clobber typed inputs.
- `TemplatePalette.handleImport` branches: zero env declarations →
straight to import; any declarations → fetch configured global
secret keys, open the modal.
Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels,
required-wins-over-recommended (including same-struct), dedup,
empty, invalid-name rejection.
## 2. EmptyState parity with TemplatePalette
The "Deploy your first agent" grid used to call `POST /workspaces`
with no preflight while the sidebar palette ran
`checkDeploySecrets` + `MissingKeysModal` first. Same template
deployed two different ways → first-run users saw containers boot
in `failed` state without guidance. Now both surfaces share one
preflight + modal handshake.
EmptyState's previous `interface Template` dropped `runtime`,
`models`, and `required_env` — silently discarding exactly the
fields the preflight needs. `Template` now lives in
`deploy-preflight.ts` and is imported from there by both surfaces.
## 3. useTemplateDeploy hook
With the preflight + modal wiring now duplicated across
EmptyState + TemplatePalette + (going forward) any third surface,
extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`:
const { deploy, deploying, error, modal } = useTemplateDeploy({
canvasCoords: ..., // optional, default random
onDeployed: (id) => ...,
});
Closes three drift surfaces that the duplication had created:
- `resolveRuntime` id→runtime fallback table (moved to
`deploy-preflight.ts`). EmptyState had a narrower fallback that
would have silently disagreed with the palette on any future id
needing a non-identity mapping.
- `checkDeploySecrets` call signature. One owner.
- `MissingKeysModal` JSX wiring. One owner.
Narrow try/catch around `checkDeploySecrets` so a preflight network
failure clears `deploying` and surfaces via `setError` instead of
stranding the button forever. `modal: ReactNode` (not a
`renderModal()` function) — the previous memoization bought
nothing since consumers called it inline every render. Named
`MissingKeysInfo` interface for the state shape.
## 4. Viewport auto-fit user-pan gate fix
During org deploy the canvas was meant to pan+zoom to follow each
arriving workspace (`molecule:fit-deploying-org` event → debounced
fitView). In practice the fit stayed stuck on wherever the first
fit landed.
Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event`
at the END of a programmatic `fitView` animation. The original
"respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`,
so our own fit completing looked like a user pan, and every
subsequent auto-fit short-circuited for the rest of the deploy.
Fix: stop trusting `onMoveEnd` for user-intent detection. Register
explicit `wheel` + `pointerdown` listeners on `document` with
capture phase and `target.closest('.react-flow__pane')` filter.
Capture-phase immunity to `stopPropagation`; pane-filter rejects
toolbar / modal / side-panel clicks (the old `window` fallback
caught those). `onMoveEnd` simplified to only drive the debounced
viewport save.
Also: fit event dispatched on root arrivals (not just children),
so the canvas centers on the just-landed root immediately instead
of waiting ~2s for the first child. Animation 600ms → 400ms so
successive per-arrival fits don't pile up visually. End-state fit
stays at 1200ms — intentional asymmetry ("settling" vs
"tracking"), documented in code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
auto-merged (mostly canvas/tabs/chat and workspace-server handlers
the earlier DIRTY marker was stale relative to current staging).
- Fix 7 test failures surfaced by the merge:
1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
object failed type check. Explicit return-type annotation.
2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
reads deletingIds.size (new multilevel-layout state). Both mock
stores lacked deletingIds; added new Set<string>() to each.
3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
format WorkspaceData (snake_case, with x/y/uptime_seconds). The
store's node.data is now WorkspaceNodeData (camelCase, no wire-
only fields). Rewrote makeWS to produce WorkspaceNodeData and
updated 5 call-site casts. No assertions changed.
4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
that the PR intentionally inverts:
a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
now contains only {"external"}, so the banner is no longer
shown for hermes. Inverted assertion: now pins ABSENCE of
the banner, with a comment noting the inversion.
b. "config.yaml runtime wins over DB" — priority reversed:
DB is now authoritative so the tier-on-node badge matches
the form. Inverted scenario: DB=hermes + yaml=crewai →
form shows hermes. Switched test's DB runtime off langgraph
because the dropdown collapses langgraph into an empty-
valued "default" option that would hide the win signal.
- No production code changed — this commit is staging merge + test
realignment only. 953/953 canvas tests pass. tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session's accumulated UX work across frontend and platform. Reviewable
in four logical sections — diff is large but internally cohesive
(each section fixes a gap the next one depends on).
## Chat attachments — user ↔ agent file round trip
- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
25 MB per file, UUID-prefixed storage under
/workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
per-message attachment chips with fetch+blob download (anchor
navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
now consumes attachments via platform helpers.
## Platform attachment helpers (workspace/executor_helpers.py)
Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
multi-modal list (text + image_url) for images. Respects
MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
paths, stages each into chat-uploads/ (download endpoint
whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
so non-root agents can write /workspace without each template
having to chmod in its Dockerfile.
Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.
## Model selection & related platform fixes
- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
silently lost the model choice. Stores into workspace_secrets
(MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
so Restart propagates the stored model to HERMES_DEFAULT_MODEL
without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
(stale) config.yaml — fixes "badge shows T3, form shows T2".
## ChatTab & WebSocket UX fixes
- Send button no longer locks after a dropped TASK_COMPLETE —
`sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
the previous default aborted fetches while the server was still
replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
the adaptor's purpose IS the form, banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
bare <runtime> for template path (hermes lives at the bare name).
## Org deploy/delete animation (theme-ready CSS)
- styles/theme-tokens.css — design tokens (durations, easings,
colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
(deploying children AND actively-deleting ids) — RF's
authoritative drag lock; useDragHandlers retains a belt-and-
braces check.
- Organ cancel button (red pulse pill on root during deploy)
cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
sibling arrivals coalesce into one fit (previous per-event
fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
timestamp only when event !== null (ignores programmatic
fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
useCanvasViewport (DRY'd 3 copies of split/filter/join).
## Server payload change
- org_import.go WORKSPACE_PROVISIONING broadcast now includes
parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
renders the child at the right parent-nested slot without doing
any absolute-position walk. createWorkspaceTree signature gains
relX, relY alongside absX, absY; both call sites updated.
## Tests
- workspace/tests/test_executor_helpers.py — 11 new cases
covering URI resolution (including traversal rejection),
attached-file extraction (both Part shapes), manifest-only
vs multi-modal content, large-image skip, outbound staging,
dedup, and ensure_workspace_writable (chmod 777 + non-root
tolerance).
- workspace-server chat_files_test.go — upload validation,
Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
plumbing check + round-trip across hermes/langgraph/claude-code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape and every getByLabelText("Bot Token") failed.
Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.
Fix: move form shape to the server.
- Add `ConfigField` struct + `ConfigSchema()` method to the
`ChannelAdapter` interface. Each adapter declares its own fields with
label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
- Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
- Slack: bot_token/channel_id/webhook_url/username/icon_emoji
- Discord: webhook_url + optional public_key
- Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
inline. Sorted deterministically by display name so UI ordering is
stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.
Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
schema-driven `SchemaField` component. Renders one input per field in
the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
`ConfigField.key`. Values reset on platform-switch so stale
Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
`SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
values from previous platform selections.
Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
survives round-trip through ListAdapters.
Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.
Changes:
- New file `canvas/src/lib/runtimeProfiles.ts` owns:
* `RuntimeProfile` type — structural shape, every field optional so
new runtimes can partially-fill without breaking consumers.
* `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
* `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
* `WorkspaceRuntimeOverrides` — interface for server-provided
per-workspace overrides, so operators can tune via template
manifest / workspace metadata without a canvas release.
* `getRuntimeProfile()` — resolver with
overrides → profile → default priority.
* `provisionTimeoutForRuntime()` — convenience wrapper.
- `ProvisioningTimeout.tsx` now delegates to the profile module.
`DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.
- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
* overrides > profile > default priority chain
* "every entry in RUNTIME_PROFILES resolves to a number" contract
* backward-compat export
Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.
User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.
Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
docker runtimes (claude-code, langgraph, crewai) where cold boot is
30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
Aligns with tests/e2e/test_staging_full_saas.sh's
PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
number → forces a single threshold for every workspace (tests use this
for deterministic behavior).
Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).
13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).
Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TeamMemberChip used MAX_NESTING_DEPTH to cap recursive sub-agent
rendering at depth 3, but the constant was never declared — causing
a TypeScript build error ('Cannot find name MAX_NESTING_DEPTH') that
blocked Canvas CI on PR #1989.
Add the constant above EmbeddedTeam with a doc comment explaining its
purpose (guards against circular parentId cycles + readability cap).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI failure: "Cannot find name 'useMemo'" at line 363.
useMemo was called but not imported — likely dropped during refactor.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches tests/e2e/test_staging_full_saas.sh's 20-min budget (#1930).
Canvas E2E was still stuck at 900s (15 min) which regularly flakes on
tenant cold boots in 12-15 min range — especially on staging where
workspace-server image pulls + AMI bootstrapping add 3-5 min vs prod.
Concrete blocker: 2026-04-24 staging→main sync (#1981) kept failing on
"tenant provision: timed out after 900s" in canvas/e2e/staging-setup.ts
despite the actual sync E2E going green. Canvas-side timeout was
strictly tighter than the sync-side timeout.
Also raises WORKSPACE_ONLINE_TIMEOUT_MS to 20 min to cover the case
where the workspace EC2 is provisioned but hermes cold-install (apt +
uv + hermes-agent clone + gateway boot) takes longer than the original
10-min budget — matches the 20-min workspace deadline in SaaS E2E.
No behavior change when things are fast. Just covers the tail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:
1) Org import was silently failing — INSERT wrote `collapsed` into the
`workspaces` table but that column lives on `canvas_layouts`
(005_canvas_layouts.sql). Every import returned 207 with 0 rows
created, which `api.post` treated as success → green "Imported"
toast + empty canvas. Moved the write to canvas_layouts; updated
the workspace_crud PATCH path to UPSERT there too; refreshed the
test mock. Added a client-side assertion that throws on
2xx-with-`error`-body so future partial-failures surface a red
toast rather than lying about success.
2) Multi-level nested layout was collision-prone: children that were
themselves parents (CTO → Dev Lead → 6 engineers) got the same
leaf-sized grid slot as leaf siblings and clipped into each other.
Added post-order `sizeOfSubtree` + sibling-size-aware
`childSlotInGrid` on both the Go server and the TS client (kept in
sync). `buildNodesAndEdges` now uses subtree sizes for both parent
dimensions and the rescue heuristic. `setCollapsed` on expand now
reads each child's actual rendered width/height instead of the
leaf-count formula — a regression test covers the CTO/Dev Lead
scenario.
3) Provisioning-timeout banner was unusable during large imports: a
30-workspace tree triggered 27 simultaneous "stuck" warnings 2
minutes in (server paces + provision concurrency = 3 guarantee tail
items legitimately wait longer). Scaled threshold with concurrent
count (base + 45s per queue slot beyond concurrency) and added a
Dismiss (×) button per banner.
4) Auto pan-and-zoom on org ready: after the last workspace flips out
of `provisioning`, canvas now fitView's with a 1.2s animation,
0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
caps fitView was hitting the component's maxZoom=2 on small trees
and zooming in instead of out.
5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
row on narrow viewports; status dot and workspace total were in
separate border-delimited cells. Merged into one segment with
`whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
icon-only 28px buttons with tooltip + aria-label (Figma/Linear
pattern). Stop All / Restart Pending keep text — they're urgent.
Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
that hit intentionally-slow endpoints (org import paces 2s between
siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
2-line role + the currentTask banner that appears during the
initial-prompt phase.
Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AuthGate now skips session fetch for /cp/auth/* paths, and
redirectToLogin guards against re-setting window.location when
already on an auth path. Both guards had no test coverage —
a future refactor could silently reintroduce the redirect loop.
Added:
- AuthGate.test.tsx: 2 cases covering /cp/auth/login and
/cp/auth/signup path skipping (no fetchSession call, no
redirectToLogin call, children rendered)
- auth.test.ts: 2 cases covering redirectToLogin early return
for /cp/auth/login and /cp/auth/signup paths
Fixes: Molecule-AI/molecule-core#1541
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
React Flow requires parent nodes to appear before their children in
the nodes array. When they don't, it logs "Parent node {id} not
found. Please make sure that parent nodes are in front of their
child nodes in the nodes array" and — more importantly — renders
the child at canvas-absolute coords instead of parent-relative,
flashing it far outside the parent.
topology's buildNodesAndEdges already enforced this at hydrate, but
nestNode + batchNest weren't re-sorting after mutating parentId.
A freshly-nested child often ended up after-first-drag at the
wrong screen position because its new parent sat later in the
array than itself.
Extract sortParentsBeforeChildren() into canvas-topology as a
reusable DFS visit; call it at the tail of both nestNode's set()
and batchNest's commit set(). 923 tests still green — no behaviour
change beyond eliminating the warning and the position flash.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Canceling the nest/extract dialog restored the child's position but
left the parent card at its auto-grown size. growParentsToFitChildren
fires on drag-stop to fit a then-outside child; when the drag is
subsequently cancelled, the parent keeps that grown width/height
forever because the grow pass is grow-only.
Strip width/height from the ex-parent alongside the child position
restore in cancelNest — React Flow re-measures from CSS, parent
collapses back to its natural size. Same trick nestNode already
uses for the un-nest path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-up polish items for drag-and-nest:
1. Cancelling the "Extract from team?" dialog now snaps the
dragged card back to where the drag started. Before, a user
who dragged a child out, saw the confirm dialog, then clicked
Cancel ended up with the card stranded outside the parent at
its drop-point position — which also got persisted via
savePosition on drag-stop. Now onNodeDragStart captures the
pre-drag position + parent, and cancelNest restores both the
RF node position and fires savePosition with the absolute
pre-drag coords so reload matches.
2. Un-nesting now clears the ex-parent's explicit width/height
in the nodes array. growParentsToFitChildren is grow-only so
it could never shrink the parent back down after a child
left; the card stayed at its auto-grown size with empty
space. Stripping width/height lets React Flow re-measure from
the card's own min-width / min-height CSS, so the parent
visually shrinks to fit whatever children remain.
923 canvas tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Un-nest used to require holding Alt (or Cmd to force-detach). That
was too conservative — when a user dragged a child clearly outside
its parent's bbox, nothing happened on release, because the default
branch soft-clamped back and only the Alt branch actually opened
the "Extract?" confirm. Matches the exact bug the user just flagged
("I can put agents in other agent, but when I drag it out, it does
not move out").
New rules:
* Past the 20 % hysteresis → confirm un-nest. Plain drag, no
modifier. This is what most users expect (Miro / Figma behave
the same way — drag outside the frame and the shape leaves it).
* Inside or within 20 % of the edge → soft-clamp back inside.
Guards against twitchy releases that momentarily overshoot the
edge by a few pixels.
* Cmd / Ctrl → force un-nest regardless of overlap. Escape-hatch
for when the user dragged within the hysteresis zone but really
wants out.
* Dropping onto a different parent → nest there (unchanged).
Alt is no longer a required modifier for un-nesting. Keeps it as
a non-gesture modifier only; no meaning unless we re-bind it later.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 600-req/min/IP bucket is sized for SaaS where each tenant has
a distinct client IP. On a local Docker setup every panel shares
one IP — hydration (/workspaces + /templates + /org/templates +
/approvals/pending) plus polling (A2A overlay + activity tabs +
approvals + schedule + channels + audit trail) can burst past the
bucket inside a minute, blanking the canvas with 429s. The user
reported it after dragging workspaces — dragging itself is
release-only (savePosition in onNodeDragStop), but the polling
that's always running added onto startup tripped the limit.
Two-layer fix:
Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen
is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching
the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and
discovery. SaaS production keeps the bucket.
Client: api.ts auto-retries a single 429 on idempotent GET requests,
waiting the server-provided Retry-After (capped at 20s). Mutations
(POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying.
Users on SaaS hitting a legitimate rate-limit spike get one
transparent recovery instead of an immediately-blank Canvas.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Importing a 15-workspace org template dropped every child as a
freely-positioned card into its parent's coordinate space. Parents
with 5-10 kids had the kids spill below the parent's initial min
size, producing the "ugly default" layout the user just flagged —
a mess of overlapping cards the moment the import completed.
Fix: every workspace in an org-template import that HAS children
is inserted with `collapsed = true`. Leaf workspaces stay
expanded (nothing to hide). The canvas renders a collapsed
parent as a compact header-only card with its "N sub" badge —
visually identical to the pre-refactor default the user asked for.
Double-click on a collapsed parent now EXPANDS it (flipping
`collapsed` locally + persisting via PATCH) so the user can drill
in to see the subtree. Only once expanded does a second
double-click zoom-to-team, matching the prior behaviour.
Leaf-first creation order stays the same; the collapsed flag
just means "render compact" not "hide from API".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five review findings from the 3f11df03 six-bug commit:
1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,
ClosedInProduction} covering all three gating states for the
security-sensitive dev-mode hatch the prior commit added to
/registry/:id/peers. Previously shipped untested — a future
refactor could have silently inverted polarity or removed the
gate. New tests pin the contract:
* MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless
* MOLECULE_ENV=development + ADMIN_TOKEN set → require token
* MOLECULE_ENV=production → require token
2. ConfigTab handleSave diffs against the RAW parsed YAML / form
config instead of the DEFAULT_CONFIG-merged shape. The previous
code would silently PATCH tier=1 to the DB when a user deleted
the `tier:` line in raw mode (the default-merge substituted 1).
Now: only fields the user actually typed participate in the
diff. Type guards (typeof === "number" / "string") prevent
coercion surprises on malformed YAML.
3. ConfigTab model-save failure no longer lies "Saved". The
/workspaces/:id/model PATCH can reject when the runtime doesn't
support the chosen model; previously we caught + console.warn'd
+ showed green Saved, and the user watched the model revert on
next reload with no explanation. Now the save path collects a
`modelSaveError` and surfaces it via setError with a partial-
success message ("Other fields saved, but model update failed:
…") so the user sees why.
4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch
failures, distinguishing them in the error text ("Failed to
load connected channels and platforms — try refreshing").
Previously only an adapters failure was visible; a channels
failure left the user with an apparently-empty list and no
indication the API was unreachable.
5. ChatTab panels drop the redundant aria-hidden attribute. The
`hidden`/`flex` Tailwind class already sets display:none, which
removes the node from the accessibility tree on its own; the
extra aria-hidden invited WAI-ARIA lint warnings if a focusable
descendant ever landed inside an inactive panel.
Tests: 923 canvas + full Go handler suite pass. 3 new Go tests.
No behaviour change on the five prior fixes — this commit tightens
their edges per the independent review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six bugs reported from a live session — all shippable in one commit:
1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint
demands a workspace-scoped bearer token (validateDiscoveryCaller)
which the canvas session doesn't hold. Added the same Tier-1b
dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already
use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so
SaaS production stays strict. Exported IsDevModeFailOpen from the
middleware package for the handler layer to reuse.
2. Org Templates list unscrollable. OrgTemplatesSection was rendered
in the TemplatePalette footer — a div without overflow — so when
it expanded to 15+ entries the list extended past the viewport
with no scroll. Moved it to the top of the flex-1 overflow-y-auto
container. Tall lists now scroll naturally.
3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead
of switching. HTML `hidden` attribute was being overridden by
Tailwind's `flex` class (display: flex beats the attribute),
so both tabpanels rendered concurrently. Swapped to a conditional
Tailwind `hidden`/`flex` class so the inactive panel is
display:none with proper CSS specificity.
4. Hermes Config form never persists. handleSave wrote config.yaml
but name / tier / runtime / model all live on the workspace row
(or the dedicated /workspaces/:id/model endpoint) — the form
edited in-memory, the request returned 200, the next reload
wiped everything back. Hermes + external runtimes manage their
own config inside the container anyway, so writing config.yaml
is a no-op for them; skip it. Always diff and PATCH the DB-backed
fields that actually changed.
5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's
load() used Promise.all with a silent catch — if EITHER the
channels or adapters fetch failed, both setters were skipped
with no error visible. Switched to Promise.allSettled so each
endpoint settles independently, and the adapters failure now
surfaces via the top-level error state.
6. Plugin registry always "No plugins in registry". Same silent
catch pattern in SkillsTab.tsx — load errors for /plugins,
/plugins/sources, and /workspaces/:id/plugins swallowed without
logging. Replaced the empty catches with console.warn so future
failures are at least visible in devtools.
Tests: 923 passing (unchanged). Go handler tests pass. Server
rebuilt and running with the peers-auth + collapsed-persistence
fixes (pid 15875).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AllKeysModal already handles focus via autoFocus={index === 0} on the
first input and a separate title-focus effect. The orphaned useEffect
referencing firstInputRef (declared only in ProviderPickerModal) caused
a TypeScript build error: "Cannot find name 'firstInputRef'".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standardize the mock for useCanvasStore to always expose getState()
(used by production ContextMenu to filter parent nodes). Applies the
same Object.assign-wrapping pattern introduced in #1744 to:
- ClaudeSettings.test.tsx
- tabs.a11y.test.tsx
- ContextMenu.keyboard.test.tsx (mockStore shape alignment)
Cherry-pick from #1744 left the backdrop div without aria-hidden="true"
(the outer dialog div got it instead). Re-apply aria-hidden="true" to
the backdrop div so screen readers skip the clickable overlay layer.
Also revert test assertion from bg-black → bg-black/70 to match the
exact class applied to the backdrop div.
Three follow-up review findings from the c2b2e13a review:
1. Rescue heuristic uses pure bbox-non-overlap. The previous
`position.x < 0` branch rescued any child whose parent was
later dragged past it, even when the layout was clearly
recoverable (e.g. relative -40, child still overlaps parent).
New rule: rescue iff the child's bbox has zero overlap with
the parent's bbox — self-calibrating, scales with user-resized
parents, catches screenshot-case and legacy huge-positive data.
2. Toast caps failed-name list at 3 and appends "and N more".
Stops a 50-node partial failure from overflowing the toast
container.
3. Cycle guard on selection-roots walk in batchNest. Corrupt
parentId data can't send the loop infinite now. Cheap
defensive guard — one Set per selected node.
Tests added (923 total, up from 918):
* canvas-topology.test: 4 rescue scenarios — screenshot case
(zero-overlap rescue), negative drift kept, huge-positive
rescued, user-resized layout kept.
* canvas.test: selection-roots filter on a 3-level chain.
* workspace_crud test: PATCH {collapsed:true} runs the UPDATE.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five issues surfaced in the review of 50b53784. Each was either a real
bug waiting to hit users or a silent failure mode.
1. Topology rescue no longer teleports user-resized children.
Rescue was comparing against parentMinSize(childCount), so any
child the user had placed in space the parent was resized into
got snapped to the default grid on reload — undoing the layout.
Now rescue fires only on obviously corrupt data: negative
relative coords (legacy pre-nesting absolute positions that
landed above/left of their assigned parent) or values past an
MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial
minimum are left alone.
2. batchNest now filters to selection-roots before planning.
Previously selecting both A and A's descendant B and dragging
into T yanked B out of A to become a sibling under T. Users
reasonably expect the A subtree to move intact. The new pass
drops any selected node whose ancestor is also selected —
those follow their ancestor via React Flow's parent binding.
3. batchNest surfaces partial failure via showToast. Previously
silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2
snapped back with no explanation. Now names the failed cards.
4. confirmNest closes the nest dialog BEFORE dispatching the async
store action, so a second drag can't kick off a competing batch
while the first is still in flight.
5. collapsed is now persisted. The Go workspace_crud.go Update
handler ignored the `collapsed` field, so user-initiated
collapse round-tripped to an expanded state on next hydrate.
Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`)
so the state survives reload.
Nits cleaned:
* Removed dead dragStartParentRef in useDragHandlers.
* Swapped redundant `node.data as WorkspaceNodeData` casts for a
named WorkspaceNode type alias.
* Canvas.tsx SR-live region now reads n.parentId (matches
MiniMap + RF's native field) instead of the mirror n.data.parentId.
Tests added (918 total, up from 915):
* batchNest happy path — 2-root selection fires 2 combined PATCHes
carrying parent_id + x + y, not 2×N sequential round-trips.
* batchNest ancestor+descendant selection — subtree stays intact.
* batchNest partial failure rollback — only the rejected nodes
revert; successful ones stay committed.
Backend change is single-line (collapsed PATCH branch); all
workspace_crud Go tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concerns in one commit (separate files, each self-contained):
## Canvas.tsx split (from ~680 to ~250 lines)
Canvas.tsx was holding drag gesture state + keyboard shortcuts +
viewport wiring + JSX. Each concern now lives in its own unit under
canvas/src/components/canvas/:
- dragUtils.ts — pure: shouldDetach, clampChildIntoParent,
DETACH_FRACTION
- DropTargetBadge.tsx — the floating "Drop into: <name>" label + the
dashed ghost preview at the target slot
- useDragHandlers.ts — encapsulates onNodeDragStart / Drag / Stop,
findDropTarget hit-test, pendingNest state,
and confirmNest/cancelNest. Routes multi-
select drags through batchNest automatically.
- useKeyboardShortcuts — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one
window listener, one source of truth.
- useCanvasViewport — pan-to-node + zoom-to-team CustomEvent
listeners and the debounced viewport save.
Canvas.tsx becomes a thin composition + JSX file. No behavioural
change; the refactor is covered by the existing 915 canvas tests.
## batchNest parallelization (2N round-trips → N, all in flight)
Previously nestNode fired two sequential PATCHes (parent_id then x/y)
and batchNest looped nestNode sequentially. For a 5-node selection on
a typical ~200ms link this was ~2s of serialized RPCs.
- nestNode now combines parent_id + x + y into ONE PATCH. The Go
handler (workspace_crud.go Update) already reads all three from the
same body — no backend change.
- batchNest rewritten: compute every re-parent plan against one
snapshot, commit a single set(), then fire N PATCHes via
Promise.allSettled in parallel. Per-node failures roll back only
that node (others stay committed) — same semantics as the single-
node path, just concurrent.
- The state math in the batch path also correctly shifts descendant
zIndex by depthDelta when any re-parented node has a subtree.
## Also
- canvas-topology.ts: reverted P3.12's opt-in rescue to the auto-
rescue default. When a child's stored relative position would render
it outside the parent bbox (the visual regression the user saw after
collapse → reload — Hermes child drawn outside Claude Code Agent on
first paint), the child is placed in the next default grid slot.
The "Arrange Children" context command stays for bigger teams.
All 915 canvas tests pass. No backend changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>