Commit Graph

54 Commits

Author SHA1 Message Date
dc0c3e7a27 test(canvas): add pure-function tests for resolveRuntime and canvas-topology utilities
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
audit-force-merge / audit (pull_request) Successful in 5s
- preflight-resolveRuntime.test.ts: resolveRuntime from deploy-preflight.ts
  covering explicit runtime-map entries, identity fallback, -default suffix
  stripping, edge cases (empty string, multiple suffixes).
- canvas-topology-pure.test.ts: sortParentsBeforeChildren (topological
  sort, orphan handling, no-op, non-mutating), defaultChildSlot (2-col
  grid), childSlotInGrid (variable-size siblings, uniform-grid fallback),
  parentMinSize (0–5 children, grid dimensions), parentMinSizeFromChildren
  (variable sizes, empty array, width/height correctness).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 04:46:28 +00:00
3e2ff63f7f feat(canvas): keyboard-accessible node drag via Arrow keys
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Closes canvas audit item: MEDIUM keyboard-accessible node drag.

- Arrow keys move the selected node by 10px per press; Shift+Arrow
  moves by 50px. Position is persisted to the backend via savePosition.
- The modal-dialog guard (same pattern as ? shortcut) prevents Arrow
  keys from moving nodes when a modal like KeyboardShortcutsDialog is
  open — dialogs own their own arrow semantics.
- All shortcuts guarded by the inInput check so Arrow keys still work
  for text navigation inside inputs/textareas.

Changes:
- canvas.ts: new moveNode(dx, dy) store action — updates position
  directly without the grow-parents pass that onNodesChange runs on
  every drag tick (avoids edge-chase flicker).
- useKeyboardShortcuts.ts: Arrow key handler added.
- canvas.test.ts: new moveNode unit tests (position update, no-op,
  savePosition call).
- useKeyboardShortcuts.test.tsx: new integration tests for all
  keyboard shortcuts including the new Arrow key handlers.
- canvas-audit-items.md: Keyboard Shortcuts section upgraded to ,
  drag item marked done.
- canvas-events.test.ts: fix pre-existing double-}); syntax error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 22:19:01 +00:00
1224f19cfc feat(canvas): screen reader live announcements for canvas state changes
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
sop-tier-check / tier-check (pull_request) Failing after 4s
Issue: HIGH priority item from canvas accessibility audit (2026-05-09).
Screen reader users had no way to know when workspace status changed
— the canvas updated visually but no announcement was made.

Changes:
- canvas.ts: add `liveAnnouncement: string` + `setLiveAnnouncement` to
  CanvasState so the store can hold the current announcement text.
- canvas-events.ts: set `liveAnnouncement` in handleCanvasEvent for 6
  key status transitions: ONLINE, OFFLINE, PAUSED, DEGRADED, PROVISIONING,
  REMOVED, PROVISION_FAILED. Names are looked up from store nodes so
  announcements are human-readable ("Alpha is now online" not "ws-1").
  TASK_UPDATED and AGENT_MESSAGE are intentionally excluded — they fire
  on every heartbeat and would overwhelm the user.
- Canvas.tsx: subscribe to `liveAnnouncement` from the store; render a
  visually-hidden `aria-live="polite" aria-atomic="true"` region that
  speaks the announcement then clears it after 500 ms so the same
  message doesn't re-announce on re-render. Fallback still announces
  workspace count on initial load.
- canvas-events.test.ts: 12 new test cases covering announcement
  content for all 6 event types, empty/no-announcement cases, and
  payload-name fallback when a node isn't yet in the store.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 21:30:33 +00:00
Hongming Wang
5c80b9c3d6 feat(canvas): render misconfigured workspaces with the configuration_status from agent_card
Closes molecule-controlplane#467 (issue filed against CP, but resolution
landed canvas-side because the workspace-server ALREADY returns the
agent_card JSONB blob with configuration_status / configuration_error
fields populated by molecule-core PR #2756). No CP-side change needed —
the gap was the canvas's blindness to those fields.

Before this PR, a workspace whose adapter.setup() failed (typically
missing/rotated LLM credential) appeared identical to a healthy one in
the canvas tile: green "Online" status, no error indication. The
operator had to dig into workspace logs to discover the env var to set.

This PR surfaces the state via the existing status-pill UX:

1. STATUS_CONFIG gains a "not_configured" entry — amber dot/glow,
   "Not configured" label. Distinct from "online" (emerald) and
   "failed" (red) — the workspace is reachable, it just needs config.

2. canvas-topology exposes getConfigurationStatus / getConfigurationError
   helpers — strict equality on the JSONB field so unknown values
   pass through as null instead of crashing the tile renderer.

3. WorkspaceNode derives an `effectiveStatus` that overrides
   data.status with "not_configured" when (status === "online" AND
   agent_card.configuration_status === "not_configured"). The override
   only applies on top of "online" — a genuinely offline / failed /
   provisioning workspace keeps its existing treatment.

4. The configuration_error string surfaces in two places: the tile's
   aria-label (screen reader access) + a truncated preview row at the
   bottom of the tile (same visual as the existing "degraded error
   preview" — mirrors the established pattern for in-tile error
   surfacing).

Test coverage: 11 new in canvas-topology-configuration-status.test.ts.
Each helper covered for the happy path, missing fields, defensive
ignores of unknown values, and an end-to-end "stale ready overrides
old error" guard.

Once this lands + canvas redeploys, operators see "Not configured:
Neither OPENAI_API_KEY nor MINIMAX_API_KEY is set" right on the
workspace tile instead of a confused-looking green "online" workspace
that silently 503s every JSON-RPC request.

Pairs with: molecule-core PR #2756 (decouple agent-card from setup),
            #2775 (boot_routes pin), #2778 (secret_redactor)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:14:40 -07:00
Hongming Wang
4028b81e04 refactor(canvas): route panel WS subscriptions through global socket
Both AgentCommsPanel and ChatTab's activity-feed opened raw
`new WebSocket(WS_URL)` instances per mount, with no onclose handler
and no reconnect logic. When the underlying connection dropped — idle
timeout, browser background-tab throttle, network jitter — the per-
panel sockets stayed dead until the panel re-mounted (refresh or
sub-tab unmount/remount). Live agent-comms bubbles and live activity
feed lines silently went missing in the gap, manifesting as "the
delegation didn't show up until I refreshed."

The global ReconnectingSocket in store/socket.ts already owns
reconnect, exponential backoff, health-check, and HTTP fallback poll.
Routing component subscribers through it gives every consumer those
guarantees for free, with one TCP connection per tab instead of N.

Three new pieces:

  - store/socket-events.ts: tiny pub/sub bus. emitSocketEvent fan-outs
    every decoded WSMessage to the listener Set; subscribeSocketEvents
    returns an unsubscribe. A throwing listener is logged and isolated
    so it can't break siblings.

  - store/socket.ts: ws.onmessage now calls emitSocketEvent(msg) right
    after applyEvent(msg), so the store's derived state and component
    subscribers stay in lockstep on every event arrival.

  - hooks/useSocketEvent.ts: React hook that registers exactly once
    per mount, capturing the latest handler in a ref so the closure
    sees current state/props without re-subscribing on every render.

Refactored sites:

  - AgentCommsPanel: replaced its WebSocket-in-useEffect block with
    useSocketEvent. Same parsing logic; the panel no longer opens its
    own connection.

  - ChatTab activity feed: split the previous useEffect in two — one
    seeds the activity log when `sending` flips, the other subscribes
    unconditionally and gates work on `sending` inside the handler.
    Hooks can't be conditional, so the gate has to live in the body
    rather than around the effect.

The ws-close graceful-close helper is no longer needed in either
site; the global socket owns its own teardown.

Tests: 6 new tests for the bus contract (single delivery, fan-out
order, unsubscribe, throwing-listener isolation, no-subscriber emit,
duplicate-subscribe Set semantics). All 27 existing socket tests
still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:12:47 -07:00
Hongming Wang
679e30538a test(canvas): cover store/classNames helpers (17% → 100%) (#1815)
[Molecule-Platform-Evolvement-Manager]

Continues the #1815 coverage rollup. classNames.ts was at 17%
in the baseline; this PR brings it to full coverage.

16 cases across 3 helpers:

**appendClass (6):**
- undefined / empty existing → just `cls`
- single-class → "a b" join
- DEDUP: existing already contains `cls` → existing unchanged.
  This is the load-bearing reason classNames.ts exists. Pre-helper
  the call sites inlined `${existing} ${cls}` with no dedup, so a
  tick that fired the same class twice produced "a a" and React
  Flow's className-equality diff saw it as a change every render.
- whitespace normalization (multi-space, leading/trailing)

**removeClass (7):**
- undefined / empty existing → ""
- removes named class
- exact match only ("spawn" must NOT match "spawn-fast")
- removing the only class → ""
- no-op when class absent
- whitespace normalization

**scheduleNodeClassRemoval (3):**
- after delayMs: calls set() with className-removed on target node;
  OTHER nodes untouched (the per-id pruning is the contract — pin
  it so a future refactor that maps over all nodes doesn't silently
  strip classes from siblings)
- does NOT fire before the delay elapses (vi.useFakeTimers + advance)
- SSR safety: when window is undefined, function is a no-op
  (neither get nor set fires)

## Note on test environment

Added `// @vitest-environment jsdom` directive — the file's
default `node` environment leaves `window` undefined, which would
make the SSR-guard happy-path test pass for the wrong reason
(every test would short-circuit). With jsdom, the SSR test
explicitly stubs `window` to undefined to exercise the guard.

## Test plan

- [x] All 16 cases pass locally (~1.1s with jsdom env spin-up)
- [x] No SUT changes
- [ ] CI green

## #1815 progress

- [x] Step 1+2: instrumentation (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (#2149)
- [x] store/classNames.ts (this PR)
- [ ] store/canvas.ts (73% — biggest absolute gap; bigger surface,
      separate cycle)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:50:00 -07:00
Hongming Wang
62cfc21033 test(comms): comprehensive E2E coverage for agent → user attachments
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:

1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
   no workspace container needed. 14 assertions covering: notify text-only
   round-trip, notify-with-attachments persists parts[].kind=file in the
   shape extractFilesFromTask reads, per-element validation rejects empty
   uri/name (regression for the missing gin `dive` bug), and a real
   /chat/uploads → /notify URI round-trip when a container is up.

2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
   WebSocket-side filtering that drops malformed attachments, allows
   attachments-only bubbles, ignores non-array payloads, and no-ops on
   pure-empty events.

3. Persisted response_body shape test (message-parser.test.ts +1) — pins
   the {result, parts} contract the chat history loader hydrates on
   reload, so refreshing after an agent attachment restores both caption
   and download chips.

Also wires the new shell E2E into e2e-api.yml so the contract regresses
in CI rather than only in manual runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:41:56 -07:00
Hongming Wang
6eaacf175b fix(notify): review-flagged Critical + Required findings on PR #2130
Two Critical bugs caught in code review of the agent→user attachments PR:

1. **Empty-URI attachments slipped past validation.** Gin's
   go-playground/validator does NOT iterate slice elements without
   `dive` — verified zero `dive` usage anywhere in workspace-server —
   so the inner `binding:"required"` tags on NotifyAttachment.URI/Name
   were never enforced. `attachments: [{"uri":"","name":""}]` would
   pass validation, broadcast empty-URI chips that render blank in
   canvas, AND persist them in activity_logs for every page reload to
   re-render. Added explicit per-element validation in Notify (returns
   400 with `attachment[i]: uri and name are required`) plus
   defence-in-depth in the canvas filter (rejects empty strings, not
   just non-strings).
   3-case regression test pins the rejection.

2. **Hardcoded application/octet-stream stripped real mime types.**
   `_upload_chat_files` always passed octet-stream as the multipart
   Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")`
   FIRST and only falls back to extension-sniffing when the header is
   empty, so every agent-attached file lost its real type forever —
   broke the canvas's MIME-based icon/preview logic. Now sniff via
   `mimetypes.guess_type(path)` and only fall back to octet-stream
   when sniffing returns None.

Plus three Required nits:

- `sqlmockArgMatcher` was misleading — the closure always returned
  true after capture, identical to `sqlmock.AnyArg()` semantics, but
  named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)`
  so the intent (capture for post-call inspection, not validate via
  driver-callback) is unambiguous.
- Test asserted notify call by `await_args_list[1]` index — fragile
  to any future _upload_chat_files refactor that adds a pre-flight
  POST. Now filter call list by URL suffix `/notify` and assert
  exactly one match.
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases)
  covering empty-uri, empty-name, both-empty so the Critical fix
  stays defended.

Deferred to follow-up:

- ORDER BY tiebreaker for same-millisecond notifies — pre-existing
  risk, not regression.
- Streaming multipart upload — bounded by the platform's 50MB total
  cap so RAM ceiling is fixed; switch to streaming if cap rises.
- Symlink rejection — agent UID can already read whatever its
  filesystem perms allow via the shell tool; rejecting symlinks
  doesn't materially shrink the attack surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:47:31 -07:00
Hongming Wang
d028fe19ff feat(notify): agent → user file attachments via send_message_to_user
Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip"
in plain text instead of attaching a download chip — the runtime literally
had no API for outbound file attachments. The canvas + platform's
chat-uploads infrastructure already supported the inbound (user → agent)
direction (commit 94d9331c); this PR wires the outbound side.

End-to-end shape:

  agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"])
   ↓ runtime
  POST /workspaces/<self>/chat/uploads (multipart)
   ↓ platform
  /workspace/.molecule/chat-uploads/<uuid>-build.zip
   → returns {uri: workspace:/...build.zip, name, mimeType, size}
   ↓ runtime
  POST /workspaces/<self>/notify
   {message: "Done!", attachments: [{uri, name, mimeType, size}]}
   ↓ platform
  Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs
  with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]}
   ↓ canvas
  WS push: canvas-events.ts adds attachments to agentMessages queue
  Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[]
  Either path → ChatTab renders download chip via existing path

Files changed:

  workspace-server/internal/handlers/activity.go
    - NotifyAttachment struct {URI, Name, MimeType, Size}
    - Notify body accepts attachments[], broadcasts in payload,
      persists as response_body.parts[].kind="file"

  canvas/src/store/canvas-events.ts
    - AGENT_MESSAGE handler reads payload.attachments, type-validates
      each entry, attaches to agentMessages queue
    - Skips empty events (was: skipped only when content empty)

  workspace/a2a_tools.py
    - tool_send_message_to_user(message, attachments=[paths])
    - New _upload_chat_files helper: opens each path, multipart POSTs
      to /chat/uploads, returns the platform's metadata
    - Fail-fast on missing file / upload error — never sends a notify
      with a half-rendered attachment chip

  workspace/a2a_mcp_server.py
    - inputSchema declares attachments param so claude-code SDK
      surfaces it to the model
    - Defensive filter on the dispatch path (drops non-string entries
      if the model sends a malformed payload)

  Tests:
    - 4 new Python: success path, missing file, upload 5xx, no-attach
      backwards compat
    - 1 new Go: Notify-with-attachments persists parts[] in
      response_body so chat reload reconstructs the chip

Why /tmp paths work even though they're outside the canvas's allowed
roots: the runtime tool reads the bytes locally and re-uploads through
/chat/uploads, which lands the file under /workspace (an allowed root).
The agent can specify any readable path.

Does NOT include: agent → agent file transfer. Different design problem
(cross-workspace download auth: peer would need a credential to call
sender's /chat/download). Tracked as a follow-up under task #114.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:35:58 -07:00
rabbitblood
8c69a98da2 chore(simplify): share FALLBACK_POLL_MS as the tombstone TTL + trim verbose comments
Simplify pass on top of #2069 fix:

- Export FALLBACK_POLL_MS from canvas/src/store/socket.ts and import
  it as TOMBSTONE_TTL_MS in deleteTombstones.ts. Single source of
  truth — tuning one without the other would silently re-open the
  hydrate-races-delete window. Required-fix per simplify reviewer.
- Compress deleteTombstones.ts docstring from 30 lines to 10 — keep
  the "what + why module-level"; drop the long-form problem
  description (issue #2069 carries it).
- Compress canvas.ts call-site comments at removeSubtree (4 lines →
  2) and hydrate (2 lines → 2 but tighter).
- Don't reassign the workspaces parameter inside hydrate — use a
  const `live` and thread it through the two downstream calls
  (computeAutoLayout, buildNodesAndEdges). Same effect, no lint
  smell.
- Trim the canvas.test.ts integration-test preamble.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:52:49 -07:00
rabbitblood
7bb0bc39a2 fix(canvas): tombstone deleted ids so in-flight hydrate can't resurrect them (#2069)
Closes #2069. removeSubtree dropped a parent + descendants locally
after DELETE returned 200, but a GET /workspaces request that was
IN-FLIGHT before the DELETE completed could land AFTER and hydrate
the store with a stale snapshot — re-introducing the deleted nodes
on the canvas until the next 10s fallback poll corrected it.

New module canvas/src/store/deleteTombstones.ts holds a transient
process-lifetime Map<id, deletedAt>. removeSubtree calls
markDeleted(removedIds); hydrate calls wasRecentlyDeleted(id) to
filter the incoming workspaces. TTL is 10s — matches the WS-fallback
poll cadence so a single round-trip is covered, after which a
legitimately re-imported id flows through normally.

GC happens lazily at every read AND at write time so the map stays
bounded — no separate timer / interval / unmount plumbing.

Tests:
- canvas/src/store/__tests__/deleteTombstones.test.ts: 7 cases
  covering immediate flag, never-marked, TTL boundary (9999ms vs
  10001ms), GC-on-read, GC-on-write, re-mark resets timestamp,
  iterable input.
- canvas/src/store/__tests__/canvas.test.ts: end-to-end "hydrate
  cannot resurrect ids that removeSubtree just dropped (#2069)"
  exercises the full chain at the store level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:48:15 -07:00
Hongming Wang
d0f198b24f merge: resolve staging conflicts (a2a_proxy + workspace_crud)
Three files conflicted with staging changes that landed while this PR
sat open. Resolved each by combining both intents (not picking one side):

- a2a_proxy.go: keep the branch's idle-timeout signature
  (workspaceID parameter + comment) AND apply staging's #1483 SSRF
  defense-in-depth check at the top of dispatchA2A. Type-assert
  h.broadcaster (now an EventEmitter interface per staging) back to
  *Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through
  to no-op when the assertion fails (test-mock case).

- a2a_proxy_test.go: keep both new test suites — branch's
  TestApplyIdleTimeout_* (3 cases for the idle-timeout helper) AND
  staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated
  the staging test's dispatchA2A call to pass the workspaceID arg
  introduced by the branch's signature change.

- workspace_crud.go: combine both Delete-cleanup intents:
  * Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas
    hang-up doesn't cancel mid-Docker-call (the container-leak fix)
  * Branch's stopAndRemove helper that skips RemoveVolume when Stop
    fails (orphan sweeper handles)
  * Staging's #1843 stopErrs aggregation so Stop failures bubble up
    as 500 to the client (the EC2 orphan-instance prevention)
  Both concerns satisfied: cleanup runs to completion past canvas
  hangup AND failed Stop calls surface to caller.

Build clean, all platform tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-26 10:43:22 -07:00
rabbitblood
1a273f21f5 feat(canvas): per-workspace provision_timeout_ms override (#2054)
Phase 1 of moving runtime UX knobs server-side. Builds the canvas
foundation: a workspace can carry its own provision_timeout_ms
(sourced server-side from a template manifest in a follow-up PR),
and ProvisioningTimeout's resolver respects it per-node.

Today the resolver had Props-level timeoutMs that applied to ALL
nodes — fine for tests but wrong for production where one batch
could mix runtimes (hermes 12-min cold boot alongside docker 2-min).
The runtime profile fallback already handles per-runtime defaults;
this PR adds the per-WORKSPACE override layer above that.

Resolution priority (most specific wins):
  1. node.provisionTimeoutMs — server-declared per-workspace
     override (this PR's new field)
  2. timeoutMs prop — single-threshold test override
  3. runtime profile in @/lib/runtimeProfiles
  4. DEFAULT_RUNTIME_PROFILE

Changes:
- WorkspaceData (socket): add optional provision_timeout_ms
- WorkspaceNodeData: add optional provisionTimeoutMs
- canvas-topology hydrate: thread the field through to node.data
- ProvisioningTimeout: extend the serialized-string node iteration
  to carry provisionTimeoutMs (4-field positional split); pass as
  the second arg to provisionTimeoutForRuntime
- 3 new tests in ProvisioningTimeout.test.tsx covering hydrate
  threading, null fall-through, and resolver priority

Phase 2 (separate PR, blocked on workspace-server template-config
loader): workspace-server reads provision_timeout_seconds from
template config.yaml at provision time, includes
provision_timeout_ms in the workspace API/socket response. Phase 3
(template-repo PR): template-hermes config.yaml declares
provision_timeout_seconds: 720; canvas RUNTIME_PROFILES.hermes
becomes redundant and can be removed.

19/19 tests pass (3 new + 16 existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:02:56 -07:00
Hongming Wang
9a223afba1 fix(dotenv,socket): review-driven hardening of .env loader + WS poll
Independent code review surfaced three required fixes and one cheap
optional one. All addressed here.

dotenv parser:
- `export FOO=bar` was parsed as key `"export FOO"` (with embedded
  space) and silently os.Setenv'd, so a developer pasting from a
  direnv `.envrc` would get junk vars. Now strips the prefix.
- Quoted values weren't unwrapped: `FOO="hello world"` produced value
  `"hello world"` with literal quotes. Now strips one matched pair of
  surrounding `"` or `'`. Inside a quoted value `#` is part of the
  value, not a comment marker (matches godotenv convention).
- UTF-8 BOM at file start (Windows editors) would have produced a
  first key like U+FEFF + "FOO". Now stripped via TrimPrefix.

dotenv loader:
- findDotEnv()'s upward walk would happily pick up `~/.env` or a
  sibling-repo `.env` if the binary was run from `~/Documents/other-
  project/`. Real foot-gun on shared dev boxes. Now gated on a
  monorepo sentinel: the candidate directory must contain
  `workspace-server/go.mod`. Falls through to "no .env found" (=
  pre-fix behavior) when the sentinel is absent.

socket fallback poll:
- startFallbackPoll() previously fired only on onclose, so the very
  first connect attempt — when onclose hasn't fired yet because we
  never had a successful onopen — left the canvas with no HTTP poll
  for the duration of the failing handshake (Chrome can hold a
  SYN-SENT WebSocket open ~75s before giving up). Now also called at
  the top of connect(); the timer-already-running guard makes it a
  no-op when one cycle later onclose calls it again.

Test coverage added: export prefix, single+double quoted values, hash
inside quotes preserved, unterminated quote falls back to bare value,
CRLF stripping locked in, BOM stripping, and a sentinel-rejection
regression test that creates a temp .env with no workspace-server
sibling and asserts findDotEnv refuses to load it.

Verified: 985 canvas tests + 30 dotenv subtests + 4 dotenv integration
tests all pass; tsc clean; rebuilt platform from monorepo root with
stripped env still loads .env (49 vars) and /workspaces returns 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:09:18 -07:00
Hongming Wang
21db85d691 fix(canvas): cascade delete locally so children disappear without WS
Deleting a parent on a wedged WS used to leave the child cards on
the canvas as orphaned roots until the user manually refreshed.

Why: Canvas.tsx and DetailsTab.tsx both called `removeNode(parentId)`
after `DELETE /workspaces/:id?confirm=true` returned 200. `removeNode`
deliberately re-parents children rather than cascading — it relies on
the per-descendant WORKSPACE_REMOVED WS events the platform emits as
part of the cascade to drop each child individually. When the WS is
unhealthy those events never arrive, so the local store keeps the
children alive (now re-parented to root since their actual parent is
gone).

Fix: new `removeSubtree(rootId)` action on the canvas store mirrors
the server-side cascade — drops the root + every descendant + every
incident edge in one atomic set(). Both delete call sites now use it.
The WS events still arrive when WS is healthy and become idempotent
no-ops because the nodes are already gone.

Why a new action instead of changing removeNode: removeNode's
re-parenting behavior is correct for non-cascading flows (drag-out,
manual node detach in the future). Adding a sibling action keeps
both call shapes available rather than forcing every caller to opt
out of cascade.

6 new unit tests cover root cascade, mid-level cascade, leaf
no-op-cascade, selection clearing across the subtree, selection
preservation outside the subtree, and edge cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:51:09 -07:00
Hongming Wang
0b4dfbd121 fix(canvas): suppress stale provisioning banners + add WS-down HTTP fallback poll
Two related fixes for the case where the canvas thinks workspaces are
stuck provisioning when they're actually online:

1. ProvisioningTimeout banners now gate on wsStatus === "connected".
   While the WS is in connecting/disconnected state, the local
   "provisioning" status reflects the last event received before the
   drop — workspaces may have transitioned to online minutes ago. The
   8m timeout was firing against frozen state and showing a wall of
   yellow warnings on already-online workspaces.

2. Socket layer now starts a 10s rehydrate poll when the WS goes
   unhealthy (onclose) and stops it on onopen/disconnect. The
   reconnect attempts continue in parallel; whichever recovers first
   wins. rehydrate()'s existing dedup gate prevents the open-time
   rehydrate from racing with a fallback poll. Without this the
   store could stay frozen for minutes while WS exponential backoff
   chewed through retries.

Plus the previously-uncommitted TemplatePalette flushSync change so
the import modal unmounts synchronously before doImport runs (otherwise
React batches the close with the import's setState prefix and the
modal backdrop hides the spawn animation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:22:15 -07:00
Hongming Wang
1d71b4e9e5 fix(canvas): bundle of UX hardening — modals, position stability, error UX, paste
Single-themed bundle of fixes accumulated while polishing the canvas
chat / agent-comms / plugins / position flows. Each piece is small;
the connective tissue is "things observable from the canvas right
panel and the org-deploy flow that surprised real users".

UI / composer
  - Legend: add close X + persisted-localStorage state + reopener
    pill; default open for first-time users.
  - SidePanel: rename "Skills" tab label → "Plugins" (single-line;
    internal panelTab enum value, component name, and store keys
    unchanged).
  - SkillsTab: registry tri-state UI (loading / error / empty) with
    actionable Retry button + 10s explicit fetch timeout. Handle
    AbortSignal.timeout's DOMException by name (TimeoutError /
    AbortError) — Chromium's "signal timed out" message wouldn't
    match the prior naive /timeout/ regex. Reset mountedRef on every
    mount: pre-existing StrictMode dev-mode bug where cleanup-only
    `current = false` was never re-set, permanently wedging every
    `if (mountedRef.current) setX(...)` guard and producing a
    "Loading…" panel that never resolved on hard refresh.
  - ChatTab: paste-image-from-clipboard via onPaste handler; unique
    monotonic-counter filenames so same-second pastes don't collide
    on name+size dedup. mime→ext map avoids `image/svg+xml`-style
    raw extensions on synthesised filenames. Bypasses the
    DataTransfer constructor so Safari < 14.1 / older Edge work.
  - ChatTab: drop stuck error toast when the WS path already
    delivered the agent reply but the HTTP path errored late
    (sendingFromAPIRef gate now covers the .catch() handler).
  - ChatTab: filter heartbeat-style internal self-messages from the
    My Chat tab so historical rows with source_id=NULL don't
    surface as user-typed input.
  - Modal portals: OrgImportPreflightModal + MissingKeysModal
    (ProviderPickerModal + AllKeysModal) now createPortal to
    document.body and clamp max-h to 80vh. Escapes the ancestor
    containing block (TemplatePalette's fixed+filtered sidebar
    re-anchored descendants' position:fixed to itself, hiding
    modals behind workspace cards). MissingKeysModal bumped to
    z-[60] for stack ordering when both modals are open.
  - OrgImportPreflightModal saveOne: ref-based microtask-safe
    in-flight gate replaces the brittle "set startValue inside a
    setState updater and read on the next line" pattern (React 18
    doesn't guarantee functional updaters run synchronously; that
    path strands `saving:true` and never calls createSecret). Same
    useRef pattern guards SkillsTab.loadRegistry against concurrent
    fires and Fast-Refresh-stranded promises; force=true parameter
    on retry click bypasses the gate.

Agent comms
  - AgentCommsPanel: derive UI-facing `flow` field instead of using
    activity_type-derived direction. Self-logged a2a_receive rows
    (source_id == workspace_id, what the agent runtime writes to log
    its own outbound delegation replies) now correctly render as
    OUTBOUND with → arrow + right-justified bubble. Previously they
    rendered "← From Self" with Restart pointing at THIS workspace.
  - AgentCommsPanel: error rows replace the unactionable
    "X failed [A2A_ERROR]" body with banner + underlying-error
    code-block + cause-hint (matched on Claude Code SDK init wedge,
    deadline-exceeded, agent-thrown exception, empty-error) +
    Restart [peer] / Open [peer] action buttons.
  - AgentCommsPanel: render text bodies through ReactMarkdown +
    remark-gfm so multi-part replies (tables, code) render properly.

Multi-part text extractor
  - extractReplyText (live A2A response in ChatTab) and
    extractResponseText (chat history loader in message-parser):
    now COLLECT from every source — top-level parts, parts.root.text,
    and artifacts — joined with "\n". Previous "first source wins"
    silently dropped multi-part replies (Hermes summary+detail,
    Claude Code long-form table). Tests cover joined-from-parts,
    joined-from-artifacts, joined-from-both.

Position stability
  - canvas-topology.buildNodesAndEdges: auto-rescue heuristic now
    accepts currentParentSizes map; uses max(initial min, currently
    grown) for the bbox check. Fixes "child jumps to weird location
    after 30s" — the periodic socket health-check rehydrate
    (silenceSec > 30) was rebuilding nodes from scratch, and the
    rescue's reliance on grid-derived initial size false-flagged
    children the user dragged into the user-grown area.
  - canvas.hydrate: pass live measured dimensions from the existing
    store into buildNodesAndEdges.
  - socket.RehydrateDedup: pure exported helper class that gates
    rehydrate calls. Two states — in-flight (in-flight Promise reused
    by concurrent callers) + post-completion window (1.5s, returns
    Promise.resolve()). Initialised with -Infinity so first call
    always passes the gate. Wired into ReconnectingSocket.rehydrate.

A2A edges
  - New A2AEdge custom React Flow edge component portals its label
    out of the SVG layer via EdgeLabelRenderer so labels (a) render
    above workspace cards instead of being hidden behind them and
    (b) accept clicks. Click selects source + switches panel to
    Activity, but only on a NEW selection (preserves current tab on
    re-click of an already-selected source).
  - buildA2AEdges output tagged type:"a2a"; edgeTypes wired in
    Canvas.tsx.

Tests
  - 14 new vitest cases across 4 files (964 → 978 passing):
    OrgImportPreflightModal saveOne single-fire / double-click,
    any-of rendering; AgentCommsPanel toCommMessage flow derivation
    in all four shapes; canvas-topology rescue respects-grown /
    rescues-genuine-drift / fallback-without-live-size; socket
    RehydrateDedup gate behaviour; message-parser multi-part
    response extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:54:43 -07:00
Hongming Wang
f995b90a85 test(canvas-events): expect both pan-to-node AND fit-deploying-org on NEW root provision
Commit 5adc8a74 (part of this PR) intentionally made
molecule:fit-deploying-org fire for root-level workspaces too — it
used to only fire for children, which meant a standalone create
didn't center the viewport until the first child arrived ~2s later.

The existing regression test still expected ONLY the
molecule:pan-to-node event for a new root, so it started failing
with "expected length 1, got 2". The product behavior is correct
(centering on the root immediately is better UX); the test was
pinning the old single-dispatch shape.

Fix: assert BOTH events fire, each with the right detail payload,
so a future regression that drops either one (or duplicates) trips
the test. Single-test update, no production code change. 953/953
canvas tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:55:52 -07:00
Hongming Wang
5adc8a74d5 feat(canvas+org): env preflight, EmptyState parity, shared useTemplateDeploy hook
Builds on #2061. Three internally-cohesive sub-features; easiest to
read in order.

## 1. Org-level env preflight

Server
- `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and
  `recommended_env: string[]` YAML fields.
- `GET /org/templates` walks the tree and returns the tree-union
  (deduped, sorted) of both. `collectOrgEnv` dedup prefers required
  when the same key is declared at both tiers.
- `POST /org/import` preflights against `global_secrets` WHERE
  `octet_length(encrypted_value) > 0` (empty-value rows used to be
  counted as "configured" and the per-container preflight still
  failed at start time). 412 Precondition Failed + `missing_env`
  list when required keys are absent. `force=true` bypasses with
  an audit log line. DB lookup failure now returns 500 (was:
  silent fall-through that defeated the guard). Env-var NAMES
  validated against `^[A-Z][A-Z0-9_]{0,127}$` so a malicious
  template can't ship pathological names into the UI or DB.

Canvas
- New `OrgImportPreflightModal`: red "Required" section (blocking)
  and yellow "Recommended" section (non-blocking, import stays
  enabled, shows live missing-count next to the Import button).
- Per-key password input → `PUT /settings/secrets` → strike-through
  on save. Functional `setDrafts` throughout (no stale-closure
  clobbers on rapid successive saves). `useEffect` seed keyed on a
  sorted-join string signature so a parent re-render with a new
  array identity doesn't clobber typed inputs.
- `TemplatePalette.handleImport` branches: zero env declarations →
  straight to import; any declarations → fetch configured global
  secret keys, open the modal.

Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels,
required-wins-over-recommended (including same-struct), dedup,
empty, invalid-name rejection.

## 2. EmptyState parity with TemplatePalette

The "Deploy your first agent" grid used to call `POST /workspaces`
with no preflight while the sidebar palette ran
`checkDeploySecrets` + `MissingKeysModal` first. Same template
deployed two different ways → first-run users saw containers boot
in `failed` state without guidance. Now both surfaces share one
preflight + modal handshake.

EmptyState's previous `interface Template` dropped `runtime`,
`models`, and `required_env` — silently discarding exactly the
fields the preflight needs. `Template` now lives in
`deploy-preflight.ts` and is imported from there by both surfaces.

## 3. useTemplateDeploy hook

With the preflight + modal wiring now duplicated across
EmptyState + TemplatePalette + (going forward) any third surface,
extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`:

  const { deploy, deploying, error, modal } = useTemplateDeploy({
    canvasCoords: ...,   // optional, default random
    onDeployed: (id) => ...,
  });

Closes three drift surfaces that the duplication had created:
- `resolveRuntime` id→runtime fallback table (moved to
  `deploy-preflight.ts`). EmptyState had a narrower fallback that
  would have silently disagreed with the palette on any future id
  needing a non-identity mapping.
- `checkDeploySecrets` call signature. One owner.
- `MissingKeysModal` JSX wiring. One owner.

Narrow try/catch around `checkDeploySecrets` so a preflight network
failure clears `deploying` and surfaces via `setError` instead of
stranding the button forever. `modal: ReactNode` (not a
`renderModal()` function) — the previous memoization bought
nothing since consumers called it inline every render. Named
`MissingKeysInfo` interface for the state shape.

## 4. Viewport auto-fit user-pan gate fix

During org deploy the canvas was meant to pan+zoom to follow each
arriving workspace (`molecule:fit-deploying-org` event → debounced
fitView). In practice the fit stayed stuck on wherever the first
fit landed.

Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event`
at the END of a programmatic `fitView` animation. The original
"respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`,
so our own fit completing looked like a user pan, and every
subsequent auto-fit short-circuited for the rest of the deploy.

Fix: stop trusting `onMoveEnd` for user-intent detection. Register
explicit `wheel` + `pointerdown` listeners on `document` with
capture phase and `target.closest('.react-flow__pane')` filter.
Capture-phase immunity to `stopPropagation`; pane-filter rejects
toolbar / modal / side-panel clicks (the old `window` fallback
caught those). `onMoveEnd` simplified to only drive the debounced
viewport save.

Also: fit event dispatched on root arrivals (not just children),
so the canvas centers on the just-landed root immediately instead
of waiting ~2s for the first child. Animation 600ms → 400ms so
successive per-arrival fits don't pile up visually. End-state fit
stays at 1200ms — intentional asymmetry ("settling" vs
"tracking"), documented in code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:15:33 -07:00
Hongming Wang
425df5e5a9 merge(staging): resolve conflicts + fix 7 test regressions on top of #2061
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
  auto-merged (mostly canvas/tabs/chat and workspace-server handlers
  the earlier DIRTY marker was stale relative to current staging).

- Fix 7 test failures surfaced by the merge:

  1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
     inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
     object failed type check. Explicit return-type annotation.

  2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
     reads deletingIds.size (new multilevel-layout state). Both mock
     stores lacked deletingIds; added new Set<string>() to each.

  3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
     format WorkspaceData (snake_case, with x/y/uptime_seconds). The
     store's node.data is now WorkspaceNodeData (camelCase, no wire-
     only fields). Rewrote makeWS to produce WorkspaceNodeData and
     updated 5 call-site casts. No assertions changed.

  4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
     that the PR intentionally inverts:

       a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
          now contains only {"external"}, so the banner is no longer
          shown for hermes. Inverted assertion: now pins ABSENCE of
          the banner, with a comment noting the inversion.

       b. "config.yaml runtime wins over DB" — priority reversed:
          DB is now authoritative so the tier-on-node badge matches
          the form. Inverted scenario: DB=hermes + yaml=crewai →
          form shows hermes. Switched test's DB runtime off langgraph
          because the dropdown collapses langgraph into an empty-
          valued "default" option that would hide the win signal.

- No production code changed — this commit is staging merge + test
  realignment only. 953/953 canvas tests pass. tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:50:39 -07:00
Hongming Wang
94d9331c76 feat(canvas+platform): chat attachments, model selection, deploy/delete UX
Session's accumulated UX work across frontend and platform. Reviewable
in four logical sections — diff is large but internally cohesive
(each section fixes a gap the next one depends on).

## Chat attachments — user ↔ agent file round trip

- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
  25 MB per file, UUID-prefixed storage under
  /workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
  escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
  per-message attachment chips with fetch+blob download (anchor
  navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
  now consumes attachments via platform helpers.

## Platform attachment helpers (workspace/executor_helpers.py)

Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
  reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
  multi-modal list (text + image_url) for images. Respects
  MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
  adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
  paths, stages each into chat-uploads/ (download endpoint
  whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
  so non-root agents can write /workspace without each template
  having to chmod in its Dockerfile.

Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.

## Model selection & related platform fixes

- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
  silently lost the model choice. Stores into workspace_secrets
  (MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
  so Restart propagates the stored model to HERMES_DEFAULT_MODEL
  without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
  (stale) config.yaml — fixes "badge shows T3, form shows T2".

## ChatTab & WebSocket UX fixes

- Send button no longer locks after a dropped TASK_COMPLETE —
  `sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
  the previous default aborted fetches while the server was still
  replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
  detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
  the adaptor's purpose IS the form, banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
  bare <runtime> for template path (hermes lives at the bare name).

## Org deploy/delete animation (theme-ready CSS)

- styles/theme-tokens.css — design tokens (durations, easings,
  colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
  value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
  (deploying children AND actively-deleting ids) — RF's
  authoritative drag lock; useDragHandlers retains a belt-and-
  braces check.
- Organ cancel button (red pulse pill on root during deploy)
  cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
  sibling arrivals coalesce into one fit (previous per-event
  fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
  timestamp only when event !== null (ignores programmatic
  fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
  delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
  useCanvasViewport (DRY'd 3 copies of split/filter/join).

## Server payload change

- org_import.go WORKSPACE_PROVISIONING broadcast now includes
  parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
  renders the child at the right parent-nested slot without doing
  any absolute-position walk. createWorkspaceTree signature gains
  relX, relY alongside absX, absY; both call sites updated.

## Tests

- workspace/tests/test_executor_helpers.py — 11 new cases
  covering URI resolution (including traversal rejection),
  attached-file extraction (both Part shapes), manifest-only
  vs multi-modal content, large-image skip, outbound staging,
  dedup, and ensure_workspace_writable (chmod 777 + non-root
  tolerance).
- workspace-server chat_files_test.go — upload validation,
  Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
  clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
  a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
  plumbing check + round-trip across hermes/langgraph/claude-code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:27:51 -07:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server paces + provision concurrency = 3 guarantee tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now fitView's with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
Hongming Wang
2d6ff11c4e fix(canvas): re-sort parents-before-children after nest mutation
React Flow requires parent nodes to appear before their children in
the nodes array. When they don't, it logs "Parent node {id} not
found. Please make sure that parent nodes are in front of their
child nodes in the nodes array" and — more importantly — renders
the child at canvas-absolute coords instead of parent-relative,
flashing it far outside the parent.

topology's buildNodesAndEdges already enforced this at hydrate, but
nestNode + batchNest weren't re-sorting after mutating parentId.
A freshly-nested child often ended up after-first-drag at the
wrong screen position because its new parent sat later in the
array than itself.

Extract sortParentsBeforeChildren() into canvas-topology as a
reusable DFS visit; call it at the tail of both nestNode's set()
and batchNest's commit set(). 923 tests still green — no behaviour
change beyond eliminating the warning and the position flash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:00:40 -07:00
Hongming Wang
09053dfdeb fix(canvas): cancel-nest restores position; un-nest shrinks parent
Two follow-up polish items for drag-and-nest:

1. Cancelling the "Extract from team?" dialog now snaps the
   dragged card back to where the drag started. Before, a user
   who dragged a child out, saw the confirm dialog, then clicked
   Cancel ended up with the card stranded outside the parent at
   its drop-point position — which also got persisted via
   savePosition on drag-stop. Now onNodeDragStart captures the
   pre-drag position + parent, and cancelNest restores both the
   RF node position and fires savePosition with the absolute
   pre-drag coords so reload matches.

2. Un-nesting now clears the ex-parent's explicit width/height
   in the nodes array. growParentsToFitChildren is grow-only so
   it could never shrink the parent back down after a child
   left; the card stayed at its auto-grown size with empty
   space. Stripping width/height lets React Flow re-measure from
   the card's own min-width / min-height CSS, so the parent
   visually shrinks to fit whatever children remain.

923 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:52:28 -07:00
Hongming Wang
4fd7f1e84c fix(canvas): tighten rescue + cap toast + cover paths with tests
Three follow-up review findings from the c2b2e13a review:

1. Rescue heuristic uses pure bbox-non-overlap. The previous
   `position.x < 0` branch rescued any child whose parent was
   later dragged past it, even when the layout was clearly
   recoverable (e.g. relative -40, child still overlaps parent).
   New rule: rescue iff the child's bbox has zero overlap with
   the parent's bbox — self-calibrating, scales with user-resized
   parents, catches screenshot-case and legacy huge-positive data.

2. Toast caps failed-name list at 3 and appends "and N more".
   Stops a 50-node partial failure from overflowing the toast
   container.

3. Cycle guard on selection-roots walk in batchNest. Corrupt
   parentId data can't send the loop infinite now. Cheap
   defensive guard — one Set per selected node.

Tests added (923 total, up from 918):
 * canvas-topology.test: 4 rescue scenarios — screenshot case
   (zero-overlap rescue), negative drift kept, huge-positive
   rescued, user-resized layout kept.
 * canvas.test: selection-roots filter on a 3-level chain.
 * workspace_crud test: PATCH {collapsed:true} runs the UPDATE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:08:14 -07:00
Hongming Wang
c2b2e13abe fix(canvas): address code-review findings on the Canvas refactor
Five issues surfaced in the review of 50b53784. Each was either a real
bug waiting to hit users or a silent failure mode.

1. Topology rescue no longer teleports user-resized children.
   Rescue was comparing against parentMinSize(childCount), so any
   child the user had placed in space the parent was resized into
   got snapped to the default grid on reload — undoing the layout.
   Now rescue fires only on obviously corrupt data: negative
   relative coords (legacy pre-nesting absolute positions that
   landed above/left of their assigned parent) or values past an
   MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial
   minimum are left alone.

2. batchNest now filters to selection-roots before planning.
   Previously selecting both A and A's descendant B and dragging
   into T yanked B out of A to become a sibling under T. Users
   reasonably expect the A subtree to move intact. The new pass
   drops any selected node whose ancestor is also selected —
   those follow their ancestor via React Flow's parent binding.

3. batchNest surfaces partial failure via showToast. Previously
   silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2
   snapped back with no explanation. Now names the failed cards.

4. confirmNest closes the nest dialog BEFORE dispatching the async
   store action, so a second drag can't kick off a competing batch
   while the first is still in flight.

5. collapsed is now persisted. The Go workspace_crud.go Update
   handler ignored the `collapsed` field, so user-initiated
   collapse round-tripped to an expanded state on next hydrate.
   Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`)
   so the state survives reload.

Nits cleaned:
 * Removed dead dragStartParentRef in useDragHandlers.
 * Swapped redundant `node.data as WorkspaceNodeData` casts for a
   named WorkspaceNode type alias.
 * Canvas.tsx SR-live region now reads n.parentId (matches
   MiniMap + RF's native field) instead of the mirror n.data.parentId.

Tests added (918 total, up from 915):
 * batchNest happy path — 2-root selection fires 2 combined PATCHes
   carrying parent_id + x + y, not 2×N sequential round-trips.
 * batchNest ancestor+descendant selection — subtree stays intact.
 * batchNest partial failure rollback — only the rejected nodes
   revert; successful ones stay committed.

Backend change is single-line (collapsed PATCH branch); all
workspace_crud Go tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:58:44 -07:00
Hongming Wang
50b537849a refactor(canvas): split Canvas.tsx into hooks; parallelize batchNest
Two concerns in one commit (separate files, each self-contained):

## Canvas.tsx split (from ~680 to ~250 lines)

Canvas.tsx was holding drag gesture state + keyboard shortcuts +
viewport wiring + JSX. Each concern now lives in its own unit under
canvas/src/components/canvas/:

- dragUtils.ts          — pure: shouldDetach, clampChildIntoParent,
                          DETACH_FRACTION
- DropTargetBadge.tsx   — the floating "Drop into: <name>" label + the
                          dashed ghost preview at the target slot
- useDragHandlers.ts    — encapsulates onNodeDragStart / Drag / Stop,
                          findDropTarget hit-test, pendingNest state,
                          and confirmNest/cancelNest. Routes multi-
                          select drags through batchNest automatically.
- useKeyboardShortcuts  — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one
                          window listener, one source of truth.
- useCanvasViewport     — pan-to-node + zoom-to-team CustomEvent
                          listeners and the debounced viewport save.

Canvas.tsx becomes a thin composition + JSX file. No behavioural
change; the refactor is covered by the existing 915 canvas tests.

## batchNest parallelization (2N round-trips → N, all in flight)

Previously nestNode fired two sequential PATCHes (parent_id then x/y)
and batchNest looped nestNode sequentially. For a 5-node selection on
a typical ~200ms link this was ~2s of serialized RPCs.

- nestNode now combines parent_id + x + y into ONE PATCH. The Go
  handler (workspace_crud.go Update) already reads all three from the
  same body — no backend change.
- batchNest rewritten: compute every re-parent plan against one
  snapshot, commit a single set(), then fire N PATCHes via
  Promise.allSettled in parallel. Per-node failures roll back only
  that node (others stay committed) — same semantics as the single-
  node path, just concurrent.
- The state math in the batch path also correctly shifts descendant
  zIndex by depthDelta when any re-parented node has a subtree.

## Also

- canvas-topology.ts: reverted P3.12's opt-in rescue to the auto-
  rescue default. When a child's stored relative position would render
  it outside the parent bbox (the visual regression the user saw after
  collapse → reload — Hermes child drawn outside Claude Code Agent on
  first paint), the child is placed in the next default grid slot.
  The "Arrange Children" context command stays for bigger teams.

All 915 canvas tests pass. No backend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:43:18 -07:00
Hongming Wang
c5abed988e fix(canvas): address review findings on playability pass
Five Critical issues caught in code review of f3423a51. Each one broke
an invariant the original commit claimed to uphold.

1. nestNode: descendants kept their old-depth zIndex after a re-parent.
   Now walks the dragged subtree and shifts every descendant's zIndex
   by the same depthDelta so "children above ancestors" survives moves
   between levels of the hierarchy.

2. bumpZOrder: siblings all share zIndex = depth in fresh topology, so
   a single +1 bump was identical for every sibling and subsequent
   bumps drifted zIndex unboundedly. Rewritten to sort siblings by
   current zIndex and swap the target with its neighbour in the bump
   direction — Figma-style reorder, stays within the sibling tier.

3. findDropTarget: depth-first tiebreaker lost to bumped siblings. The
   visually-frontmost card after Cmd+] is a shallow sibling, but the
   hit test picked the deepest nested card regardless. Swapped order
   so zIndex wins first, depth second, area third. Also pre-computes
   the depth map once per call (was O(n²) via repeated .find walks —
   will matter past ~30 workspaces).

4. arrangeChildren: saved absolute position using `slot + parent.position`,
   but parent.position is RELATIVE to its own parent when nested.
   Grandchildren's stored x/y were in the parent's local frame and
   reload placed them in the wrong spot. Now walks the full ancestor
   chain via absOf() to get the true canvas-absolute origin before
   PATCHing.

5. setCollapsed: naive flip of every descendant's hidden flag diverged
   from the topology rebuild on hydrate. Collapse A, collapse B, then
   expand A — C should stay hidden because B is still collapsed, but
   before this fix C was unhidden. Rewritten to recompute every
   descendant's hidden from the full ancestry chain, matching the
   topology pass byte-for-byte. New round-trip test asserts the two
   code paths produce identical node.hidden across a full lifecycle.

Also:
- Removed dead cascadeMessage constant (never rendered).
- Replaced hardcoded 260/120 in zoom-to-team with exported constants.
- arrangeChildren PATCH catch now logs instead of silently swallowing.
- Added 70→76 tests: setCollapsed 3-chain scenarios, bumpZOrder swap
  semantics, edge-of-list no-op.

All 915 canvas tests green. Backend untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:16:48 -07:00
Hongming Wang
f3423a513d feat(canvas): industry-pattern playability pass (P1+P2+P3)
Ships the full prioritized improvement list from the canvas research
report — aligns our nesting/resize UX with Miro / FigJam / tldraw / Figma
conventions. Organized by priority below.

## P1 — baseline playability

* Hysteresis on drag-out detach (Miro): a child only un-nests when >=20%
  of its bbox is outside the parent on release. Prevents accidental
  un-nesting from twitchy drags.
* Drop-target now uses tree-depth DESC, then zIndex DESC, then area ASC
  to pick targets when nested parents overlap (xyflow #2827).
* Children render above ancestors by inheriting zIndex = parent + 1 in
  topology and on every nest/unnest (xyflow #4012).
* Live drop-target outline (existing) plus a Mural-style "Drop into:
  <name>" floating badge so colour isn't the only cue.
* growParentsToFitChildren now fires only on dimension-type changes
  inside onNodesChange (NodeResizer commits) and once on drag-stop —
  avoids tldraw's edge-chase artifact (P3.11 commit-on-release).

## P2 — polish

* Whimsical-style ghost preview: dashed outline at the next default
  grid slot inside the drop-target parent during drag.
* Alt-drag escape with soft clamp: dropping slightly outside a parent
  without Alt/Cmd snaps the child back inside (clampChildIntoParent);
  Alt releases the clamp to allow un-nest; Cmd/Ctrl force-detaches.
* Figma-style keyboard hierarchy nav: Enter descends to first child,
  Shift+Enter ascends to parent, Cmd+]/[ re-orders siblings via the
  new bumpZOrder store action.
* Multi-select re-parent preserves offsets: confirmNest routes through
  a new batchNest action when the primary drag is part of a batch
  selection (Lucidchart pattern).

## P3 — long-tail

* Minimap now shows parent cards as filled regions with a blue stroke,
  so hierarchy reads at a glance without zooming.
* Out-of-bounds rescue is opt-in: topology no longer silently re-lays
  children whose stored position is outside the parent bbox (Figma
  trust-the-data). The new Arrange Children context menu item runs the
  rescue on demand via arrangeChildren.
* Cmd-drag force-detach regardless of hysteresis.
* Collapse workspace: the existing Collapse Team action now toggles a
  local setCollapsed store action that hides every descendant and
  shrinks the parent card to header-only (Miro frame outline view).
  Growth pass skips collapsed parents so they don't push back out.

All 910 canvas tests green. Backend untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:03:02 -07:00
Hongming Wang
d359390f83 fix(canvas): parent auto-fit sizing + rescue out-of-bounds children
Two playability bugs in the new flat-cards layout:

1. On first load or fresh org import a parent had no explicit width or
   height, so children whose stored position sat inside their (eventual)
   parent's rectangle rendered visually outside the smaller default
   parent box. Compute a parent starting size in canvas-topology:
     • 2-column grid of child-default footprints + header/side padding
     • Grows per child count (2→1 row, 3-4→2 rows, etc.)
   and stamp it onto the Node's width/height so the first paint already
   contains every child.

2. If a child's stored relative position actually falls outside the
   parent's computed bounds (legacy org-imports at 0,0, pre-refactor
   absolute coordinates, manually-nudged rows), assign that child a
   deterministic default grid slot inside the parent instead.

Runtime cascade: added growParentsToFitChildren to onNodesChange so when
the user drags or resizes a child past the parent's current bounds, the
parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit
— grow-only, never shrinks under the user's manual resize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:29:04 -07:00
Hongming Wang
cc194f0b7e refactor(canvas): flat workspace cards with React Flow native parenting
Every workspace now renders as a first-class card on the canvas
regardless of parent_id. The old "parent card contains mini TeamMember
chips" layout is gone — if B is parented to A, B renders as a full card
inside A's coordinate space using React Flow's `parentId` binding, so
moving A carries B along and children have the same detail + actions as
root cards.

Details:

- canvas-topology.ts: topologically sort parents before children
  (React Flow ordering requirement), compute each child's RF-native
  parentId + relative position on load. DB keeps absolute x/y; the
  abs→rel conversion happens here, reverse translation in
  Canvas.onNodeDragStop before savePosition PATCHes the DB.

- WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks,
  simplify the size classes, and add NodeResizer (visible when selected)
  so users can drag any edge/corner to grow or shrink. Parent cards
  default to a larger min size so nested children have breathing room.

- Canvas.tsx drop targeting rewritten: bounds-based hit test against
  each node's measured absolute bbox, deepest match wins. Fixes two
  prior bugs at once — dropping onto Claude Code with a nested same-
  named Hermes no longer picks the wrong node, and the target can now
  be a nested workspace when that's where the pointer actually released.

- canvas.ts nestNode + removeNode: translate position between old and
  new parent's absolute origin on nest/unnest so the card doesn't jump,
  and re-point the RF `parentId` alongside `data.parentId` on reparent.

- Tests: hidden-flag assertions replaced with parentId checks; obsolete
  TeamMemberChip a11y/eject tests deleted (the UI component no longer
  exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:18:44 -07:00
Hongming Wang
e337efe974 fix(canvas): propagate runtime through WORKSPACE_PROVISIONING event
The side-panel runtime pill read "unknown" for newly-deployed workspaces
because canvas-events.ts created the node from WORKSPACE_PROVISIONING
payload — and the payload only carried name + tier. No refetch filled
the gap during provisioning, so the user saw "RUNTIME unknown" on the
card even though the DB row had the real runtime set.

Includes runtime in every WORKSPACE_PROVISIONING emitter:
  * handlers/workspace.go         — initial create
  * handlers/workspace_restart.go — explicit restart, auto-restart, and
                                    crash-recovery resume loop
  * handlers/org_import.go        — multi-workspace org imports

Canvas-side: canvas-events.ts reads payload.runtime when creating the
node; the provisioning test asserts the pill value is populated before
any refetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:17:49 -07:00
Hongming Wang
b4719ad070 fix(canvas): Legend avoids TemplatePalette + silence WS handshake races
### Two unrelated but small UI fixes surfaced while testing the Canvas

**1. Legend hidden under the open TemplatePalette.**

Legend is `fixed bottom-6 left-4 z-30`. TemplatePalette's drawer (when
open) is `fixed top-0 left-0 w-[280px] z-30` — same z-index, same
left-edge column. The Legend overlapped the palette's bottom 180 px.

Published the palette-open state to the canvas store so the Legend
can shift right (to `left-[296px]` — 280 px palette + 16 px gap) while
the palette is open, animated via a 200 ms `transition-[left]` to
match the palette's slide. Closes cleanly back to `left-4` when the
palette is dismissed.

Files:
- `store/canvas.ts` — added `templatePaletteOpen` + `setTemplatePaletteOpen`.
- `TemplatePalette.tsx` — calls `setTemplatePaletteOpen(open)` on
  every open/close transition via a new useEffect.
- `Legend.tsx` — reads the flag and swaps `left-4` <-> `left-[296px]`.

**2. "WebSocket is closed before the connection is established" spam.**

Two components (`ChatTab`, `AgentCommsPanel`) open their own short-
lived WebSocket to tail the ACTIVITY_LOGGED stream. Their cleanup
path called `ws.close()` unconditionally, which trips a browser
console warning when React StrictMode re-runs the effect in dev and
the handshake hasn't completed yet. Confirmed via DevTools console
on the running canvas.

Added a `closeWebSocketGracefully(ws)` helper in `lib/ws-close.ts`:

  - OPEN / CLOSING → close immediately (normal path).
  - CONNECTING    → defer close to the 'open' listener so the
                    browser sees a full handshake. Also wires an
                    'error' listener that cancels the queued close
                    if the handshake fails (no double-close).
  - CLOSED        → no-op.

Both consumers now call the helper in their useEffect cleanup.
Silences the warning without changing observable behaviour.

### Tests

`canvas/src/lib/__tests__/ws-close.test.ts` — 5 cases with a fake
WebSocket covering each readyState branch plus the error-before-open
cancellation path. Full vitest suite: 927/927 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:03:01 -07:00
Hongming Wang
5eb5e38c59 fix(canvas): re-centre Toolbar on canvas area when SidePanel is open
When a workspace is selected the SidePanel (fixed, right-0, z-50)
opens from the right edge and covers the right third of the
viewport. The Toolbar at the top was positioned
`fixed top-3 left-1/2 -translate-x-1/2 z-20` — centred on the full
viewport, not the remaining canvas area. Consequence: the right half
of the Toolbar (Audit / Search / Help / Settings) was hidden behind
the panel as soon as the user clicked any workspace.

Fix: publish the live SidePanel width to the canvas store and read
it in Toolbar. When a node is selected, shift the Toolbar LEFT by
`sidePanelWidth / 2` so its centre lines up with the middle of the
remaining canvas area. Animated via a 200 ms `transition-[margin-left]`
to match the SidePanel's own slide-in easing.

- `store/canvas.ts` — added `sidePanelWidth` + `setSidePanelWidth`.
  Default 480 (matches SIDEPANEL_DEFAULT_WIDTH).
- `SidePanel.tsx` — calls `setSidePanelWidth(width)` on every width
  change so the store stays in sync with localStorage.
- `Toolbar.tsx` — reads `sidePanelWidth`, applies a negative
  `marginLeft` style when `selectedNodeId` is non-null.
- `SidePanel.tabs.test.tsx` — added `setSidePanelWidth: vi.fn()` to
  the mocked store state so SidePanel's new useEffect has a callable
  to invoke. 18 previously-passing tests now pass again.

No visual regression when no workspace is selected — the toolbar
stays in its original centred position. SaaS canvas unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:57:12 -07:00
molecule-ai[bot]
c0d5e528a4 fix(canvas): cascade-delete UX — require checkbox before Delete All (#1314)
* fix(canvas/test): restore test regressions from PR #1243

PR #1243 introduced two regressions in the canvas vitest suite:

1. ContextMenu.keyboard.test.tsx: the setPendingDelete call now
   passes `{hasChildren, id, name}` (not just `{id, name}`). Updated
   the keyboard-a11y test assertion to match the new store shape.

2. orgs-page.test.tsx: mockFetch.mockResolvedValueOnce() returned a
   plain object that didn't match the two-argument (url, options)
   call signature used by the component's fetch wrapper. Switched to
   mockImplementationOnce returning a rejected Promise — matching
   real fetch's rejection contract — and added runAllTimersAsync after
   advanceTimersByTimeAsync(50) to flush React state updates.

54 test files · 813 tests · all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): replace bounding-box intersection with distance threshold for nest detection

ReactFlow's getIntersectingNodes uses bounding-box overlap detection, which
fires the drag-over state whenever any part of two nodes' position rectangles
overlap — even when the dragged node is far from the target. This made the
"Nest Workspace" dialog appear from large distances.

Fix: scan all nodes on each drag tick and set dragOverNodeId to the closest
node within NEST_PROXIMITY_THRESHOLD (150 px, center-to-center). This matches
the intuitive behavior: nest only when the node is actually dropped near another.

Constants:
- NEST_PROXIMITY_THRESHOLD = 150px (~60% of a collapsed node's width)
- DEFAULT_NODE_WIDTH = 245px (mid-range of min/max node widths)
- DEFAULT_NODE_HEIGHT = 110px

Also removed the unused getIntersectingNodes import (was causing duplicate
identifier error when both onNodeDrag and the zoom handler called useReactFlow
in the same component scope).

Closes #1052.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): cascade-delete UX — show child count and require checkbox before Delete All

Issue #1137: with ?confirm=true always sent, a single confirmation silently
cascades — a team lead with 20 children gets nuked on one click.

Changes:
- store/canvas.ts: pendingDelete type now includes children: {id, name}[]
- ContextMenu.tsx: passes child list to setPendingDelete on Delete click
- DeleteCascadeConfirmDialog.tsx: new component — shows child names, a
  cascade warning, and requires the operator to tick a checkbox before
  Delete All activates. Disabled by default; only enables after checkbox.
- Canvas.tsx: conditionally renders DeleteCascadeConfirmDialog for
  hasChildren workspaces, or plain ConfirmDialog for leaf workspaces.
  confirmDelete requires cascadeConfirmChecked=true when hasChildren.
- ContextMenu.keyboard.test.tsx: updated setPendingDelete assertion to
  include children:[] (no children in the test fixture).

813 tests pass.

Closes #1137.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:06:45 +00:00
molecule-ai[bot]
04c3bc6eb1 fix(canvas): cascade-delete UX — warn before deleting workspace with children (PR #1252)
- Store: pendingDelete now carries `hasChildren: boolean` (computed from
  nodes.some(parentId === nodeId))
- ContextMenu: passes hasChildren into setPendingDelete
- Canvas: dialog title changes to "Delete Workspace and Children" with
  ⚠️ message when hasChildren; confirms with "Delete All"

Refs: #1137

Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
2026-04-21 03:25:12 +00:00
molecule-ai[bot]
45cf87c7b8 test(BatchActionBar): add hasFailedBatch success-reset test (#1170)
CP-QA approved. 34-line test for BatchActionBar retry state reset after successful batch action.
2026-04-21 00:12:37 +00:00
Hongming Wang
0ddf266e54 fix(canvas): delete workspace dialog race with context menu close
Clicking "Delete" in the workspace context menu did nothing for stuck
workspaces. The confirm dialog was rendered via portal as a child of
ContextMenu. ContextMenu's outside-click handler checks whether the
click target is inside its ref — but the portal puts the dialog in
document.body, outside the ref. So clicking the dialog's Confirm
counted as "outside", closed the menu, unmounted the dialog mid-click,
and the onConfirm handler never ran.

Hoist the pending-delete state to the canvas store and render the
confirm dialog at the Canvas level (same pattern as the existing
pendingNest dialog). The dialog now outlives ContextMenu, so the
outside-click close is harmless. Close the context menu on the Delete
click itself rather than waiting for the dialog to resolve.

Add a regression test covering the new flow and add the standard
?confirm=true query param so the backend's child-cascade guard is
consulted correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:50:30 -07:00
Hongming Wang
6de705b2a1 feat(canvas): batch operations — multi-select + restart/pause/delete (Phase 20.3)
- Shift+click to toggle node selection (multi-select mode)
- BatchActionBar floating at bottom when >1 node selected
- Batch Restart All, Pause All, Delete All with ConfirmDialog
- Selected nodes get blue ring highlight
- Escape clears selection
- Pane click clears selection
- Dark theme, accessible (ARIA labels, focus rings)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 01:16:55 -07:00
2d530c80cf fix(gate-6): merge main into fix/a11y-audit-902-905 — resolve 7 conflicts
Conflicts arose because PR #892 base commits (MemoryInspectorPanel creation,
A2A overlay) had already landed on main via a different merge path, and
last-tick merges (#876, #888) had modified Toolbar, SidePanel, and test
fixtures.

Resolution strategy:
- Toolbar.tsx, SidePanel.tsx, Canvas.a11y.test.tsx, Canvas.pan-to-node.test.tsx,
  MemoryInspectorPanel.test.tsx: take main (strictly newer, already contains
  the branch's A2A overlay content plus subsequent a11y/UX fixes)
- MemoryInspectorPanel.tsx: take main (543 lines with semantic search) + apply
  sanitizeId() helper from #904 + update bodyId prefix to mem-body-
- DetailsTab.tsx: take main (has #875 Field/useId + #878 deleteButtonRef/focus)
  + apply alertdialog structure from #905 while preserving focus management

Mechanical conflict resolution by triage-agent; no logic changes beyond the
four a11y fixes already in the branch (#902-#905).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 01:34:00 +00:00
Molecule AI Frontend Engineer
c8665e9b43 fix(canvas): resolve TS errors in test fixtures — budgetLimit and AuthGate mock types
- Add budgetLimit: null to WorkspaceNodeData fixtures in canvas-capabilities,
  canvas-events, canvas-events-pan, and canvas.test.ts (inline objects)
- Add budget_limit: null to WorkspaceData fixtures in canvas-topology,
  canvas.test.ts makeWS, and ProvisioningTimeout.test.tsx
- Fix AuthGate.test.tsx TS2348: cast vi.fn() mocks to explicit call
  signatures inside vi.mock() factories (Procedure | Constructable issue)
- npx tsc --noEmit: 0 errors; 689/689 tests passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 00:43:55 +00:00
molecule-ai[bot]
6d530bfd51 feat(canvas): audit trail visualization panel (issue #753)
- AuditTrailPanel SidePanel tab showing the workspace audit ledger from
  GET /workspaces/:id/audit with cursor-based pagination (?cursor=, ?limit=50)
- Color-coded event-type badges: delegation=blue-500, decision=violet-500,
  gate=yellow-500, hitl=orange-500
- chain_valid=false renders red tamper warning indicator
- Event-type filter bar (All / Delegation / Decision / Gate / HITL) resets
  pagination and reloads with ?event_type= param
- Relative timestamps refreshed every 30 s without re-fetching
- Empty state with icon and descriptive copy
- Toolbar Audit button (ledger icon) switches panel to audit tab for
  selected workspace, or shows toast if no workspace is selected
- 29 new unit tests across formatAuditRelativeTime, AuditEntryRow, and
  AuditTrailPanel component integration suites
- Update SidePanel.tabs.test.tsx for 13-tab count and audit as last tab
- Add setPanelTab to Canvas test store mocks (Toolbar now reads it)

Closes #753

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:03:28 +00:00
Molecule AI Frontend Engineer
ca4b461d21 feat(canvas): A2A topology overlay with animated delegation edges (issue #744)
- New A2ATopologyOverlay component polls /activity fan-out every 60s and
  writes directed edges to a2aEdges store slice (separate from topology edges)
- buildA2AEdges aggregates delegate rows per source→target pair; violet-500
  animated edge when last call <5 min ago, blue-500 static otherwise
- Toolbar toggle persists to localStorage (molecule:show-a2a-edges)
- Canvas.tsx merges a2aEdges into allEdges via useMemo; pointerEvents:none
  on all edge elements keeps nodes draggable
- 24 new unit tests across pure function, helper, and component suites
- Fix Canvas.a11y and Canvas.pan-to-node store mocks (missing A2A fields)

Closes #744

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:44:01 +00:00
Molecule AI Frontend Engineer
07240e7095 feat(canvas): budget_limit input in workspace creation and settings UI (#541)
- Adds optional Budget limit (USD) numeric field to CreateWorkspaceDialog;
  blank = null (unlimited), populated = parsed float sent as budget_limit in
  POST /workspaces body
- Adds budget_limit field to DetailsTab edit form; saves via
  PATCH /workspaces/:id; pre-fills from current WorkspaceNodeData
- Shows 'Budget limit exceeded' warning badge when budgetUsed > budgetLimit
  (forward-compatible — badge hidden when budgetUsed is absent)
- Extends WorkspaceData, WorkspaceNodeData, and buildNodesAndEdges to carry
  budgetLimit / budgetUsed fields ready for backend hydration (issue #541 BE PR)
- Ships 22 new tests across CreateWorkspaceDialog and BudgetLimit.DetailsTab
  suites (575 total, all passing); npm run build clean; 'use client' grep empty

API shape confirmed from workspace.go and CreateWorkspacePayload struct:
  field name: budget_limit | type: number | null | units: USD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:06:36 +00:00
molecule-ai[bot]
1225e9429d fix(canvas): hydration error UI (#554), radio arrow-key nav (#556), zoom-to-team context menu (#557) (#565)
- #554 CRITICAL: Add hydrationError state to Zustand store; catch handler now
  calls setHydrationError instead of silent console.error; page renders a
  full-screen zinc-950 error banner with a Retry button that reloads the page
- #556 MEDIUM: Add roving tabIndex + ArrowDown/Up/Left/Right keyboard handler
  to the tier radio group in CreateWorkspaceDialog (WCAG 2.1 compliant)
- #557 MEDIUM: Add "Zoom to Team" menu item to ContextMenu (visible only when
  node has children); dispatches molecule:zoom-to-team for keyboard accessibility
- Bonus: add missing 'use client' directive to RevealToggle.tsx

Co-authored-by: Molecule AI Frontend Engineer <frontend-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:35:54 +00:00
Hongming Wang
54bb543ff7 fix: code review findings — token UI, auth hardening, WS dedup
1. Settings panel: wire TokensTab into "API Tokens" tab (was imported
   but not rendered). Rename "API Keys" → "Secrets", add "API Tokens"
   tab. Fix docs link → doc.moleculesai.app/docs/tokens.

2. Referer match hardening: require exact host match or trailing slash
   to prevent evil.com subdomain bypass. Cache CANVAS_PROXY_URL at
   init time instead of per-request os.Getenv.

3. Extract shared deriveWsBaseUrl() to lib/ws-url.ts — eliminates
   duplicate 12-line derivation in socket.ts and TerminalTab.tsx.

4. Token list pagination: add ?limit= and ?offset= params (default
   50, max 200) to GET /workspaces/:id/tokens.

507/507 canvas tests pass, Go build + vet clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:42:26 -07:00
Hongming Wang
071fb0da88 fix(tenant): WebSocket URL derivation + AdminAuth same-origin for tenant image
Two bugs on the combined tenant image (canvas + API same-origin):

1. WebSocket URL: NEXT_PUBLIC_WS_URL="" (empty string for same-origin)
   was preserved by ?? operator, producing an invalid WS URL. Now derives
   from window.location when both env vars are empty. Same fix applied
   to TerminalTab.

2. AdminAuth blocking canvas: same-origin requests have no Origin header,
   so neither AdminAuth nor CanvasOrBearer could authenticate the canvas.
   Added isSameOriginCanvas() that checks Referer against request Host,
   gated behind CANVAS_PROXY_URL (only active on tenant image). This
   lets the canvas create/list workspaces, view events, etc. without a
   bearer token when served from the same Go process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:43:01 -07:00
Canvas Agent
242306d239 fix(canvas): fitView on new workspace provision — respects user zoom level (#426)
Replace setCenter(x, y, {zoom:1}) with fitView({nodes:[{id}]}) in the
molecule:pan-to-node handler (Canvas.tsx). The old implementation forced
zoom=1 regardless of the user's current zoom level, which was jarring when
panned/zoomed away. fitView adapts to whatever zoom the user had and
gracefully fits the new node in view.

Tests:
- Canvas.pan-to-node.test.tsx: fitView called with correct nodeId after
  100ms debounce; debounce coalesces rapid successive events.
- canvas-events-pan.test.ts: molecule:pan-to-node dispatched for new
  provisions only, NOT on restart of an existing node.

Fixes #426.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:40:40 -07:00
Hongming Wang
8dbd7efc1a fix(canvas): replace nodes.length grid index with monotonic sequence counter (#388)
Root cause of position collision after node deletion:

  handleCanvasEvent(WORKSPACE_PROVISIONING) used nodes.length as the
  grid placement index. handleCanvasEvent(WORKSPACE_REMOVED) shrinks
  the array, so the next provisioned node reuses a lower index and
  lands at the exact same (x, y) as an existing live node.

  Example (4-col grid, COL_SPACING=320):
    Provision A → idx 0 → (100, 100)
    Provision B → idx 1 → (420, 100)
    Provision C → idx 2 → (740, 100)
    Remove    A → nodes.length drops to 2
    Provision D → idx 2 → (740, 100)  ← COLLISION with C

Fix 1 — monotonic _provisioningSequence counter (only ever increases):
  - Replaces nodes.length as the placement index
  - Immune to deletions; every provisioned node gets a unique grid slot
  - resetProvisioningSequence() exported for test teardown only

Fix 2 — the existing restart-path guard (if exists → update, not create)
  already provides idempotency for duplicate WS events on known nodes;
  confirmed: restart path does NOT increment the counter.

Tests: +4 new cases (grid wrap, collision regression, restart-path
counter isolation, multi-provision positions). 485/485 pass.
Build: next build ✓ clean.

Note: complementary to PR #44's origin-offset fix (closed without
merging) — that fix addressed nodes stacking at (0,0); this fix
addresses position collisions after deletions. Both should land.

Co-authored-by: Canvas Agent <agent@canvas.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 00:25:33 -07:00
Dev Lead Agent
b7292e9c13 fix(canvas): WORKSPACE_PROVISIONING grid origin offset — prevent viewport clipping
New nodes were placed at (0,0) or close to it, causing them to spawn
behind the toolbar/palette chrome and require manual panning to find.
Add GRID_ORIGIN_X/Y = 100 offset so the first node lands in clear canvas
space, and update the position assertion in the unit test accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:53:45 +00:00