Commit Graph

272 Commits

Author SHA1 Message Date
Hongming Wang
5a3dbb95e1 fix(api): probe /cp/auth/me before redirecting on 401
The actual cause-fix for the staging-tabs E2E saga (#2073/#2074/#2075).

Old behaviour: ANY 401 from any fetch on a SaaS tenant subdomain
called redirectToLogin → window.location.href = AuthKit. This is
wrong. Plenty of 401s don't mean "session is dead":

  - workspace-scoped endpoints (/workspaces/:id/peers, /plugins)
    require a workspace-scoped token, not the tenant admin bearer
  - resource-permission mismatches (user has tenant access but not
    this specific workspace)
  - misconfigured proxies returning 401 spuriously

A single transient one of those yanked authenticated users back to
AuthKit. Same bug yanked the staging-tabs E2E off the tenant origin
mid-test for 6+ hours tonight, leading to the cascade of test-side
mocks (#2073/#2074/#2075) that worked around the symptom without
fixing the cause.

This PR fixes it at the source. The new logic:

  - 401 on /cp/auth/* path → that IS the canonical session-dead
    signal → redirect (unchanged)
  - 401 on any other path with slug present → probe /cp/auth/me:
      probe 401 → session genuinely dead → redirect
      probe 200 → session fine, endpoint refused this token →
                  throw a real Error, caller renders error state
      probe network err → assume session-fine (conservative) →
                  throw real Error
  - slug empty (localhost / LAN / reserved subdomain) → throw
    without redirect (unchanged)

The probe adds one extra fetch on a 401, only when slug is set
and the path isn't already auth-scoped. That's rare and
worthwhile — a transient probe round-trip is cheap; an unwanted
auth redirect is a UX disaster.

Tests:
  - api-401.test.ts rewritten with the full matrix:
      * /cp/auth/me 401 → redirect (no probe, that IS the signal)
      * non-auth 401 + probe 401 → redirect
      * non-auth 401 + probe 200 → throw, no redirect  ← the fix
      * non-auth 401 + probe network err → throw, no redirect
      * empty slug paths (localhost/LAN/reserved) → throw, no probe
  - 43 tests in canvas/src/lib/__tests__/api*.test.ts all pass
  - tsc clean

The staging-tabs E2E spec's universal-401 route handler stays as
defense-in-depth (silences resource-load console noise + guards
against panels without try/catch), but the comment now describes
its role honestly: api.ts is the primary fix, the route is the
safety net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:49:28 -07:00
Hongming Wang
43c28710ac
Merge pull request #2066 from Molecule-AI/fix/e2e-staging-status-field
fix(e2e): poll instance_status not status — staging E2E never matched the field, masked all real bugs
2026-04-25 05:58:36 +00:00
Hongming Wang
06c85bd185
Merge pull request #2045 from Molecule-AI/feat/flat-rate-pricing-1833
feat(canvas): flat-rate pricing — rename Starter→Team, Pro→Growth (Issue #1833)
2026-04-25 05:54:06 +00:00
Hongming Wang
4e3bb3795a fix(e2e): canvas-hydration wait used a selector that never appears pre-click
Fourth E2E bug in the staging→main chain. The previous three (#2066
setup-phase fixes) let the harness reach the actual Playwright spec.
This one is in staging-tabs.spec.ts itself.

The spec at L78 waits 45s for one of:

  [role="tablist"], [data-testid="hydration-error"]

Both targets are wrong:

  1. [role="tablist"] only appears AFTER the workspace node is
     clicked (which happens 25 lines later at L100). Waiting for
     it BEFORE the click can never resolve, so the wait always
     times out at 45s regardless of whether the canvas actually
     loaded.

  2. [data-testid="hydration-error"] doesn't exist anywhere in
     the canvas. The error banner at app/page.tsx:62 only had
     role="alert" — which collides with toast notifications and
     other alert-type elements, so a more-specific selector was
     never wired.

Two-part fix:

  - Test waits on `[aria-label="Molecule AI workspace canvas"]`
    instead — that's the React Flow wrapper (Canvas.tsx:150),
    always present once hydrated regardless of workspace count
    or selection state. Hydration-error banner remains the
    secondary OR target for the failure path.

  - app/page.tsx hydration-error banner gets the missing
    `data-testid="hydration-error"` attribute. role="alert"
    stays for accessibility; the testid is for programmatic
    detection without conflict.

After this lands, the staging-tabs spec should advance past the
initial wait, click the workspace node, and exercise each tab.
If a tab fails, we get a proper test failure rather than a 45s
timeout that obscures everything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:38:28 -07:00
Hongming Wang
62217250ed test(pricing): finish Starter→Team, Pro→Growth rename in 6 stale assertions
Marketing-lead agent's rename pass updated the "renders all three plans"
test (lines 56-57) but missed lines 77, 94, 114, 132, 143, 158 which still
referenced the pre-rename "Upgrade to Starter" / "Upgrade to Pro" button
names. Canvas (Next.js) build failed with getByRole timeout because the
component now says "Upgrade to Team" / "Upgrade to Growth".

Internal PlanId tuple ("free" | "starter" | "pro") and startCheckout(planId)
call are unchanged — only the user-facing button labels shifted, so
assertions like startCheckout("pro", "acme") still match the server-side API.

Verified locally: 9/9 PricingTable tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:01:40 -07:00
Hongming Wang
2dbd06d52e
Merge pull request #2055 from Molecule-AI/feat/lark-channel-first-class-v2
feat(channels): first-class Lark/Feishu support via schema-driven config
2026-04-24 19:57:57 +00:00
rabbitblood
998cd03265 fix(tabs-a11y): mock config_schema on adapter response
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape and every getByLabelText("Bot Token") failed.

Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 12:04:51 -07:00
molecule-ai[bot]
17f29e874a
Merge pull request #2029 from Molecule-AI/fix/canvas-a11y-tabs-v2
fix(canvas/a11y): add type=button to tab toolbar and settings buttons
2026-04-24 19:01:24 +00:00
Hongming Wang
04e60e7303
Merge pull request #2052 from Molecule-AI/fix/canvas-provisioning-timeout-runtime-aware
fix(canvas): runtime-aware provisioning-timeout threshold (hermes 12min vs default 2min)
2026-04-24 18:51:46 +00:00
rabbitblood
00265d7028 feat(channels): first-class Lark/Feishu support via schema-driven config
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.

Fix: move form shape to the server.

- Add `ConfigField` struct + `ConfigSchema()` method to the
  `ChannelAdapter` interface. Each adapter declares its own fields with
  label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
  - Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
  - Slack: bot_token/channel_id/webhook_url/username/icon_emoji
  - Discord: webhook_url + optional public_key
  - Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
  inline. Sorted deterministically by display name so UI ordering is
  stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.

Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
  schema-driven `SchemaField` component. Renders one input per field in
  the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
  `ConfigField.key`. Values reset on platform-switch so stale
  Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
  `SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
  getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
  values from previous platform selections.

Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
  required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
  survives round-trip through ListAdapters.

Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:51:15 -07:00
Hongming Wang
0b237ed9dd refactor(canvas): extract runtime profiles to @/lib/runtimeProfiles
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.

Changes:

- New file `canvas/src/lib/runtimeProfiles.ts` owns:
  * `RuntimeProfile` type — structural shape, every field optional so
    new runtimes can partially-fill without breaking consumers.
  * `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
  * `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
  * `WorkspaceRuntimeOverrides` — interface for server-provided
    per-workspace overrides, so operators can tune via template
    manifest / workspace metadata without a canvas release.
  * `getRuntimeProfile()` — resolver with
    overrides → profile → default priority.
  * `provisionTimeoutForRuntime()` — convenience wrapper.

- `ProvisioningTimeout.tsx` now delegates to the profile module.
  `DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.

- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
  * overrides > profile > default priority chain
  * "every entry in RUNTIME_PROFILES resolves to a number" contract
  * backward-compat export

Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:48:39 -07:00
Hongming Wang
9597d262ca fix(canvas): runtime-aware provisioning-timeout threshold
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.

User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.

Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
  docker runtimes (claude-code, langgraph, crewai) where cold boot is
  30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
  Aligns with tests/e2e/test_staging_full_saas.sh's
  PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
  backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
  check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
  the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
  number → forces a single threshold for every workspace (tests use this
  for deterministic behavior).

Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).

13/13 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:46:09 -07:00
Molecule AI Core Platform Lead
49fc97e6e4 refactor(canvas): remove unused EmbeddedTeam component from WorkspaceNode
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).

Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:30:36 +00:00
Molecule AI Marketing Lead
de19cf9bae fix(canvas): apply flat-rate pricing copy for Phase 34 launch (Issue #1833)
Rename "Starter" → "Team", update tagline + pricing page hero copy to
lead with flat-rate per-org positioning — deliberate wedge against
Cursor/Windsurf per-seat pricing ($40/seat vs $29/org).

PMM decision: Issue #1833. Approved by Marketing Lead 2026-04-24.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 17:54:23 +00:00
1126d7b66d fix(canvas/a11y): add type=button to tab toolbar and settings buttons
WCAG 4.1.2 / bug #1669 follow-up — fixing remaining buttons missing
type="button" across tab components and settings.

Files changed:
- FilesTab/FilesToolbar.tsx (5 buttons): +New, Upload, Export,
  Clear, ↻ (all had onClick, no type=button)
- config/secrets-section.tsx (7 buttons): Remove, Edit/Update/Cancel
  across 2 SecretRow variants + add-variable form
- config/form-inputs.tsx (2 buttons): tag remove ×, section collapse toggle
- ActivityTab.tsx (1 button): row expand toggle
- TracesTab.tsx (1 button): Refresh
- settings/UnsavedChangesGuard.tsx (2 buttons): Keep editing, Discard
  (Radix AlertDialog asChild wrappers — type=button prevents form submit)

Total: 18 buttons fixed across 6 files. 934/934 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 14:41:35 +00:00
Hongming Wang
6b62391e5d
Merge pull request #1989 from Molecule-AI/fix/canvas-a11y-final
fix(canvas/a11y): type=button campaign + aria fixes (batch 1-3)
2026-04-24 14:05:27 +00:00
Molecule AI Core Platform Lead
4db7f6f024 fix(canvas): define MAX_NESTING_DEPTH constant in WorkspaceNode.tsx
TeamMemberChip used MAX_NESTING_DEPTH to cap recursive sub-agent
rendering at depth 3, but the constant was never declared — causing
a TypeScript build error ('Cannot find name MAX_NESTING_DEPTH') that
blocked Canvas CI on PR #1989.

Add the constant above EmbeddedTeam with a doc comment explaining its
purpose (guards against circular parentId cycles + readability cap).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:52:28 +00:00
9f52ee1777 fix(canvas/WorkspaceNode.tsx): add missing useMemo import
CI failure: "Cannot find name 'useMemo'" at line 363.
useMemo was called but not imported — likely dropped during refactor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
6a96641c37 fix(canvas/a11y): add type="button" to remaining canvas component buttons (batch 3)
WCAG 4.1.2 / bug #1669 follow-up — final batch completing the campaign.
Added type="button" to all buttons missing it across 14 canvas components.

Files changed (14, all additions):
- Toolbar.tsx: Stop All, Restart All, A2A toggle, Audit shortcut, Quick help, Search shortcut, Help close (7)
- MemoryInspectorPanel.tsx: scope tabs, refresh, search clear ×2, expand, delete (6)
- TemplatePalette.tsx: org refresh, toggle, Import Agent, org import, deploy template, palette refresh (6)
- ProvisioningTimeout.tsx: Retry, Cancel Request, View Logs, Keep, Remove Workspace (5)
- ConsoleModal.tsx: close, Copy output, Close (3)
- OnboardingWizard.tsx: Skip guide, action, Next (3)
- ConversationTraceModal.tsx: close ×2 (2)
- WorkspaceNode.tsx: Restart banner, Extract from team (2)
- CommunicationOverlay.tsx: toggle, close panel (2)
- Toaster.tsx: dismiss ×2 (2)
- SearchDialog.tsx: search result button (1)
- TermsGate.tsx: accept (1)
- ErrorBoundary.tsx: Reload (1)
- BundleDropZone.tsx: import trigger (1)

Total campaign (batches 1-3): 27 + 42 = 69 buttons fixed across 24 components.
All 477 canvas vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
32a3b84147 fix(canvas/a11y): add type="button" to MissingKeysModal, ContextMenu, CreateWorkspaceDialog tier radio
WCAG 4.1.2 / bug #1669 follow-up — modal + menu buttons need explicit type="button".

- MissingKeysModal.tsx: Save, Open Settings Panel, Cancel Deploy, Add Keys+Deploy (4)
- ContextMenu.tsx: all menuitem buttons (1 — inner menu items loop)
- CreateWorkspaceDialog.tsx: tier radio buttons in dialog (1)

56 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
e14b6d2de4 fix(canvas/a11y): add type="button" to BatchActionBar, EmptyState, SidePanel, CreateWorkspaceDialog
WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button"
default to type="submit", risking accidental form submission.

Added type="button" to all action buttons in:
- BatchActionBar.tsx: Restart All, Pause All, Delete All, Clear Selection (4)
- EmptyState.tsx: template deploy buttons + Create blank (all)
- SidePanel.tsx: close panel, tab switches, Restart Now (3)
- CreateWorkspaceDialog.tsx: open trigger, Cancel, Create (3)

Total this commit: +12 insertions / 2 deletions across 4 files.
Prior commit (c5590c0c): ConfirmDialog + AuditTrailPanel + DeleteCascadeConfirmDialog (+7).
Combined batch: 19 buttons fixed across 7 components.

86 vitest tests pass across all touched test files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
2ff15a38a8 fix(canvas/a11y): add type="button" to ConfirmDialog, AuditTrailPanel, DeleteCascadeConfirmDialog
WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button"
default to type="submit", which triggers accidental form submission when
the button is rendered inside a <form> element.

Added type="button" to all action buttons in:
- ConfirmDialog.tsx: Cancel + confirm buttons (lines 123, 130)
- DeleteCascadeConfirmDialog.tsx: Cancel + Delete All buttons (lines 145, 151)
- AuditTrailPanel.tsx: filter buttons, refresh, load-more (lines 140, 154, 194)

All 51 component tests pass (5 ConfirmDialog, 46 AuditTrailPanel+DeleteCascadeConfirmDialog).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
e355f447bb fix(canvas/a11y): add aria-hidden to 6 decorative SVGs + aria-label to OrgTokensTab input
WCAG 1.3.1 — inputs without visible text labels need aria-label.
WCAG 4.1.2 — decorative SVGs inside interactive elements need
aria-hidden so screen readers ignore icon content.

Changes:
- ErrorBoundary: warning triangle SVG — aria-hidden=true
- Toolbar: 4 decorative SVGs — aria-hidden=true
  (Stop All square, Restart Pending arrow, Search magnifier, Help circle)
- SettingsButton: gear icon SVG — aria-hidden=true (parent has aria-label)
- RevealToggle: EyeIcon + EyeOffIcon SVGs — aria-hidden=true
- OrgTokensTab: name input — aria-label="Organization API key label"

Bonus fix: removed duplicate title/aria-label props on Restart All button.

Note: ConsoleModal and DeleteCascadeConfirmDialog do not exist in current
staging (aae0c81) — tab trapping fix inapplicable to this codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
59feb65252 fix(canvas/a11y): add type=button to 24 buttons across DetailsTab, ConfigTab, FilesTab, MemoryTab
WCAG 4.1.2 / bug #1669 follow-up — DetailsTab, ConfigTab, FilesTab, and
MemoryTab had buttons without explicit type="button", causing accidental
form submission in any surrounding <form> context.

Changes:
- DetailsTab (9 buttons): Save, Cancel (edit), Restart/Retry, Edit,
  View console output, peer select, Confirm Delete, Cancel (delete), Delete Workspace
- ConfigTab AgentCardSection (3): Save, Cancel, Edit Agent Card
- ConfigTab footer (3): Save & Restart, Save, Reload
- ConfigTab textareas (2): aria-label added to Agent Card JSON editor and Raw YAML editor
- FilesTab (4): Delete All, Cancel, Delete, Cancel
- MemoryTab (11): Expand/Collapse, Open, Expand (collapsed state), Advanced,
  Refresh, Add, Save, Cancel (add form), expand entry, Delete entry, Show

Total: 32 interactive elements corrected across 4 tab components.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:39:43 +00:00
molecule-ai[bot]
82d15f4d33
Merge pull request #1859 from Molecule-AI/content-marketer/phase34-launch-post-v2
docs(marketing): Phase 34 launch post v2 — governance-first + tool trace
2026-04-24 07:05:54 +00:00
Hongming Wang
0ef5dad1b1
Merge pull request #1993 from Molecule-AI/fix/auth-redirect-loop-regression-tests
test(auth): add regression tests for redirect loop guards
2026-04-24 06:57:12 +00:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server paces + provision concurrency = 3 guarantee tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now fitView's with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
e9be12210f test(auth): add regression tests for redirect loop guards
AuthGate now skips session fetch for /cp/auth/* paths, and
redirectToLogin guards against re-setting window.location when
already on an auth path. Both guards had no test coverage —
a future refactor could silently reintroduce the redirect loop.

Added:
- AuthGate.test.tsx: 2 cases covering /cp/auth/login and
  /cp/auth/signup path skipping (no fetchSession call, no
  redirectToLogin call, children rendered)
- auth.test.ts: 2 cases covering redirectToLogin early return
  for /cp/auth/login and /cp/auth/signup paths

Fixes: Molecule-AI/molecule-core#1541

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:30:35 +00:00
Hongming Wang
d53583f9c6 Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes 2026-04-23 21:04:55 -07:00
Hongming Wang
2d6ff11c4e fix(canvas): re-sort parents-before-children after nest mutation
React Flow requires parent nodes to appear before their children in
the nodes array. When they don't, it logs "Parent node {id} not
found. Please make sure that parent nodes are in front of their
child nodes in the nodes array" and — more importantly — renders
the child at canvas-absolute coords instead of parent-relative,
flashing it far outside the parent.

topology's buildNodesAndEdges already enforced this at hydrate, but
nestNode + batchNest weren't re-sorting after mutating parentId.
A freshly-nested child often ended up after-first-drag at the
wrong screen position because its new parent sat later in the
array than itself.

Extract sortParentsBeforeChildren() into canvas-topology as a
reusable DFS visit; call it at the tail of both nestNode's set()
and batchNest's commit set(). 923 tests still green — no behaviour
change beyond eliminating the warning and the position flash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:00:40 -07:00
Hongming Wang
2a8977c946 fix(canvas): cancel-nest also shrinks the parent back
Canceling the nest/extract dialog restored the child's position but
left the parent card at its auto-grown size. growParentsToFitChildren
fires on drag-stop to fit a then-outside child; when the drag is
subsequently cancelled, the parent keeps that grown width/height
forever because the grow pass is grow-only.

Strip width/height from the ex-parent alongside the child position
restore in cancelNest — React Flow re-measures from CSS, parent
collapses back to its natural size. Same trick nestNode already
uses for the un-nest path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:56:08 -07:00
Hongming Wang
09053dfdeb fix(canvas): cancel-nest restores position; un-nest shrinks parent
Two follow-up polish items for drag-and-nest:

1. Cancelling the "Extract from team?" dialog now snaps the
   dragged card back to where the drag started. Before, a user
   who dragged a child out, saw the confirm dialog, then clicked
   Cancel ended up with the card stranded outside the parent at
   its drop-point position — which also got persisted via
   savePosition on drag-stop. Now onNodeDragStart captures the
   pre-drag position + parent, and cancelNest restores both the
   RF node position and fires savePosition with the absolute
   pre-drag coords so reload matches.

2. Un-nesting now clears the ex-parent's explicit width/height
   in the nodes array. growParentsToFitChildren is grow-only so
   it could never shrink the parent back down after a child
   left; the card stayed at its auto-grown size with empty
   space. Stripping width/height lets React Flow re-measure from
   the card's own min-width / min-height CSS, so the parent
   visually shrinks to fit whatever children remain.

923 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:52:28 -07:00
Hongming Wang
512fdfd59d fix(canvas): plain drag out of parent un-nests again
Un-nest used to require holding Alt (or Cmd to force-detach). That
was too conservative — when a user dragged a child clearly outside
its parent's bbox, nothing happened on release, because the default
branch soft-clamped back and only the Alt branch actually opened
the "Extract?" confirm. Matches the exact bug the user just flagged
("I can put agents in other agent, but when I drag it out, it does
not move out").

New rules:
 * Past the 20 % hysteresis → confirm un-nest. Plain drag, no
   modifier. This is what most users expect (Miro / Figma behave
   the same way — drag outside the frame and the shape leaves it).
 * Inside or within 20 % of the edge → soft-clamp back inside.
   Guards against twitchy releases that momentarily overshoot the
   edge by a few pixels.
 * Cmd / Ctrl → force un-nest regardless of overlap. Escape-hatch
   for when the user dragged within the hysteresis zone but really
   wants out.
 * Dropping onto a different parent → nest there (unchanged).

Alt is no longer a required modifier for un-nesting. Keeps it as
a non-gesture modifier only; no meaning unless we re-bind it later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:48:38 -07:00
Hongming Wang
f2a4b6e0d3 fix: dev-mode bypass for IP rate limiter + 429 retry on GET
The 600-req/min/IP bucket is sized for SaaS where each tenant has
a distinct client IP. On a local Docker setup every panel shares
one IP — hydration (/workspaces + /templates + /org/templates +
/approvals/pending) plus polling (A2A overlay + activity tabs +
approvals + schedule + channels + audit trail) can burst past the
bucket inside a minute, blanking the canvas with 429s. The user
reported it after dragging workspaces — dragging itself is
release-only (savePosition in onNodeDragStop), but the polling
that's always running added onto startup tripped the limit.

Two-layer fix:

Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen
is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching
the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and
discovery. SaaS production keeps the bucket.

Client: api.ts auto-retries a single 429 on idempotent GET requests,
waiting the server-provided Retry-After (capped at 20s). Mutations
(POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying.
Users on SaaS hitting a legitimate rate-limit spike get one
transparent recovery instead of an immediately-blank Canvas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:44:09 -07:00
Hongming Wang
286dcbfd1e fix(canvas,org): collapse org-imported parents on first paint
Importing a 15-workspace org template dropped every child as a
freely-positioned card into its parent's coordinate space. Parents
with 5-10 kids had the kids spill below the parent's initial min
size, producing the "ugly default" layout the user just flagged —
a mess of overlapping cards the moment the import completed.

Fix: every workspace in an org-template import that HAS children
is inserted with `collapsed = true`. Leaf workspaces stay
expanded (nothing to hide). The canvas renders a collapsed
parent as a compact header-only card with its "N sub" badge —
visually identical to the pre-refactor default the user asked for.

Double-click on a collapsed parent now EXPANDS it (flipping
`collapsed` locally + persisting via PATCH) so the user can drill
in to see the subtree. Only once expanded does a second
double-click zoom-to-team, matching the prior behaviour.

Leaf-first creation order stays the same; the collapsed flag
just means "render compact" not "hide from API".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:36:55 -07:00
Hongming Wang
507696d88a fix(canvas,server): address review findings on 3f11df03
Five review findings from the 3f11df03 six-bug commit:

1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,
   ClosedInProduction} covering all three gating states for the
   security-sensitive dev-mode hatch the prior commit added to
   /registry/:id/peers. Previously shipped untested — a future
   refactor could have silently inverted polarity or removed the
   gate. New tests pin the contract:
     * MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless
     * MOLECULE_ENV=development + ADMIN_TOKEN set → require token
     * MOLECULE_ENV=production                    → require token

2. ConfigTab handleSave diffs against the RAW parsed YAML / form
   config instead of the DEFAULT_CONFIG-merged shape. The previous
   code would silently PATCH tier=1 to the DB when a user deleted
   the `tier:` line in raw mode (the default-merge substituted 1).
   Now: only fields the user actually typed participate in the
   diff. Type guards (typeof === "number" / "string") prevent
   coercion surprises on malformed YAML.

3. ConfigTab model-save failure no longer lies "Saved". The
   /workspaces/:id/model PATCH can reject when the runtime doesn't
   support the chosen model; previously we caught + console.warn'd
   + showed green Saved, and the user watched the model revert on
   next reload with no explanation. Now the save path collects a
   `modelSaveError` and surfaces it via setError with a partial-
   success message ("Other fields saved, but model update failed:
   …") so the user sees why.

4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch
   failures, distinguishing them in the error text ("Failed to
   load connected channels and platforms — try refreshing").
   Previously only an adapters failure was visible; a channels
   failure left the user with an apparently-empty list and no
   indication the API was unreachable.

5. ChatTab panels drop the redundant aria-hidden attribute. The
   `hidden`/`flex` Tailwind class already sets display:none, which
   removes the node from the accessibility tree on its own; the
   extra aria-hidden invited WAI-ARIA lint warnings if a focusable
   descendant ever landed inside an inactive panel.

Tests: 923 canvas + full Go handler suite pass. 3 new Go tests.
No behaviour change on the five prior fixes — this commit tightens
their edges per the independent review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:29:44 -07:00
Hongming Wang
3f11df031c fix: six UX bugs (peers auth, scroll, chat tabs, config persist, + visibility)
Six bugs reported from a live session — all shippable in one commit:

1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint
   demands a workspace-scoped bearer token (validateDiscoveryCaller)
   which the canvas session doesn't hold. Added the same Tier-1b
   dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already
   use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so
   SaaS production stays strict. Exported IsDevModeFailOpen from the
   middleware package for the handler layer to reuse.

2. Org Templates list unscrollable. OrgTemplatesSection was rendered
   in the TemplatePalette footer — a div without overflow — so when
   it expanded to 15+ entries the list extended past the viewport
   with no scroll. Moved it to the top of the flex-1 overflow-y-auto
   container. Tall lists now scroll naturally.

3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead
   of switching. HTML `hidden` attribute was being overridden by
   Tailwind's `flex` class (display: flex beats the attribute),
   so both tabpanels rendered concurrently. Swapped to a conditional
   Tailwind `hidden`/`flex` class so the inactive panel is
   display:none with proper CSS specificity.

4. Hermes Config form never persists. handleSave wrote config.yaml
   but name / tier / runtime / model all live on the workspace row
   (or the dedicated /workspaces/:id/model endpoint) — the form
   edited in-memory, the request returned 200, the next reload
   wiped everything back. Hermes + external runtimes manage their
   own config inside the container anyway, so writing config.yaml
   is a no-op for them; skip it. Always diff and PATCH the DB-backed
   fields that actually changed.

5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's
   load() used Promise.all with a silent catch — if EITHER the
   channels or adapters fetch failed, both setters were skipped
   with no error visible. Switched to Promise.allSettled so each
   endpoint settles independently, and the adapters failure now
   surfaces via the top-level error state.

6. Plugin registry always "No plugins in registry". Same silent
   catch pattern in SkillsTab.tsx — load errors for /plugins,
   /plugins/sources, and /workspaces/:id/plugins swallowed without
   logging. Replaced the empty catches with console.warn so future
   failures are at least visible in devtools.

Tests: 923 passing (unchanged). Go handler tests pass. Server
rebuilt and running with the peers-auth + collapsed-persistence
fixes (pid 15875).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:18:30 -07:00
Molecule AI App & Docs Lead
3715c06e0b fix(canvas): remove stale firstInputRef useEffect from AllKeysModal
AllKeysModal already handles focus via autoFocus={index === 0} on the
first input and a separate title-focus effect. The orphaned useEffect
referencing firstInputRef (declared only in ProviderPickerModal) caused
a TypeScript build error: "Cannot find name 'firstInputRef'".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 03:11:36 +00:00
746cb22855 fix(canvas/tests): normalize useCanvasStore mock pattern in test files
Standardize the mock for useCanvasStore to always expose getState()
(used by production ContextMenu to filter parent nodes). Applies the
same Object.assign-wrapping pattern introduced in #1744 to:
- ClaudeSettings.test.tsx
- tabs.a11y.test.tsx
- ContextMenu.keyboard.test.tsx (mockStore shape alignment)
2026-04-24 03:10:18 +00:00
680f1f50f2 fix(canvas/a11y): restore aria-hidden on backdrop div after cherry-pick conflict
Cherry-pick from #1744 left the backdrop div without aria-hidden="true"
(the outer dialog div got it instead). Re-apply aria-hidden="true" to
the backdrop div so screen readers skip the clickable overlay layer.

Also revert test assertion from bg-black → bg-black/70 to match the
exact class applied to the backdrop div.
2026-04-24 03:10:18 +00:00
Hongming Wang
4fd7f1e84c fix(canvas): tighten rescue + cap toast + cover paths with tests
Three follow-up review findings from the c2b2e13a review:

1. Rescue heuristic uses pure bbox-non-overlap. The previous
   `position.x < 0` branch rescued any child whose parent was
   later dragged past it, even when the layout was clearly
   recoverable (e.g. relative -40, child still overlaps parent).
   New rule: rescue iff the child's bbox has zero overlap with
   the parent's bbox — self-calibrating, scales with user-resized
   parents, catches screenshot-case and legacy huge-positive data.

2. Toast caps failed-name list at 3 and appends "and N more".
   Stops a 50-node partial failure from overflowing the toast
   container.

3. Cycle guard on selection-roots walk in batchNest. Corrupt
   parentId data can't send the loop infinite now. Cheap
   defensive guard — one Set per selected node.

Tests added (923 total, up from 918):
 * canvas-topology.test: 4 rescue scenarios — screenshot case
   (zero-overlap rescue), negative drift kept, huge-positive
   rescued, user-resized layout kept.
 * canvas.test: selection-roots filter on a 3-level chain.
 * workspace_crud test: PATCH {collapsed:true} runs the UPDATE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:08:14 -07:00
Hongming Wang
c2b2e13abe fix(canvas): address code-review findings on the Canvas refactor
Five issues surfaced in the review of 50b53784. Each was either a real
bug waiting to hit users or a silent failure mode.

1. Topology rescue no longer teleports user-resized children.
   Rescue was comparing against parentMinSize(childCount), so any
   child the user had placed in space the parent was resized into
   got snapped to the default grid on reload — undoing the layout.
   Now rescue fires only on obviously corrupt data: negative
   relative coords (legacy pre-nesting absolute positions that
   landed above/left of their assigned parent) or values past an
   MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial
   minimum are left alone.

2. batchNest now filters to selection-roots before planning.
   Previously selecting both A and A's descendant B and dragging
   into T yanked B out of A to become a sibling under T. Users
   reasonably expect the A subtree to move intact. The new pass
   drops any selected node whose ancestor is also selected —
   those follow their ancestor via React Flow's parent binding.

3. batchNest surfaces partial failure via showToast. Previously
   silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2
   snapped back with no explanation. Now names the failed cards.

4. confirmNest closes the nest dialog BEFORE dispatching the async
   store action, so a second drag can't kick off a competing batch
   while the first is still in flight.

5. collapsed is now persisted. The Go workspace_crud.go Update
   handler ignored the `collapsed` field, so user-initiated
   collapse round-tripped to an expanded state on next hydrate.
   Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`)
   so the state survives reload.

Nits cleaned:
 * Removed dead dragStartParentRef in useDragHandlers.
 * Swapped redundant `node.data as WorkspaceNodeData` casts for a
   named WorkspaceNode type alias.
 * Canvas.tsx SR-live region now reads n.parentId (matches
   MiniMap + RF's native field) instead of the mirror n.data.parentId.

Tests added (918 total, up from 915):
 * batchNest happy path — 2-root selection fires 2 combined PATCHes
   carrying parent_id + x + y, not 2×N sequential round-trips.
 * batchNest ancestor+descendant selection — subtree stays intact.
 * batchNest partial failure rollback — only the rejected nodes
   revert; successful ones stay committed.

Backend change is single-line (collapsed PATCH branch); all
workspace_crud Go tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:58:44 -07:00
dc9001835e fix(ConfigTab.hermes.test): remove unused fireEvent import 2026-04-24 02:55:51 +00:00
Hongming Wang
50b537849a refactor(canvas): split Canvas.tsx into hooks; parallelize batchNest
Two concerns in one commit (separate files, each self-contained):

## Canvas.tsx split (from ~680 to ~250 lines)

Canvas.tsx was holding drag gesture state + keyboard shortcuts +
viewport wiring + JSX. Each concern now lives in its own unit under
canvas/src/components/canvas/:

- dragUtils.ts          — pure: shouldDetach, clampChildIntoParent,
                          DETACH_FRACTION
- DropTargetBadge.tsx   — the floating "Drop into: <name>" label + the
                          dashed ghost preview at the target slot
- useDragHandlers.ts    — encapsulates onNodeDragStart / Drag / Stop,
                          findDropTarget hit-test, pendingNest state,
                          and confirmNest/cancelNest. Routes multi-
                          select drags through batchNest automatically.
- useKeyboardShortcuts  — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one
                          window listener, one source of truth.
- useCanvasViewport     — pan-to-node + zoom-to-team CustomEvent
                          listeners and the debounced viewport save.

Canvas.tsx becomes a thin composition + JSX file. No behavioural
change; the refactor is covered by the existing 915 canvas tests.

## batchNest parallelization (2N round-trips → N, all in flight)

Previously nestNode fired two sequential PATCHes (parent_id then x/y)
and batchNest looped nestNode sequentially. For a 5-node selection on
a typical ~200ms link this was ~2s of serialized RPCs.

- nestNode now combines parent_id + x + y into ONE PATCH. The Go
  handler (workspace_crud.go Update) already reads all three from the
  same body — no backend change.
- batchNest rewritten: compute every re-parent plan against one
  snapshot, commit a single set(), then fire N PATCHes via
  Promise.allSettled in parallel. Per-node failures roll back only
  that node (others stay committed) — same semantics as the single-
  node path, just concurrent.
- The state math in the batch path also correctly shifts descendant
  zIndex by depthDelta when any re-parented node has a subtree.

## Also

- canvas-topology.ts: reverted P3.12's opt-in rescue to the auto-
  rescue default. When a child's stored relative position would render
  it outside the parent bbox (the visual regression the user saw after
  collapse → reload — Hermes child drawn outside Claude Code Agent on
  first paint), the child is placed in the next default grid slot.
  The "Arrange Children" context command stays for bigger teams.

All 915 canvas tests pass. No backend changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:43:18 -07:00
Hongming Wang
c5abed988e fix(canvas): address review findings on playability pass
Five Critical issues caught in code review of f3423a51. Each one broke
an invariant the original commit claimed to uphold.

1. nestNode: descendants kept their old-depth zIndex after a re-parent.
   Now walks the dragged subtree and shifts every descendant's zIndex
   by the same depthDelta so "children above ancestors" survives moves
   between levels of the hierarchy.

2. bumpZOrder: siblings all share zIndex = depth in fresh topology, so
   a single +1 bump was identical for every sibling and subsequent
   bumps drifted zIndex unboundedly. Rewritten to sort siblings by
   current zIndex and swap the target with its neighbour in the bump
   direction — Figma-style reorder, stays within the sibling tier.

3. findDropTarget: depth-first tiebreaker lost to bumped siblings. The
   visually-frontmost card after Cmd+] is a shallow sibling, but the
   hit test picked the deepest nested card regardless. Swapped order
   so zIndex wins first, depth second, area third. Also pre-computes
   the depth map once per call (was O(n²) via repeated .find walks —
   will matter past ~30 workspaces).

4. arrangeChildren: saved absolute position using `slot + parent.position`,
   but parent.position is RELATIVE to its own parent when nested.
   Grandchildren's stored x/y were in the parent's local frame and
   reload placed them in the wrong spot. Now walks the full ancestor
   chain via absOf() to get the true canvas-absolute origin before
   PATCHing.

5. setCollapsed: naive flip of every descendant's hidden flag diverged
   from the topology rebuild on hydrate. Collapse A, collapse B, then
   expand A — C should stay hidden because B is still collapsed, but
   before this fix C was unhidden. Rewritten to recompute every
   descendant's hidden from the full ancestry chain, matching the
   topology pass byte-for-byte. New round-trip test asserts the two
   code paths produce identical node.hidden across a full lifecycle.

Also:
- Removed dead cascadeMessage constant (never rendered).
- Replaced hardcoded 260/120 in zoom-to-team with exported constants.
- arrangeChildren PATCH catch now logs instead of silently swallowing.
- Added 70→76 tests: setCollapsed 3-chain scenarios, bumpZOrder swap
  semantics, edge-of-list no-op.

All 915 canvas tests green. Backend untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:16:48 -07:00
Hongming Wang
f3423a513d feat(canvas): industry-pattern playability pass (P1+P2+P3)
Ships the full prioritized improvement list from the canvas research
report — aligns our nesting/resize UX with Miro / FigJam / tldraw / Figma
conventions. Organized by priority below.

## P1 — baseline playability

* Hysteresis on drag-out detach (Miro): a child only un-nests when >=20%
  of its bbox is outside the parent on release. Prevents accidental
  un-nesting from twitchy drags.
* Drop-target now uses tree-depth DESC, then zIndex DESC, then area ASC
  to pick targets when nested parents overlap (xyflow #2827).
* Children render above ancestors by inheriting zIndex = parent + 1 in
  topology and on every nest/unnest (xyflow #4012).
* Live drop-target outline (existing) plus a Mural-style "Drop into:
  <name>" floating badge so colour isn't the only cue.
* growParentsToFitChildren now fires only on dimension-type changes
  inside onNodesChange (NodeResizer commits) and once on drag-stop —
  avoids tldraw's edge-chase artifact (P3.11 commit-on-release).

## P2 — polish

* Whimsical-style ghost preview: dashed outline at the next default
  grid slot inside the drop-target parent during drag.
* Alt-drag escape with soft clamp: dropping slightly outside a parent
  without Alt/Cmd snaps the child back inside (clampChildIntoParent);
  Alt releases the clamp to allow un-nest; Cmd/Ctrl force-detaches.
* Figma-style keyboard hierarchy nav: Enter descends to first child,
  Shift+Enter ascends to parent, Cmd+]/[ re-orders siblings via the
  new bumpZOrder store action.
* Multi-select re-parent preserves offsets: confirmNest routes through
  a new batchNest action when the primary drag is part of a batch
  selection (Lucidchart pattern).

## P3 — long-tail

* Minimap now shows parent cards as filled regions with a blue stroke,
  so hierarchy reads at a glance without zooming.
* Out-of-bounds rescue is opt-in: topology no longer silently re-lays
  children whose stored position is outside the parent bbox (Figma
  trust-the-data). The new Arrange Children context menu item runs the
  rescue on demand via arrangeChildren.
* Cmd-drag force-detach regardless of hysteresis.
* Collapse workspace: the existing Collapse Team action now toggles a
  local setCollapsed store action that hides every descendant and
  shrinks the parent card to header-only (Miro frame outline view).
  Growth pass skips collapsed parents so they don't push back out.

All 910 canvas tests green. Backend untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:03:02 -07:00
Hongming Wang
d359390f83 fix(canvas): parent auto-fit sizing + rescue out-of-bounds children
Two playability bugs in the new flat-cards layout:

1. On first load or fresh org import a parent had no explicit width or
   height, so children whose stored position sat inside their (eventual)
   parent's rectangle rendered visually outside the smaller default
   parent box. Compute a parent starting size in canvas-topology:
     • 2-column grid of child-default footprints + header/side padding
     • Grows per child count (2→1 row, 3-4→2 rows, etc.)
   and stamp it onto the Node's width/height so the first paint already
   contains every child.

2. If a child's stored relative position actually falls outside the
   parent's computed bounds (legacy org-imports at 0,0, pre-refactor
   absolute coordinates, manually-nudged rows), assign that child a
   deterministic default grid slot inside the parent instead.

Runtime cascade: added growParentsToFitChildren to onNodesChange so when
the user drags or resizes a child past the parent's current bounds, the
parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit
— grow-only, never shrinks under the user's manual resize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:29:04 -07:00
Hongming Wang
cc194f0b7e refactor(canvas): flat workspace cards with React Flow native parenting
Every workspace now renders as a first-class card on the canvas
regardless of parent_id. The old "parent card contains mini TeamMember
chips" layout is gone — if B is parented to A, B renders as a full card
inside A's coordinate space using React Flow's `parentId` binding, so
moving A carries B along and children have the same detail + actions as
root cards.

Details:

- canvas-topology.ts: topologically sort parents before children
  (React Flow ordering requirement), compute each child's RF-native
  parentId + relative position on load. DB keeps absolute x/y; the
  abs→rel conversion happens here, reverse translation in
  Canvas.onNodeDragStop before savePosition PATCHes the DB.

- WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks,
  simplify the size classes, and add NodeResizer (visible when selected)
  so users can drag any edge/corner to grow or shrink. Parent cards
  default to a larger min size so nested children have breathing room.

- Canvas.tsx drop targeting rewritten: bounds-based hit test against
  each node's measured absolute bbox, deepest match wins. Fixes two
  prior bugs at once — dropping onto Claude Code with a nested same-
  named Hermes no longer picks the wrong node, and the target can now
  be a nested workspace when that's where the pointer actually released.

- canvas.ts nestNode + removeNode: translate position between old and
  new parent's absolute origin on nest/unnest so the card doesn't jump,
  and re-point the RF `parentId` alongside `data.parentId` on reparent.

- Tests: hidden-flag assertions replaced with parentId checks; obsolete
  TeamMemberChip a11y/eject tests deleted (the UI component no longer
  exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:18:44 -07:00
Hongming Wang
8a07cf4035 fix(canvas): skip already-nested workspaces as drop targets
Dragging one workspace onto another could pick a nested child as the
"nearest" drop target instead of the visible parent card the user
actually hovered. The effect: dropping a free-floating Hermes Agent
onto a Claude Code Agent that already had a Hermes Agent nested inside
showed "Move 'Hermes Agent' inside 'Hermes Agent'?" — the confirmation
referenced the nested same-named child, not Claude Code.

Why: getIntersectingNodes returns every overlapping node, including
hidden=true children that render inside their parent's card. The
parent and child share bounding boxes, so the child often "won" the
nearest-distance check. Filter them out at the source: a node that's
already got a parentId (or is hidden) is never a valid top-level drop
target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:49:01 -07:00
Hongming Wang
5ebe6ccb33 test: regression guards for 2026-04-23 hermes + CP bug wave
Three complementary regression tests for the chain of P0s fixed today.
Each targets a specific bug class that reached production, and will
fire loud if any of them regress.

## 1. E2E A2A assertion enhancements (tests/e2e/test_staging_full_saas.sh)

The existing A2A check looked for "error|exception" in the response text,
which was too broad and missed the actual error patterns we hit. Now
matches each known error class individually with a diagnostic fail
message pointing at the exact bug:

  - "[hermes-agent error 401]"        → hermes #12 (API_SERVER_KEY)
  - "hermes-agent unreachable"        → gateway process died
  - "model_not_found"                 → hermes #13 (model prefix)
  - "Encrypted content is not supported" → hermes #14 (api_mode)
  - "Unknown provider"                → bridge PROVIDER misconfig

Also asserts the response contains the PONG token the prompt asked for —
catches silent-truncation/echo regressions.

## 2. Hermes install.sh bridge shell harness (tools/test-hermes-bridge.sh)

4 scenarios × 16 assertions, all offline (no docker, no network):

  - openai-bridge-happy: OPENAI_API_KEY + openai/gpt-4o →
    provider=custom, model="gpt-4o" (prefix stripped),
    api_mode=chat_completions
  - operator-custom-wins: explicit HERMES_CUSTOM_* → bridge skipped
  - openrouter-not-touched: OPENROUTER_API_KEY → provider=openrouter,
    slug kept
  - non-prefixed-model: bare "gpt-4o" → prefix-strip is a no-op

Runs in <1s, can be wired into template-hermes CI. Pins the exact
config.yaml shape — any drift in derive-provider.sh or the bridge
if-block breaks a test.

## 3. Canvas ConfigTab hermes tests (ConfigTab.hermes.test.tsx)

5 vitest cases covering the #1894 bugs:

  - Runtime loads from workspace metadata when config.yaml missing
  - "No config.yaml found" red error hidden for hermes
  - Hermes info banner shown instead
  - Langgraph workspace still sees the red error (regression-guard the
    other way)
  - config.yaml runtime wins over workspace metadata when present

## Running

  bash tools/test-hermes-bridge.sh                # 16 assertions
  cd canvas && npx vitest run src/components/tabs/__tests__/ConfigTab.hermes.test.tsx  # 5 cases
  # E2E enhancements ride on the existing staging E2E workflow

## Not yet covered (tracked in #1900)

CP admin delete-tenant EC2 cascade, cp-provisioner instance_id
lookup (#1738), purge audit SQL mismatch (#241), and pq prepared-
statement cache collision (#242). These are in-controlplane-repo
concerns — separate PR with CP-side sqlmock + integration tests.

Closes items in #1900.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:45:13 -07:00