Commit Graph

294 Commits

Author SHA1 Message Date
Hongming Wang
43c28710ac
Merge pull request #2066 from Molecule-AI/fix/e2e-staging-status-field
fix(e2e): poll instance_status not status — staging E2E never matched the field, masked all real bugs
2026-04-25 05:58:36 +00:00
Hongming Wang
06c85bd185
Merge pull request #2045 from Molecule-AI/feat/flat-rate-pricing-1833
feat(canvas): flat-rate pricing — rename Starter→Team, Pro→Growth (Issue #1833)
2026-04-25 05:54:06 +00:00
Hongming Wang
e58ecf2974 fix(e2e): scrollIntoView before toBeVisible — clipped tabs were "missing"
Seventh E2E bug, surfaced after the AuthGate mock from the previous
commit finally let the harness reach the tab-iteration loop:

  Error: tab-skills button missing — TABS list may have drifted
  Locator: locator('#tab-skills')

The TABS bar in SidePanel is `overflow-x-auto` (intentional — there
are 13 tabs and they don't all fit on smaller viewports; the
right-edge fade gradient signals the overflow). Tabs after position
~3 are clipped, and Playwright's `toBeVisible()` returns false for
clipped elements (it checks getBoundingClientRect against viewport).

Fix: `scrollIntoViewIfNeeded()` before the visibility assertion,
mirroring what SidePanel's own keyboard handler does on arrow-key
navigation. The tab is then in view and `toBeVisible()` passes.
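
A minimal sketch of the scroll-then-check order, assuming a structural stand-in (`LocatorLike`) for the two Playwright `Locator` methods involved; the helper name `tabIsReachable` is hypothetical and not taken from the spec file:

```typescript
// Stand-in for the two Playwright Locator methods used below, so the
// sketch type-checks without importing @playwright/test.
interface LocatorLike {
  scrollIntoViewIfNeeded(): Promise<void>;
  isVisible(): Promise<boolean>;
}

// Hypothetical helper: bring the tab button into the scroll container's
// visible area first, then ask whether it is visible.
async function tabIsReachable(tab: LocatorLike): Promise<boolean> {
  await tab.scrollIntoViewIfNeeded();
  return tab.isVisible();
}
```

In the real spec, `tab` would presumably be `page.locator('#tab-skills')` and the second step an `expect(...).toBeVisible()` assertion.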

This was the test's 7th and (probably) final harness bug. The full
chain, starting from this morning's "staging E2E timed out at 1200s":

  1. instance_status field name (#2066)
  2. staging.moleculesai.app DNS zone (#2066)
  3. X-Molecule-Org-Id TenantGuard header (#2066)
  4. Hydration selector waited pre-click (#2066)
  5. networkidle never settles (this PR's parent commits)
  6. AuthGate /cp/auth/me redirect
  7. Tab buttons clipped by overflow-x-auto

If THIS run still fails, the failure surfaces in actual product
behavior (a tab's panel content), not test mechanics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:37:36 -07:00
Hongming Wang
6c70b413e0 fix(e2e): mock /cp/auth/me — AuthGate redirect was preventing canvas render
Sixth E2E bug, surfaced after the page.goto-domcontentloaded fix
finally let the navigation complete. The harness now reaches the
canvas-root selector wait but still times out because the canvas
never renders:

  TimeoutError: page.waitForSelector: Timeout 45000ms exceeded.
  waiting for [aria-label="Molecule AI workspace canvas"]

Root cause: canvas/src/components/AuthGate.tsx wraps the page,
fetches /cp/auth/me on mount, and redirects to the login page when
the response is 401. The bearer header we set via
context.setExtraHTTPHeaders works for platform API calls but does
NOT satisfy /cp/auth/me — that endpoint is cookie-based (WorkOS
session). So:

  1. AuthGate mounts
  2. Calls fetchSession() → /cp/auth/me → 401 (no session cookie)
  3. AuthGate transitions to anonymous → redirectToLogin()
  4. Browser navigates away from tenant URL
  5. The React Flow canvas root with the aria-label never mounts
  6. waitForSelector times out at 45s

Fix: context.route() intercepts /cp/auth/me and returns a fake
Session JSON so AuthGate resolves to "authenticated" and renders
its children. The session contents are cosmetic — Session.org_id
and Session.user_id appear in a few canvas surfaces but never fail
on dummy values.
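
A rough sketch of the interception, assuming structural stand-ins for Playwright's `BrowserContext.route` and `Route.fulfill`. The helper name, the glob pattern, and the dummy session values are illustrative assumptions; only `org_id` and `user_id` are field names taken from this message:

```typescript
// Stand-ins for the Playwright types used here.
interface RouteLike {
  fulfill(response: { status?: number; contentType?: string; body?: string }): Promise<void>;
}
interface BrowserContextLike {
  route(url: string, handler: (route: RouteLike) => Promise<void> | void): Promise<void>;
}

// Dummy values: the session contents are cosmetic for this test.
const FAKE_SESSION = { org_id: "e2e-org", user_id: "e2e-user" };

// Hypothetical helper: answer /cp/auth/me with a fake session so the
// auth gate resolves to "authenticated" instead of redirecting.
async function mockAuthSession(context: BrowserContextLike): Promise<void> {
  await context.route("**/cp/auth/me", (route) =>
    route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify(FAKE_SESSION),
    })
  );
}
```

The route would need to be registered before `page.goto`, so that the first `/cp/auth/me` request is already intercepted.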

This is the cleanest fix path. Alternatives considered + rejected:
  - Add a ?e2e=1 backdoor to AuthGate: production code shouldn't
    have a "skip auth" flag, even gated.
  - Real WorkOS login flow in Playwright: too much overhead per run.
  - Skip the canvas UI test, test only API: defeats the point of
    the staging E2E (which is to catch UI regressions before
    promotion).

After this lands the harness should reach the workspace-node click
step and exercise tabs — only then can a real product bug (rather
than a test-harness bug) surface. The 6-bug chain so far:
  1. instance_status field name (#2066)
  2. staging.moleculesai.app DNS zone (#2066)
  3. X-Molecule-Org-Id TenantGuard header (#2066)
  4. Hydration selector waited pre-click (#2066)
  5. networkidle never settles (this commit's parent)
  6. AuthGate /cp/auth/me redirect (this commit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:59:04 -07:00
Hongming Wang
c2504d9361 fix(e2e): page.goto waitUntil networkidle never settles — switch to domcontentloaded
Fifth E2E bug surfaced by the previous run. After the four setup-
phase fixes (instance_status, DNS zone, X-Molecule-Org-Id, hydration
selector) plus CP#259 ending the pq cache class, the harness finally
reached the actual page navigation step — and timed out there:

  TimeoutError: page.goto: Timeout 45000ms exceeded.
    navigating to "https://...staging.moleculesai.app/", waiting until "networkidle"

`waitUntil: "networkidle"` waits for 500ms of network silence. The
canvas keeps a WebSocket connection open + polls /events and
/workspaces every few seconds for status updates, so the network
is never idle — page.goto sits on it until the default 45s timeout
and throws.

Fix: switch to `waitUntil: "domcontentloaded"`. Returns as soon as
the HTML is parsed. React hydration plus the existing
`waitForSelector` line below is what actually gates ready-for-
interaction; the goto's job is just to land on the page.
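
As a sketch, the navigation step might look like this. `PageLike` is a structural stand-in for the Playwright `Page` methods used, and `openCanvas` is a hypothetical helper name; the selector and the 45s timeout are the ones quoted in the surrounding commits:

```typescript
// Stand-in for the Playwright Page methods used below.
interface PageLike {
  goto(
    url: string,
    options?: { waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit"; timeout?: number }
  ): Promise<unknown>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Canvas root, or the hydration-error banner on the failure path.
const CANVAS_READY_SELECTOR =
  '[aria-label="Molecule AI workspace canvas"], [data-testid="hydration-error"]';

async function openCanvas(page: PageLike, url: string): Promise<void> {
  // Return as soon as the HTML is parsed; do not wait for network silence.
  await page.goto(url, { waitUntil: "domcontentloaded" });
  // The selector wait is what gates ready-for-interaction.
  await page.waitForSelector(CANVAS_READY_SELECTOR, { timeout: 45_000 });
}
```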

This is a generally-applicable lesson — networkidle is broken for
any SPA with a heartbeat. Notably, our existing canvas unit tests
that mock @xyflow/react and don't open WebSockets DON'T hit this,
which is why this only surfaces against staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:43:46 -07:00
Hongming Wang
4e3bb3795a fix(e2e): canvas-hydration wait used a selector that never appears pre-click
Fourth E2E bug in the staging→main chain. The previous three (#2066
setup-phase fixes) let the harness reach the actual Playwright spec.
This one is in staging-tabs.spec.ts itself.

The spec at L78 waits 45s for one of:

  [role="tablist"], [data-testid="hydration-error"]

Both targets are wrong:

  1. [role="tablist"] only appears AFTER the workspace node is
     clicked (which happens 25 lines later at L100). Waiting for
     it BEFORE the click can never resolve, so the wait always
     times out at 45s regardless of whether the canvas actually
     loaded.

  2. [data-testid="hydration-error"] doesn't exist anywhere in
     the canvas. The error banner at app/page.tsx:62 only had
     role="alert" — which collides with toast notifications and
     other alert-type elements, so a more-specific selector was
     never wired.

Two-part fix:

  - Test waits on `[aria-label="Molecule AI workspace canvas"]`
    instead — that's the React Flow wrapper (Canvas.tsx:150),
    always present once hydrated regardless of workspace count
    or selection state. Hydration-error banner remains the
    secondary OR target for the failure path.

  - app/page.tsx hydration-error banner gets the missing
    `data-testid="hydration-error"` attribute. role="alert"
    stays for accessibility; the testid is for programmatic
    detection without conflict.

After this lands, the staging-tabs spec should advance past the
initial wait, click the workspace node, and exercise each tab.
If a tab fails, we get a proper test failure rather than a 45s
timeout that obscures everything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:38:28 -07:00
Hongming Wang
4fdeabdbe0 fix(e2e): send X-Molecule-Org-Id header — TenantGuard 404s without it
Third E2E bug in the staging→main chain, found while debugging the
`Workspace create 404` failure that surfaced after the previous two
E2E fixes (instance_status, staging.moleculesai.app DNS).

Root cause: workspace-server's `middleware/TenantGuard` middleware
returns 404 (not 401/403, intentionally — see comment in
`tenant_guard.go`: "must not be inferable by probing other orgs'
machines") when a request to the tenant origin lacks one of:
  - X-Molecule-Org-Id header matching MOLECULE_ORG_ID env on the tenant
  - Fly-Replay-Src state from the CP router (production browser path)
  - Same-origin Canvas (Referer == Host)

The E2E was a direct GitHub-Actions curl with neither — every non-
allowlisted route 404'd with the platform's ratelimit headers but
none of the security headers, which made it look like a missing
route in the platform.

The org UUID is already on the admin-orgs row alongside instance_status,
so capture it during the readiness poll and add it to the tenantAuth
header bag. Both /workspaces (POST) and /workspaces/:id (GET) now
carry it.

Allowlist still contains /health, /metrics, /registry/register,
/registry/heartbeat — so the TLS readiness step (which hits /health)
keeps working without the header.
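
A harness-side sketch of the header bag and the allowlist check. TenantGuard itself is Go middleware and is not reproduced here; the helper names and the exact-match treatment of the allowlist are assumptions made for illustration:

```typescript
// Routes this message lists as allowlisted by TenantGuard.
const TENANT_GUARD_ALLOWLIST: string[] = [
  "/health",
  "/metrics",
  "/registry/register",
  "/registry/heartbeat",
];

// Assumption: exact path match. The real middleware may match differently.
function needsOrgHeader(path: string): boolean {
  return TENANT_GUARD_ALLOWLIST.indexOf(path) === -1;
}

// Hypothetical helper: headers sent to the tenant origin. `orgId` is the
// org UUID captured from the admin-orgs row during the readiness poll.
function tenantAuthHeaders(token: string, orgId: string): Record<string, string> {
  return {
    Authorization: `Bearer ${token}`,
    "X-Molecule-Org-Id": orgId,
  };
}
```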

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:13:13 -07:00
Hongming Wang
edcac16b81 fix(e2e): use staging.moleculesai.app for tenant DNS — wrong zone hung TLS poll
Second related E2E bug, surfaced after #2066's instance_status fix
let the harness reach the TLS readiness step:

  Error: tenant TLS: timed out after 180s

The CP provisioner writes staging tenant DNS as
<slug>.staging.moleculesai.app (with the staging. subdomain
prefix — visible in the EC2 provisioner DNS log line). The harness
was building https://<slug>.moleculesai.app (prod-zone shape),
so DNS literally didn't resolve, fetch threw NXDOMAIN inside the
silent catch, and waitFor saw null on every 5s poll until 180s
elapsed.

Fix: parameterize as STAGING_TENANT_DOMAIN env var, default
staging.moleculesai.app. Doc-comment example updated to match.
Override hatch is there only for ops running this harness against
a non-default zone.
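
A small sketch of the parameterization. The function name is hypothetical, and `env` is passed in explicitly so the example does not depend on Node's `process.env`:

```typescript
// Zone the CP provisioner uses for staging tenants.
const DEFAULT_STAGING_TENANT_DOMAIN = "staging.moleculesai.app";

// Hypothetical helper: build the tenant base URL from its slug, honoring
// the STAGING_TENANT_DOMAIN override when it is set and non-empty.
function tenantBaseUrl(slug: string, env: Record<string, string | undefined> = {}): string {
  const domain = env.STAGING_TENANT_DOMAIN || DEFAULT_STAGING_TENANT_DOMAIN;
  return `https://${slug}.${domain}`;
}
```

A real harness would likely call `tenantBaseUrl(slug, process.env)`.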

Verified manually: a freshly-provisioned tenant
(e2e-canvas-20260425-sav9fe) was unreachable at the prod-shaped
URL (NXDOMAIN) but reached CF at the staging-shaped URL.

teardown.ts only hits CP, not the tenant URL — no fix needed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 17:45:48 -07:00
Hongming Wang
754f361c03 fix(e2e): poll instance_status not status — waitFor never matched, masked real bugs
Staging Canvas Playwright E2E has been timing out at 1200s on every
recent run. Found via /code-review-and-quality on the staging→main
promotion chain.

The CP /cp/admin/orgs response shape is (handlers/admin.go:118):

  type adminOrgSummary struct {
    ...
    InstanceStatus string `json:"instance_status,omitempty"`
    ...
  }

There is NO top-level `status` field. The waitFor predicate compared
`row.status === "running"` against undefined on every poll — the
predicate could never resolve truthy. The harness invariably wedged
on the 20-min timeout regardless of whether the tenant was actually
provisioned.

This bug has been double-edged:
  - It MASKED the #242 pq-cache-collision class for hours: the
    tenants WERE provisioning fine, but the test couldn't tell.
  - It survived #255, #257 (real CP fixes) — the test still timed
    out, making us suspect more CP bugs that didn't exist.

Fix: poll `row.instance_status` instead. One-line change. Identical
fix for the failed-state branch one line below.
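
The predicate change can be sketched as below. The `slug` field, the helper names, and the `"failed"` value are assumptions; only `instance_status` and `"running"` come from this message:

```typescript
// One row of the /cp/admin/orgs response, reduced to what the poll needs.
// There is no top-level `status` field.
interface AdminOrgSummary {
  slug: string; // hypothetical, only identifies the row in this sketch
  instance_status?: string;
}

type ProvisionState = "running" | "failed" | "pending";

// Fixed predicate: read `instance_status`.
// Assumption: the failed-state branch compares against "failed".
function provisionState(row: AdminOrgSummary): ProvisionState {
  if (row.instance_status === "running") return "running";
  if (row.instance_status === "failed") return "failed";
  return "pending";
}

// Old predicate, kept only to show why it never resolved: `status` is
// always undefined on this response shape.
function oldIsRunning(row: AdminOrgSummary): boolean {
  return (row as unknown as { status?: string }).status === "running";
}
```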

No new tests for the harness itself; the fix's correctness is
verified by the next E2E run on the affected branch passing
end-to-end. If it doesn't pass after this, there's a separate
bug we can hunt cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 17:32:12 -07:00
Hongming Wang
62217250ed test(pricing): finish Starter→Team, Pro→Growth rename in 6 stale assertions
Marketing-lead agent's rename pass updated the "renders all three plans"
test (lines 56-57) but missed lines 77, 94, 114, 132, 143, 158 which still
referenced the pre-rename "Upgrade to Starter" / "Upgrade to Pro" button
names. Canvas (Next.js) build failed with getByRole timeout because the
component now says "Upgrade to Team" / "Upgrade to Growth".

Internal PlanId tuple ("free" | "starter" | "pro") and startCheckout(planId)
call are unchanged — only the user-facing button labels shifted, so
assertions like startCheckout("pro", "acme") still match the server-side API.
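
To illustrate the split between internal IDs and user-facing labels, a sketch under the assumption that labels live in a lookup table (the table, the helper, and the free-plan label are hypothetical):

```typescript
// Internal plan IDs are unchanged by the rename.
type PlanId = "free" | "starter" | "pro";

// User-facing labels after the rename.
const PLAN_LABELS: Record<PlanId, string> = {
  free: "Free", // assumption: the free-plan label is not stated here
  starter: "Team",
  pro: "Growth",
};

function upgradeButtonLabel(planId: PlanId): string {
  return `Upgrade to ${PLAN_LABELS[planId]}`;
}
```

A test would therefore look up the button by the new label while still asserting the old ID in the checkout call.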

Verified locally: 9/9 PricingTable tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:01:40 -07:00
Hongming Wang
2dbd06d52e
Merge pull request #2055 from Molecule-AI/feat/lark-channel-first-class-v2
feat(channels): first-class Lark/Feishu support via schema-driven config
2026-04-24 19:57:57 +00:00
rabbitblood
998cd03265 fix(tabs-a11y): mock config_schema on adapter response
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape and every getByLabelText("Bot Token") failed.

Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 12:04:51 -07:00
molecule-ai[bot]
92a0c0073d
Merge pull request #2058 from Molecule-AI/chore/canvas-node22-upgrade
chore(canvas): upgrade node:20-alpine → node:22-alpine
2026-04-24 19:04:25 +00:00
molecule-ai[bot]
17f29e874a
Merge pull request #2029 from Molecule-AI/fix/canvas-a11y-tabs-v2
fix(canvas/a11y): add type=button to tab toolbar and settings buttons
2026-04-24 19:01:24 +00:00
1e5fc48acb chore(canvas): upgrade node:20-alpine → node:22-alpine
Node.js 20 reaches EOL 2026-04-30 and actions/checkout@v4 emits
Node.js 20 deprecation warnings on GitHub Actions (Node 24 forced
2026-06-02). Next.js 15.1 is fully compatible with Node 22.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:54:30 +00:00
Hongming Wang
04e60e7303
Merge pull request #2052 from Molecule-AI/fix/canvas-provisioning-timeout-runtime-aware
fix(canvas): runtime-aware provisioning-timeout threshold (hermes 12min vs default 2min)
2026-04-24 18:51:46 +00:00
rabbitblood
00265d7028 feat(channels): first-class Lark/Feishu support via schema-driven config
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.

Fix: move form shape to the server.

- Add `ConfigField` struct + `ConfigSchema()` method to the
  `ChannelAdapter` interface. Each adapter declares its own fields with
  label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
  - Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
  - Slack: bot_token/channel_id/webhook_url/username/icon_emoji
  - Discord: webhook_url + optional public_key
  - Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
  inline. Sorted deterministically by display name so UI ordering is
  stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.

Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
  schema-driven `SchemaField` component. Renders one input per field in
  the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
  `ConfigField.key`. Values reset on platform-switch so stale
  Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
  `SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
  getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
  values from previous platform selections.
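
A sketch of the "only schema-known keys are posted" step. The `ConfigField` properties follow the list above; the labels, `type` values, helper name, and the choice to drop empty values are assumptions:

```typescript
interface ConfigField {
  key: string;
  label: string;
  type: string;
  required: boolean;
  sensitive: boolean;
  placeholder?: string;
  help?: string;
}

// Lark schema as described above; labels and types are illustrative.
const LARK_SCHEMA: ConfigField[] = [
  { key: "webhook_url", label: "Webhook URL", type: "text", required: true, sensitive: true },
  { key: "verify_token", label: "Verify Token", type: "text", required: false, sensitive: true },
];

// Copy only keys the active schema declares, so values left over from a
// previously selected platform never reach the POST body.
function buildConfig(schema: ConfigField[], formValues: Record<string, string>): Record<string, string> {
  const config: Record<string, string> = {};
  for (const field of schema) {
    const value = formValues[field.key];
    if (value !== undefined && value !== "") config[field.key] = value;
  }
  return config;
}
```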

Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
  required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
  survives round-trip through ListAdapters.

Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:51:15 -07:00
Hongming Wang
0b237ed9dd refactor(canvas): extract runtime profiles to @/lib/runtimeProfiles
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.

Changes:

- New file `canvas/src/lib/runtimeProfiles.ts` owns:
  * `RuntimeProfile` type — structural shape, every field optional so
    new runtimes can partially-fill without breaking consumers.
  * `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
  * `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
  * `WorkspaceRuntimeOverrides` — interface for server-provided
    per-workspace overrides, so operators can tune via template
    manifest / workspace metadata without a canvas release.
  * `getRuntimeProfile()` — resolver with
    overrides → profile → default priority.
  * `provisionTimeoutForRuntime()` — convenience wrapper.

- `ProvisioningTimeout.tsx` now delegates to the profile module.
  `DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.

- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
  * overrides > profile > default priority chain
  * "every entry in RUNTIME_PROFILES resolves to a number" contract
  * backward-compat export

Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).
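
The overrides, then profile, then default priority chain could look roughly like this. The field name `provisionTimeoutMs` and the shape of `WorkspaceRuntimeOverrides` are assumptions; the exported names and the 2-min and 12-min values come from this message:

```typescript
// Every field optional, so a new runtime can fill in only what it needs.
interface RuntimeProfile {
  provisionTimeoutMs?: number; // hypothetical field name
}
// Assumption: server-provided overrides share the profile shape.
type WorkspaceRuntimeOverrides = RuntimeProfile;

const DEFAULT_RUNTIME_PROFILE: { provisionTimeoutMs: number } = { provisionTimeoutMs: 120_000 };

const RUNTIME_PROFILES: Record<string, RuntimeProfile> = {
  // WHY: hermes cold boot builds from source and can take 8-13 min.
  hermes: { provisionTimeoutMs: 720_000 },
};

function getRuntimeProfile(
  runtime?: string,
  overrides?: WorkspaceRuntimeOverrides
): { provisionTimeoutMs: number } {
  const known = runtime !== undefined && Object.prototype.hasOwnProperty.call(RUNTIME_PROFILES, runtime);
  const named = known && runtime !== undefined ? RUNTIME_PROFILES[runtime] : undefined;
  const fromOverrides = overrides ? overrides.provisionTimeoutMs : undefined;
  const fromProfile = named ? named.provisionTimeoutMs : undefined;
  return {
    provisionTimeoutMs: fromOverrides ?? fromProfile ?? DEFAULT_RUNTIME_PROFILE.provisionTimeoutMs,
  };
}

function provisionTimeoutForRuntime(runtime?: string, overrides?: WorkspaceRuntimeOverrides): number {
  return getRuntimeProfile(runtime, overrides).provisionTimeoutMs;
}
```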

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:48:39 -07:00
Hongming Wang
9597d262ca fix(canvas): runtime-aware provisioning-timeout threshold
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.

User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.

Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
  docker runtimes (claude-code, langgraph, crewai) where cold boot is
  30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
  Aligns with tests/e2e/test_staging_full_saas.sh's
  PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
  backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
  check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
  the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
  number → forces a single threshold for every workspace (tests use this
  for deterministic behavior).
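
A sketch of the lookup plus the optional `timeoutMs` prop. `effectiveTimeoutMs` is a hypothetical name for the prop-handling step; the other names and values are from this message:

```typescript
const DEFAULT_PROVISION_TIMEOUT_MS = 120_000; // 2 min, fast docker runtimes
const RUNTIME_TIMEOUT_OVERRIDES_MS: Record<string, number> = { hermes: 720_000 }; // 12 min

function timeoutForRuntime(runtime?: string): number {
  if (runtime !== undefined && Object.prototype.hasOwnProperty.call(RUNTIME_TIMEOUT_OVERRIDES_MS, runtime)) {
    const override = RUNTIME_TIMEOUT_OVERRIDES_MS[runtime];
    if (override !== undefined) return override;
  }
  return DEFAULT_PROVISION_TIMEOUT_MS;
}

// A number forces one threshold for every workspace; undefined falls back
// to the per-runtime lookup.
function effectiveTimeoutMs(runtime: string | undefined, timeoutMs?: number): number {
  return timeoutMs !== undefined ? timeoutMs : timeoutForRuntime(runtime);
}
```

For a mixed batch, calling `effectiveTimeoutMs` once per node gives hermes its 12-minute budget while langgraph nodes keep the 2-minute default.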

Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).

13/13 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:46:09 -07:00
Molecule AI Core Platform Lead
49fc97e6e4 refactor(canvas): remove unused EmbeddedTeam component from WorkspaceNode
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).

Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:30:36 +00:00
Molecule AI Marketing Lead
de19cf9bae fix(canvas): apply flat-rate pricing copy for Phase 34 launch (Issue #1833)
Rename "Starter" → "Team", update tagline + pricing page hero copy to
lead with flat-rate per-org positioning — deliberate wedge against
Cursor/Windsurf per-seat pricing ($40/seat vs $29/org).

PMM decision: Issue #1833. Approved by Marketing Lead 2026-04-24.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 17:54:23 +00:00
1126d7b66d fix(canvas/a11y): add type=button to tab toolbar and settings buttons
WCAG 4.1.2 / bug #1669 follow-up — fixing remaining buttons missing
type="button" across tab components and settings.

Files changed:
- FilesTab/FilesToolbar.tsx (5 buttons): +New, Upload, Export,
  Clear, ↻ (all had onClick, no type=button)
- config/secrets-section.tsx (7 buttons): Remove, Edit/Update/Cancel
  across 2 SecretRow variants + add-variable form
- config/form-inputs.tsx (2 buttons): tag remove ×, section collapse toggle
- ActivityTab.tsx (1 button): row expand toggle
- TracesTab.tsx (1 button): Refresh
- settings/UnsavedChangesGuard.tsx (2 buttons): Keep editing, Discard
  (Radix AlertDialog asChild wrappers — type=button prevents form submit)

Total: 18 buttons fixed across 6 files. 934/934 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 14:41:35 +00:00
Hongming Wang
6b62391e5d
Merge pull request #1989 from Molecule-AI/fix/canvas-a11y-final
fix(canvas/a11y): type=button campaign + aria fixes (batch 1-3)
2026-04-24 14:05:27 +00:00
Molecule AI Core Platform Lead
4db7f6f024 fix(canvas): define MAX_NESTING_DEPTH constant in WorkspaceNode.tsx
TeamMemberChip used MAX_NESTING_DEPTH to cap recursive sub-agent
rendering at depth 3, but the constant was never declared — causing
a TypeScript build error ('Cannot find name MAX_NESTING_DEPTH') that
blocked Canvas CI on PR #1989.

Add the constant above EmbeddedTeam with a doc comment explaining its
purpose (guards against circular parentId cycles + readability cap).
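
How a depth cap guards against parentId cycles can be sketched as follows. The `Agent` shape and the helper are hypothetical, and "depth 3" is read here as "descend at most three levels below the root":

```typescript
const MAX_NESTING_DEPTH = 3;

interface Agent {
  id: string;
  parentId?: string;
}

// Collect sub-agent ids under `rootId`, stopping after MAX_NESTING_DEPTH
// levels. The cap also bounds recursion when parentId links form a cycle.
function visibleDescendants(agents: Agent[], rootId: string, depth: number = 0): string[] {
  if (depth >= MAX_NESTING_DEPTH) return [];
  const out: string[] = [];
  for (const agent of agents) {
    if (agent.parentId === rootId) {
      out.push(agent.id);
      out.push(...visibleDescendants(agents, agent.id, depth + 1));
    }
  }
  return out;
}
```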

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:52:28 +00:00
9f52ee1777 fix(canvas/WorkspaceNode.tsx): add missing useMemo import
CI failure: "Cannot find name 'useMemo'" at line 363.
useMemo was called but not imported — likely dropped during refactor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
6a96641c37 fix(canvas/a11y): add type="button" to remaining canvas component buttons (batch 3)
WCAG 4.1.2 / bug #1669 follow-up — final batch completing the campaign.
Added type="button" to all buttons missing it across 14 canvas components.

Files changed (14, all additions):
- Toolbar.tsx: Stop All, Restart All, A2A toggle, Audit shortcut, Quick help, Search shortcut, Help close (7)
- MemoryInspectorPanel.tsx: scope tabs, refresh, search clear ×2, expand, delete (6)
- TemplatePalette.tsx: org refresh, toggle, Import Agent, org import, deploy template, palette refresh (6)
- ProvisioningTimeout.tsx: Retry, Cancel Request, View Logs, Keep, Remove Workspace (5)
- ConsoleModal.tsx: close, Copy output, Close (3)
- OnboardingWizard.tsx: Skip guide, action, Next (3)
- ConversationTraceModal.tsx: close ×2 (2)
- WorkspaceNode.tsx: Restart banner, Extract from team (2)
- CommunicationOverlay.tsx: toggle, close panel (2)
- Toaster.tsx: dismiss ×2 (2)
- SearchDialog.tsx: search result button (1)
- TermsGate.tsx: accept (1)
- ErrorBoundary.tsx: Reload (1)
- BundleDropZone.tsx: import trigger (1)

Total campaign (batches 1-3): 27 + 42 = 69 buttons fixed across 24 components.
All 477 canvas vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
32a3b84147 fix(canvas/a11y): add type="button" to MissingKeysModal, ContextMenu, CreateWorkspaceDialog tier radio
WCAG 4.1.2 / bug #1669 follow-up — modal + menu buttons need explicit type="button".

- MissingKeysModal.tsx: Save, Open Settings Panel, Cancel Deploy, Add Keys+Deploy (4)
- ContextMenu.tsx: all menuitem buttons (1 — inner menu items loop)
- CreateWorkspaceDialog.tsx: tier radio buttons in dialog (1)

56 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
e14b6d2de4 fix(canvas/a11y): add type="button" to BatchActionBar, EmptyState, SidePanel, CreateWorkspaceDialog
WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button"
default to type="submit", risking accidental form submission.

Added type="button" to all action buttons in:
- BatchActionBar.tsx: Restart All, Pause All, Delete All, Clear Selection (4)
- EmptyState.tsx: template deploy buttons + Create blank (all)
- SidePanel.tsx: close panel, tab switches, Restart Now (3)
- CreateWorkspaceDialog.tsx: open trigger, Cancel, Create (3)

Total this commit: +12 insertions / 2 deletions across 4 files.
Prior commit (c5590c0c): ConfirmDialog + AuditTrailPanel + DeleteCascadeConfirmDialog (+7).
Combined batch: 19 buttons fixed across 7 components.

86 vitest tests pass across all touched test files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
2ff15a38a8 fix(canvas/a11y): add type="button" to ConfirmDialog, AuditTrailPanel, DeleteCascadeConfirmDialog
WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button"
default to type="submit", which triggers accidental form submission when
the button is rendered inside a <form> element.

Added type="button" to all action buttons in:
- ConfirmDialog.tsx: Cancel + confirm buttons (lines 123, 130)
- DeleteCascadeConfirmDialog.tsx: Cancel + Delete All buttons (lines 145, 151)
- AuditTrailPanel.tsx: filter buttons, refresh, load-more (lines 140, 154, 194)

All 51 component tests pass (5 ConfirmDialog, 46 AuditTrailPanel+DeleteCascadeConfirmDialog).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
e355f447bb fix(canvas/a11y): add aria-hidden to 6 decorative SVGs + aria-label to OrgTokensTab input
WCAG 1.3.1 — inputs without visible text labels need aria-label.
WCAG 4.1.2 — decorative SVGs inside interactive elements need
aria-hidden so screen readers ignore icon content.

Changes:
- ErrorBoundary: warning triangle SVG — aria-hidden=true
- Toolbar: 4 decorative SVGs — aria-hidden=true
  (Stop All square, Restart Pending arrow, Search magnifier, Help circle)
- SettingsButton: gear icon SVG — aria-hidden=true (parent has aria-label)
- RevealToggle: EyeIcon + EyeOffIcon SVGs — aria-hidden=true
- OrgTokensTab: name input — aria-label="Organization API key label"

Bonus fix: removed duplicate title/aria-label props on Restart All button.

Note: ConsoleModal and DeleteCascadeConfirmDialog do not exist in current
staging (aae0c81) — tab trapping fix inapplicable to this codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:40:52 +00:00
59feb65252 fix(canvas/a11y): add type=button to 24 buttons across DetailsTab, ConfigTab, FilesTab, MemoryTab
WCAG 4.1.2 / bug #1669 follow-up — DetailsTab, ConfigTab, FilesTab, and
MemoryTab had buttons without explicit type="button", causing accidental
form submission in any surrounding <form> context.

Changes:
- DetailsTab (9 buttons): Save, Cancel (edit), Restart/Retry, Edit,
  View console output, peer select, Confirm Delete, Cancel (delete), Delete Workspace
- ConfigTab AgentCardSection (3): Save, Cancel, Edit Agent Card
- ConfigTab footer (3): Save & Restart, Save, Reload
- ConfigTab textareas (2): aria-label added to Agent Card JSON editor and Raw YAML editor
- FilesTab (4): Delete All, Cancel, Delete, Cancel
- MemoryTab (11): Expand/Collapse, Open, Expand (collapsed state), Advanced,
  Refresh, Add, Save, Cancel (add form), expand entry, Delete entry, Show

Total: 32 interactive elements corrected across 4 tab components.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:39:43 +00:00
Hongming Wang
46fbffb95b fix(canvas/e2e): raise staging-setup deadline 15 min → 20 min
Matches tests/e2e/test_staging_full_saas.sh's 20-min budget (#1930).
Canvas E2E was still stuck at 900s (15 min) which regularly flakes on
tenant cold boots in 12-15 min range — especially on staging where
workspace-server image pulls + AMI bootstrapping add 3-5 min vs prod.

Concrete blocker: 2026-04-24 staging→main sync (#1981) kept failing on
"tenant provision: timed out after 900s" in canvas/e2e/staging-setup.ts
despite the actual sync E2E going green. Canvas-side timeout was
strictly tighter than the sync-side timeout.

Also raises WORKSPACE_ONLINE_TIMEOUT_MS to 20 min to cover the case
where the workspace EC2 is provisioned but hermes cold-install (apt +
uv + hermes-agent clone + gateway boot) takes longer than the original
10-min budget — matches the 20-min workspace deadline in SaaS E2E.

No behavior change when things are fast. Just covers the tail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:26:13 -07:00
molecule-ai[bot]
82d15f4d33
Merge pull request #1859 from Molecule-AI/content-marketer/phase34-launch-post-v2
docs(marketing): Phase 34 launch post v2 — governance-first + tool trace
2026-04-24 07:05:54 +00:00
Hongming Wang
0ef5dad1b1
Merge pull request #1993 from Molecule-AI/fix/auth-redirect-loop-regression-tests
test(auth): add regression tests for redirect loop guards
2026-04-24 06:57:12 +00:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server pacing plus a provision concurrency of 3 mean tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now calls fitView with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.
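
The scaled threshold from item 3 can be sketched roughly as below. This is a minimal illustration, not the shipped code: the helper name `stuckThresholdMs` is hypothetical, the 2-minute base is only inferred from "2 minutes in", and counting queue slots as "provisioning count minus concurrency" is an assumption about how the formula is applied.

```typescript
// Assumed constants: the 2-minute base is inferred from "2 minutes in";
// the 45s step and the concurrency of 3 come from the message above.
const BASE_TIMEOUT_MS = 120_000;
const PER_QUEUE_SLOT_MS = 45_000;
const PROVISION_CONCURRENCY = 3;

// Hypothetical helper: how long to wait before flagging a workspace as
// "stuck", given how many workspaces are provisioning at the same time.
function stuckThresholdMs(provisioningCount: number): number {
  // Only workspaces queued behind the concurrency limit extend the wait.
  const queuedBeyondConcurrency = Math.max(0, provisioningCount - PROVISION_CONCURRENCY);
  return BASE_TIMEOUT_MS + queuedBeyondConcurrency * PER_QUEUE_SLOT_MS;
}
```

Under these assumptions a 30-workspace import waits considerably longer before warning, while a 1 to 3 workspace import keeps the base threshold.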

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.
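
A client-side assertion of the kind described in item 1 might look like the sketch below. The names `ApiBodyError` and `assertNoErrorInBody` are hypothetical, and treating any non-empty `error` field on a 2xx body as a failure is an assumption about the response shape.

```typescript
// Hypothetical error type; the real client may use a different shape.
class ApiBodyError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "ApiBodyError";
    this.status = status;
  }
}

// Throws when a 2xx response (for example a 207 partial success) still
// carries an `error` field, so callers show a failure instead of success.
function assertNoErrorInBody(status: number, body: unknown): void {
  const is2xx = status >= 200 && status < 300;
  if (!is2xx || typeof body !== "object" || body === null) return;
  const error = (body as { error?: unknown }).error;
  if (error !== undefined && error !== null && error !== "") {
    throw new ApiBodyError(status, `HTTP ${status} carried an error: ${String(error)}`);
  }
}
```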

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
e9be12210f test(auth): add regression tests for redirect loop guards
AuthGate now skips session fetch for /cp/auth/* paths, and
redirectToLogin guards against re-setting window.location when
already on an auth path. Both guards had no test coverage —
a future refactor could silently reintroduce the redirect loop.

Added:
- AuthGate.test.tsx: 2 cases covering /cp/auth/login and
  /cp/auth/signup path skipping (no fetchSession call, no
  redirectToLogin call, children rendered)
- auth.test.ts: 2 cases covering redirectToLogin early return
  for /cp/auth/login and /cp/auth/signup paths

Fixes: Molecule-AI/molecule-core#1541

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:30:35 +00:00
Hongming Wang
d53583f9c6 Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes 2026-04-23 21:04:55 -07:00
Hongming Wang
2d6ff11c4e fix(canvas): re-sort parents-before-children after nest mutation
React Flow requires parent nodes to appear before their children in
the nodes array. When they don't, it logs "Parent node {id} not
found. Please make sure that parent nodes are in front of their
child nodes in the nodes array" and — more importantly — renders
the child at canvas-absolute coords instead of parent-relative,
flashing it far outside the parent.

topology's buildNodesAndEdges already enforced this at hydrate, but
nestNode + batchNest weren't re-sorting after mutating parentId.
After its first drag, a freshly nested child often rendered at the
wrong screen position because its new parent sat later in the
array than the child did.

Extract sortParentsBeforeChildren() into canvas-topology as a
reusable DFS visit; call it at the tail of both nestNode's set()
and batchNest's commit set(). 923 tests still green — no behaviour
change beyond eliminating the warning and the position flash.
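
A minimal sketch of a parents-before-children sort, assuming nodes carry only `id` and an optional `parentId`. The `FlowNode` type is a stand-in for illustration, and the real `sortParentsBeforeChildren` may differ in detail.

```typescript
// Stand-in node shape; real canvas nodes carry more fields.
interface FlowNode {
  id: string;
  parentId?: string;
}

// Depth-first visit: emit each node's parent chain before the node itself,
// otherwise preserving the original array order.
function sortParentsBeforeChildren<T extends FlowNode>(nodes: T[]): T[] {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));
  const visited = new Set<string>();
  const sorted: T[] = [];

  const visit = (node: T): void => {
    if (visited.has(node.id)) return;
    visited.add(node.id); // marking before recursing also stops parentId cycles
    const parent = node.parentId ? byId.get(node.parentId) : undefined;
    if (parent) visit(parent);
    sorted.push(node);
  };

  nodes.forEach(visit);
  return sorted;
}
```

Called at the tail of a nest mutation, a pass like this leaves already-sorted arrays unchanged and only moves parents forward when a child would otherwise precede them.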

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 21:00:40 -07:00
Hongming Wang
2a8977c946 fix(canvas): cancel-nest also shrinks the parent back
Canceling the nest/extract dialog restored the child's position but
left the parent card at its auto-grown size. growParentsToFitChildren
fires on drag-stop to fit a then-outside child; when the drag is
subsequently cancelled, the parent keeps that grown width/height
forever because the grow pass is grow-only.

Strip width/height from the ex-parent alongside the child position
restore in cancelNest — React Flow re-measures from CSS, parent
collapses back to its natural size. Same trick nestNode already
uses for the un-nest path.
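
The width/height strip can be illustrated as below. This is a sketch with a simplified stand-in node type (`SizedNode`) and a hypothetical helper name; it only shows the idea of dropping the explicit size from one node so the renderer can re-measure it.

```typescript
// Simplified stand-in for a canvas node with optional explicit dimensions.
interface SizedNode {
  id: string;
  width?: number;
  height?: number;
}

// Returns a new array in which the given parent has no explicit
// width/height; every other node is passed through untouched.
function clearExplicitSize<T extends SizedNode>(nodes: T[], parentId: string): T[] {
  return nodes.map((node) => {
    if (node.id !== parentId) return node;
    const copy = { ...node };
    delete copy.width;
    delete copy.height;
    return copy;
  });
}
```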

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:56:08 -07:00
Hongming Wang
09053dfdeb fix(canvas): cancel-nest restores position; un-nest shrinks parent
Two follow-up polish items for drag-and-nest:

1. Cancelling the "Extract from team?" dialog now snaps the
   dragged card back to where the drag started. Before, a user
   who dragged a child out, saw the confirm dialog, then clicked
   Cancel ended up with the card stranded outside the parent at
   its drop-point position — which also got persisted via
   savePosition on drag-stop. Now onNodeDragStart captures the
   pre-drag position + parent, and cancelNest restores both the
   RF node position and fires savePosition with the absolute
   pre-drag coords so reload matches.

2. Un-nesting now clears the ex-parent's explicit width/height
   in the nodes array. growParentsToFitChildren is grow-only so
   it could never shrink the parent back down after a child
   left; the card stayed at its auto-grown size with empty
   space. Stripping width/height lets React Flow re-measure from
   the card's own min-width / min-height CSS, so the parent
   visually shrinks to fit whatever children remain.

923 canvas tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:52:28 -07:00
Hongming Wang
512fdfd59d fix(canvas): plain drag out of parent un-nests again
Un-nest used to require holding Alt (or Cmd to force-detach). That
was too conservative — when a user dragged a child clearly outside
its parent's bbox, nothing happened on release, because the default
branch soft-clamped back and only the Alt branch actually opened
the "Extract?" confirm. Matches the exact bug the user just flagged
("I can put agents in other agent, but when I drag it out, it does
not move out").

New rules:
 * Past the 20 % hysteresis → confirm un-nest. Plain drag, no
   modifier. This is what most users expect (Miro / Figma behave
   the same way — drag outside the frame and the shape leaves it).
 * Inside or within 20 % of the edge → soft-clamp back inside.
   Guards against twitchy releases that momentarily overshoot the
   edge by a few pixels.
 * Cmd / Ctrl → force un-nest regardless of overlap. Escape-hatch
   for when the user dragged within the hysteresis zone but really
   wants out.
 * Dropping onto a different parent → nest there (unchanged).
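
The rules above can be sketched as a small decision function. Everything here is illustrative: the names are hypothetical, `outsideFraction` assumes the hysteresis is expressed as the share of the child lying outside its parent, and the precedence between "dropped on another parent" and the Cmd / Ctrl modifier is an assumption.

```typescript
type DropAction = "nest-in-target" | "force-unnest" | "confirm-unnest" | "soft-clamp";

interface DropInput {
  outsideFraction: number;   // 0 = fully inside the parent, 1 = fully outside
  forceModifier: boolean;    // Cmd or Ctrl held on release
  overOtherParent: boolean;  // released on top of a different parent
}

const HYSTERESIS = 0.2; // the 20 % threshold from the rules above

function resolveDrop(input: DropInput): DropAction {
  if (input.overOtherParent) return "nest-in-target";
  if (input.forceModifier) return "force-unnest";
  if (input.outsideFraction > HYSTERESIS) return "confirm-unnest";
  return "soft-clamp"; // twitchy overshoot: snap back inside
}
```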

Alt is no longer required for un-nesting. It stays a non-gesture
modifier with no meaning unless we re-bind it later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:48:38 -07:00
Hongming Wang
f2a4b6e0d3 fix: dev-mode bypass for IP rate limiter + 429 retry on GET
The 600-req/min/IP bucket is sized for SaaS where each tenant has
a distinct client IP. On a local Docker setup every panel shares
one IP — hydration (/workspaces + /templates + /org/templates +
/approvals/pending) plus polling (A2A overlay + activity tabs +
approvals + schedule + channels + audit trail) can burst past the
bucket inside a minute, blanking the canvas with 429s. The user
reported it after dragging workspaces. Dragging itself only saves on
release (savePosition in onNodeDragStop), but the always-on polling
stacked on top of the startup hydration burst tripped the limit.

Two-layer fix:

Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen
is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching
the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and
discovery. SaaS production keeps the bucket.

Client: api.ts auto-retries a single 429 on idempotent GET requests,
waiting the server-provided Retry-After (capped at 20s). Mutations
(POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying.
Users on SaaS hitting a legitimate rate-limit spike get one
transparent recovery instead of an immediately-blank Canvas.
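
A rough sketch of the client-side behaviour, not the actual api.ts code. The `Fetcher` type, the function names, and the 1s fallback when `Retry-After` is missing or unparsable are assumptions; the sketch also handles only the delta-seconds form of `Retry-After`, not the HTTP-date form.

```typescript
const MAX_RETRY_AFTER_MS = 20_000; // cap from the message above

// Convert a Retry-After header (seconds) into a bounded delay.
function retryDelayMs(retryAfter: string | null): number {
  const seconds = retryAfter === null ? NaN : Number(retryAfter);
  if (!Number.isFinite(seconds) || seconds < 0) return 1_000; // assumed fallback
  return Math.min(seconds * 1000, MAX_RETRY_AFTER_MS);
}

interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type Fetcher = (url: string, init: { method: string }) => Promise<MinimalResponse>;

// Retries exactly once, and only for GET; mutations return the 429 as-is
// so a POST/PUT/PATCH/DELETE is never applied twice.
async function requestWithSingleRetry(
  fetcher: Fetcher,
  url: string,
  method: string = "GET",
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<MinimalResponse> {
  const first = await fetcher(url, { method });
  if (first.status !== 429 || method.toUpperCase() !== "GET") return first;
  await sleep(retryDelayMs(first.headers.get("Retry-After")));
  return fetcher(url, { method });
}
```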

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:44:09 -07:00
Hongming Wang
286dcbfd1e fix(canvas,org): collapse org-imported parents on first paint
Importing a 15-workspace org template dropped every child as a
freely-positioned card into its parent's coordinate space. Parents
with 5-10 kids had the kids spill below the parent's initial min
size, producing the "ugly default" layout the user just flagged —
a mess of overlapping cards the moment the import completed.

Fix: every workspace in an org-template import that HAS children
is inserted with `collapsed = true`. Leaf workspaces stay
expanded (nothing to hide). The canvas renders a collapsed
parent as a compact header-only card with its "N sub" badge —
visually identical to the pre-refactor default the user asked for.

Double-click on a collapsed parent now EXPANDS it (flipping
`collapsed` locally + persisting via PATCH) so the user can drill
in to see the subtree. Only once expanded does a second
double-click zoom-to-team, matching the prior behaviour.

Leaf-first creation order stays the same; the collapsed flag
just means "render compact" not "hide from API".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:36:55 -07:00
Hongming Wang
507696d88a fix(canvas,server): address review findings on 3f11df03
Five review findings from the 3f11df03 six-bug commit:

1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,
   ClosedInProduction} covering all three gating states for the
   security-sensitive dev-mode hatch the prior commit added to
   /registry/:id/peers. Previously shipped untested — a future
   refactor could have silently inverted polarity or removed the
   gate. New tests pin the contract:
     * MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless
     * MOLECULE_ENV=development + ADMIN_TOKEN set → require token
     * MOLECULE_ENV=production                    → require token

2. ConfigTab handleSave diffs against the RAW parsed YAML / form
   config instead of the DEFAULT_CONFIG-merged shape. The previous
   code would silently PATCH tier=1 to the DB when a user deleted
   the `tier:` line in raw mode (the default-merge substituted 1).
   Now: only fields the user actually typed participate in the
   diff. Type guards (typeof === "number" / "string") prevent
   coercion surprises on malformed YAML.

3. ConfigTab model-save failure no longer lies "Saved". The
   /workspaces/:id/model PATCH can reject when the runtime doesn't
   support the chosen model; previously we caught + console.warn'd
   + showed green Saved, and the user watched the model revert on
   next reload with no explanation. Now the save path collects a
   `modelSaveError` and surfaces it via setError with a partial-
   success message ("Other fields saved, but model update failed:
   …") so the user sees why.

4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch
   failures, distinguishing them in the error text ("Failed to
   load connected channels and platforms — try refreshing").
   Previously only an adapters failure was visible; a channels
   failure left the user with an apparently-empty list and no
   indication the API was unreachable.

5. ChatTab panels drop the redundant aria-hidden attribute. The
   conditional Tailwind `hidden` class already sets display:none, which
   removes the node from the accessibility tree on its own; the
   extra aria-hidden invited WAI-ARIA lint warnings if a focusable
   descendant ever landed inside an inactive panel.
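
The raw-config diff from item 2 might be sketched like this. The `WorkspaceRow` shape and the helper name are hypothetical; the point illustrated is that a field absent from (or mistyped in) the user's raw input never enters the PATCH, so no default gets written back.

```typescript
// Hypothetical subset of the DB-backed workspace fields.
interface WorkspaceRow {
  name: string;
  tier: number;
  runtime: string;
}

// Diff the RAW parsed config against the saved row. No defaults are merged
// in, so a deleted `tier:` line simply produces no `tier` entry in the patch.
function diffTypedFields(raw: Record<string, unknown>, saved: WorkspaceRow): Partial<WorkspaceRow> {
  const { name, tier, runtime } = raw;
  const patch: Partial<WorkspaceRow> = {};
  if (typeof name === "string" && name !== saved.name) patch.name = name;
  if (typeof tier === "number" && tier !== saved.tier) patch.tier = tier;
  if (typeof runtime === "string" && runtime !== saved.runtime) patch.runtime = runtime;
  return patch;
}
```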

Tests: 923 canvas + full Go handler suite pass. 3 new Go tests.
No behaviour change on the five prior fixes — this commit tightens
their edges per the independent review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:29:44 -07:00
Hongming Wang
3f11df031c fix: six UX bugs (peers auth, scroll, chat tabs, config persist, + visibility)
Six bugs reported from a live session — all shippable in one commit:

1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint
   demands a workspace-scoped bearer token (validateDiscoveryCaller)
   which the canvas session doesn't hold. Added the same Tier-1b
   dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already
   use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so
   SaaS production stays strict. Exported IsDevModeFailOpen from the
   middleware package for the handler layer to reuse.

2. Org Templates list unscrollable. OrgTemplatesSection was rendered
   in the TemplatePalette footer — a div without overflow — so when
   it expanded to 15+ entries the list extended past the viewport
   with no scroll. Moved it to the top of the flex-1 overflow-y-auto
   container. Tall lists now scroll naturally.

3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead
   of switching. HTML `hidden` attribute was being overridden by
   Tailwind's `flex` class (display: flex beats the attribute),
   so both tabpanels rendered concurrently. Swapped to a conditional
   Tailwind `hidden`/`flex` class so the inactive panel is
   display:none with proper CSS specificity.

4. Hermes Config form never persists. handleSave wrote config.yaml
   but name / tier / runtime / model all live on the workspace row
   (or the dedicated /workspaces/:id/model endpoint) — the form
   edited in-memory, the request returned 200, the next reload
   wiped everything back. Hermes + external runtimes manage their
   own config inside the container anyway, so writing config.yaml
   is a no-op for them; skip it. Always diff and PATCH the DB-backed
   fields that actually changed.

5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's
   load() used Promise.all with a silent catch — if EITHER the
   channels or adapters fetch failed, both setters were skipped
   with no error visible. Switched to Promise.allSettled so each
   endpoint settles independently, and the adapters failure now
   surfaces via the top-level error state.

6. Plugin registry always "No plugins in registry". Same silent
   catch pattern in SkillsTab.tsx — load errors for /plugins,
   /plugins/sources, and /workspaces/:id/plugins swallowed without
   logging. Replaced the empty catches with console.warn so future
   failures are at least visible in devtools.
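
The `Promise.allSettled` pattern from item 5 can be sketched as follows. The state shape, function names, and error wording are assumptions for illustration; the point is that one rejected fetch no longer discards the other's result.

```typescript
interface LoadState {
  channels: string[];
  adapters: string[];
  error: string | null;
}

// Turn two independently settled results into UI state: keep whatever
// succeeded and name whatever failed.
function applySettled(
  channels: PromiseSettledResult<string[]>,
  adapters: PromiseSettledResult<string[]>,
): LoadState {
  const failed: string[] = [];
  if (channels.status === "rejected") failed.push("connected channels");
  if (adapters.status === "rejected") failed.push("platforms");
  return {
    channels: channels.status === "fulfilled" ? channels.value : [],
    adapters: adapters.status === "fulfilled" ? adapters.value : [],
    error: failed.length > 0 ? `Failed to load ${failed.join(" and ")}` : null,
  };
}

// Unlike Promise.all, allSettled never short-circuits on the first rejection.
async function load(
  fetchChannels: () => Promise<string[]>,
  fetchAdapters: () => Promise<string[]>,
): Promise<LoadState> {
  const [channels, adapters] = await Promise.allSettled([fetchChannels(), fetchAdapters()]);
  return applySettled(channels, adapters);
}
```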

Tests: 923 passing (unchanged). Go handler tests pass. Server
rebuilt and running with the peers-auth + collapsed-persistence
fixes (pid 15875).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:18:30 -07:00
Molecule AI App & Docs Lead
3715c06e0b fix(canvas): remove stale firstInputRef useEffect from AllKeysModal
AllKeysModal already handles focus via autoFocus={index === 0} on the
first input and a separate title-focus effect. The orphaned useEffect
referencing firstInputRef (declared only in ProviderPickerModal) caused
a TypeScript build error: "Cannot find name 'firstInputRef'".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 03:11:36 +00:00
746cb22855 fix(canvas/tests): normalize useCanvasStore mock pattern in test files
Standardize the mock for useCanvasStore to always expose getState()
(used by production ContextMenu to filter parent nodes). Applies the
same Object.assign-wrapping pattern introduced in #1744 to:
- ClaudeSettings.test.tsx
- tabs.a11y.test.tsx
- ContextMenu.keyboard.test.tsx (mockStore shape alignment)
2026-04-24 03:10:18 +00:00
680f1f50f2 fix(canvas/a11y): restore aria-hidden on backdrop div after cherry-pick conflict
Cherry-pick from #1744 left the backdrop div without aria-hidden="true"
(the outer dialog div got it instead). Re-apply aria-hidden="true" to
the backdrop div so screen readers skip the clickable overlay layer.

Also revert test assertion from bg-black → bg-black/70 to match the
exact class applied to the backdrop div.
2026-04-24 03:10:18 +00:00
Hongming Wang
4fd7f1e84c fix(canvas): tighten rescue + cap toast + cover paths with tests
Three follow-up review findings from the c2b2e13a review:

1. Rescue heuristic uses pure bbox-non-overlap. The previous
   `position.x < 0` branch rescued any child whose parent was
   later dragged past it, even when the layout was clearly
   recoverable (e.g. relative -40, child still overlaps parent).
   New rule: rescue iff the child's bbox has zero overlap with
   the parent's bbox — self-calibrating, scales with user-resized
   parents, catches screenshot-case and legacy huge-positive data.

2. Toast caps failed-name list at 3 and appends "and N more".
   Stops a 50-node partial failure from overflowing the toast
   container.

3. Cycle guard on selection-roots walk in batchNest. Corrupt
   parentId data can no longer make the walk loop forever. Cheap
   defensive guard — one Set per selected node.
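
The zero-overlap rescue rule from item 1 can be sketched as below, assuming the child's box is expressed in parent-relative coordinates so the parent occupies (0, 0) to (width, height). The type and function names are hypothetical.

```typescript
interface RelativeBox {
  x: number; // parent-relative
  y: number; // parent-relative
  width: number;
  height: number;
}

// Rescue iff the child's box shares no area with the parent's box.
function shouldRescue(child: RelativeBox, parent: { width: number; height: number }): boolean {
  const separatedX = child.x + child.width <= 0 || child.x >= parent.width;
  const separatedY = child.y + child.height <= 0 || child.y >= parent.height;
  return separatedX || separatedY;
}
```

Because the test compares against the parent's current size rather than a fixed minimum, a user-resized parent automatically widens the region in which children are left alone.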

Tests added (923 total, up from 918):
 * canvas-topology.test: 4 rescue scenarios — screenshot case
   (zero-overlap rescue), negative drift kept, huge-positive
   rescued, user-resized layout kept.
 * canvas.test: selection-roots filter on a 3-level chain.
 * workspace_crud test: PATCH {collapsed:true} runs the UPDATE.
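
The toast cap from item 2 amounts to something like the following sketch; the helper name and exact separator are assumptions.

```typescript
// List at most `cap` names, then summarise the rest as "and N more".
function formatFailedNames(names: string[], cap: number = 3): string {
  if (names.length <= cap) return names.join(", ");
  return `${names.slice(0, cap).join(", ")} and ${names.length - cap} more`;
}
```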

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:08:14 -07:00
Hongming Wang
c2b2e13abe fix(canvas): address code-review findings on the Canvas refactor
Five issues surfaced in the review of 50b53784. Each was either a real
bug waiting to hit users or a silent failure mode.

1. Topology rescue no longer teleports user-resized children.
   Rescue was comparing against parentMinSize(childCount), so any
   child the user had placed in space the parent was resized into
   got snapped to the default grid on reload — undoing the layout.
   Now rescue fires only on obviously corrupt data: negative
   relative coords (legacy pre-nesting absolute positions that
   landed above/left of their assigned parent) or values past a
   MAX_PLAUSIBLE_OFFSET threshold. Children just-past the initial
   minimum are left alone.

2. batchNest now filters to selection-roots before planning.
   Previously selecting both A and A's descendant B and dragging
   into T yanked B out of A to become a sibling under T. Users
   reasonably expect the A subtree to move intact. The new pass
   drops any selected node whose ancestor is also selected —
   those follow their ancestor via React Flow's parent binding.

3. batchNest surfaces partial failure via showToast. Previously
   silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2
   snapped back with no explanation. Now names the failed cards.

4. confirmNest closes the nest dialog BEFORE dispatching the async
   store action, so a second drag can't kick off a competing batch
   while the first is still in flight.

5. collapsed is now persisted. The Go workspace_crud.go Update
   handler ignored the `collapsed` field, so user-initiated
   collapse round-tripped to an expanded state on next hydrate.
   Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`)
   so the state survives reload.
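
The selection-roots filter from item 2 can be sketched as below, assuming parent links are available as a simple id-to-parent map. The function name is hypothetical, and the `seen` set is a defensive guard against corrupt (cyclic) parent data rather than something this message describes.

```typescript
// Keep only selected nodes that have no selected ancestor; descendants
// follow their ancestor when it moves.
function selectionRoots(selected: string[], parentOf: Map<string, string | undefined>): string[] {
  const chosen = new Set(selected);
  return selected.filter((id) => {
    const seen = new Set<string>([id]); // stops the walk on cyclic parent data
    let current = parentOf.get(id);
    while (current !== undefined && !seen.has(current)) {
      if (chosen.has(current)) return false; // an ancestor is also selected
      seen.add(current);
      current = parentOf.get(current);
    }
    return true;
  });
}
```

With A as the parent of B, selecting both and dragging them into T yields only A as a root, so B stays nested under A.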

Nits cleaned:
 * Removed dead dragStartParentRef in useDragHandlers.
 * Swapped redundant `node.data as WorkspaceNodeData` casts for a
   named WorkspaceNode type alias.
 * Canvas.tsx SR-live region now reads n.parentId (matches
   MiniMap + RF's native field) instead of the mirror n.data.parentId.

Tests added (918 total, up from 915):
 * batchNest happy path — 2-root selection fires 2 combined PATCHes
   carrying parent_id + x + y, not 2×N sequential round-trips.
 * batchNest ancestor+descendant selection — subtree stays intact.
 * batchNest partial failure rollback — only the rejected nodes
   revert; successful ones stay committed.

Backend change is single-line (collapsed PATCH branch); all
workspace_crud Go tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:58:44 -07:00