molecule-core

Author	SHA1	Message	Date
Hongming Wang	2dbd06d52e	Merge pull request #2055 from Molecule-AI/feat/lark-channel-first-class-v2 feat(channels): first-class Lark/Feishu support via schema-driven config	2026-04-24 19:57:57 +00:00
rabbitblood	998cd03265	fix(tabs-a11y): mock config_schema on adapter response Schema-driven ChannelsTab renders no inputs when config_schema is absent — the test's bare {type, display_name} mock mismatched the real API shape and every getByLabelText("Bot Token") failed. Mock now mirrors GET /channels/adapters with the Telegram schema (bot_token password + chat_id text) so the a11y assertions run against the actual rendered form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:04:51 -07:00
molecule-ai[bot]	92a0c0073d	Merge pull request #2058 from Molecule-AI/chore/canvas-node22-upgrade chore(canvas): upgrade node:20-alpine → node:22-alpine	2026-04-24 19:04:25 +00:00
molecule-ai[bot]	17f29e874a	Merge pull request #2029 from Molecule-AI/fix/canvas-a11y-tabs-v2 fix(canvas/a11y): add type=button to tab toolbar and settings buttons	2026-04-24 19:01:24 +00:00
Molecule AI Core-DevOps	1e5fc48acb	chore(canvas): upgrade node:20-alpine → node:22-alpine Node.js 20 reaches EOL 2026-09 and actions/checkout@v4 emits Node.js 20 deprecation warnings on GitHub Actions (Node 24 forced 2026-06-02). Next.js 15.1 is fully compatible with Node 22. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 18:54:30 +00:00
Hongming Wang	04e60e7303	Merge pull request #2052 from Molecule-AI/fix/canvas-provisioning-timeout-runtime-aware fix(canvas): runtime-aware provisioning-timeout threshold (hermes 12min vs default 2min)	2026-04-24 18:51:46 +00:00
rabbitblood	00265d7028	feat(channels): first-class Lark/Feishu support via schema-driven config Lark adapter was already implemented in Go (lark.go — outbound Custom Bot webhook + inbound Event Subscriptions with constant-time token verify), but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs (bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown silently sent the wrong field names — there was no way to enter a webhook URL. Fix: move form shape to the server. - Add `ConfigField` struct + `ConfigSchema()` method to the `ChannelAdapter` interface. Each adapter declares its own fields with label/type/required/sensitive/placeholder/help. - Implement per-adapter schemas: - Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive) - Slack: bot_token/channel_id/webhook_url/username/icon_emoji - Discord: webhook_url + optional public_key - Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats) - Change `ListAdapters()` to return `[]AdapterInfo` with config_schema inline. Sorted deterministically by display name so UI ordering is stable across Go's random map iteration. - Update the 3 existing `ListAdapters` test sites to struct access. Canvas (`ChannelsTab.tsx`): - Replace the two hardcoded bot_token/chat_id inputs with a single schema-driven `SchemaField` component. Renders one input per field in the order the adapter returns them. - Form state becomes `formValues: Record<string,string>` keyed by `ConfigField.key`. Values reset on platform-switch so stale Telegram credentials can't leak into a new Lark channel. - "Detect Chats" stays but only renders for platforms in `SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with getUpdates). - Only schema-known keys are posted in `config`, scrubbing any stale values from previous platform selections. Regression tests: - `TestLark_ConfigSchema` locks in the 2-field Lark contract with the required/sensitive flags correctly set. 
- `TestListAdapters_IncludesLark` confirms registry wiring + schema survives round-trip through ListAdapters. Known pre-existing `TestStripPluginMarkers_AwkScript` failure in internal/handlers is unrelated to this change (verified via stash+test on clean staging). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:51:15 -07:00
Hongming Wang	0b237ed9dd	refactor(canvas): extract runtime profiles to @/lib/runtimeProfiles Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly — every new runtime would require editing a component, not just adding a table entry. Other components (create-workspace dialog, workspace card tooltips, etc.) will want the same runtime metadata. Changes: - New file `canvas/src/lib/runtimeProfiles.ts` owns: * `RuntimeProfile` type — structural shape, every field optional so new runtimes can partially-fill without breaking consumers. * `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast). * `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min). * `WorkspaceRuntimeOverrides` — interface for server-provided per-workspace overrides, so operators can tune via template manifest / workspace metadata without a canvas release. * `getRuntimeProfile()` — resolver with overrides → profile → default priority. * `provisionTimeoutForRuntime()` — convenience wrapper. - `ProvisioningTimeout.tsx` now delegates to the profile module. `DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers. - Tests: 16/16 (up from 9 before the first fix). Adds pinning for: * overrides > profile > default priority chain * "every entry in RUNTIME_PROFILES resolves to a number" contract * backward-compat export Adding a new slow runtime is now one table entry in `canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment. Moving to server-driven profiles later is a ~10-line change (the resolver already threads WorkspaceRuntimeOverrides through). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:48:39 -07:00
Hongming Wang	9597d262ca	fix(canvas): runtime-aware provisioning-timeout threshold Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 + hermes-agent source build + Playwright + Chromium ~300MB). The canvas's 2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told users their workspace was "stuck" while it was still mid-install. Users hit Retry, triggering fresh cold boots and cancelling healthy workspaces. User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed "has been provisioning for 3m 15s — it may have encountered an issue" with Retry + Cancel buttons, while the EC2 was installing node_modules. Fix: - Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast docker runtimes (claude-code, langgraph, crewai) where cold boot is 30-90s. - Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min). Aligns with tests/e2e/test_staging_full_saas.sh's PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the backend itself gives up. - New timeoutForRuntime() resolves the base; per-node lookup in the check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses the right threshold for each. - timeoutMs prop is now optional. Undefined → per-runtime lookup; a number → forces a single threshold for every workspace (tests use this for deterministic behavior). Tests: 4 new cases pinning the runtime-aware resolution, including a guard that catches future regressions that would weaken hermes's budget. Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which still exports 120_000). 13/13 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:46:09 -07:00
Molecule AI Core Platform Lead	49fc97e6e4	refactor(canvas): remove unused EmbeddedTeam component from WorkspaceNode EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site — TeamMemberChip (which is called directly) covers the same rendering responsibility. The function was stranded after a prior refactor and was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z without this cleanup because the token died before push). Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used by TeamMemberChip at line 498. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 18:30:36 +00:00
Molecule AI Core-UIUX	1126d7b66d	fix(canvas/a11y): add type=button to tab toolbar and settings buttons WCAG 4.1.2 / bug #1669 follow-up — fixing remaining buttons missing type="button" across tab components and settings. Files changed: - FilesTab/FilesToolbar.tsx (5 buttons): +New, Upload, Export, Clear, ↻ (all had onClick, no type=button) - config/secrets-section.tsx (7 buttons): Remove, Edit/Update/Cancel across 2 SecretRow variants + add-variable form - config/form-inputs.tsx (2 buttons): tag remove ×, section collapse toggle - ActivityTab.tsx (1 button): row expand toggle - TracesTab.tsx (1 button): Refresh - settings/UnsavedChangesGuard.tsx (2 buttons): Keep editing, Discard (Radix AlertDialog asChild wrappers — type=button prevents form submit) Total: 18 buttons fixed across 6 files. 934/934 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 14:41:35 +00:00
Hongming Wang	6b62391e5d	Merge pull request #1989 from Molecule-AI/fix/canvas-a11y-final fix(canvas/a11y): type=button campaign + aria fixes (batch 1-3)	2026-04-24 14:05:27 +00:00
Molecule AI Core Platform Lead	4db7f6f024	fix(canvas): define MAX_NESTING_DEPTH constant in WorkspaceNode.tsx TeamMemberChip used MAX_NESTING_DEPTH to cap recursive sub-agent rendering at depth 3, but the constant was never declared — causing a TypeScript build error ('Cannot find name MAX_NESTING_DEPTH') that blocked Canvas CI on PR #1989. Add the constant above EmbeddedTeam with a doc comment explaining its purpose (guards against circular parentId cycles + readability cap). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:52:28 +00:00
Molecule AI Core-UIUX	9f52ee1777	fix(canvas/WorkspaceNode.tsx): add missing useMemo import CI failure: "Cannot find name 'useMemo'" at line 363. useMemo was called but not imported — likely dropped during refactor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	6a96641c37	fix(canvas/a11y): add type="button" to remaining canvas component buttons (batch 3) WCAG 4.1.2 / bug #1669 follow-up — final batch completing the campaign. Added type="button" to all buttons missing it across 14 canvas components. Files changed (14, all additions): - Toolbar.tsx: Stop All, Restart All, A2A toggle, Audit shortcut, Quick help, Search shortcut, Help close (7) - MemoryInspectorPanel.tsx: scope tabs, refresh, search clear ×2, expand, delete (6) - TemplatePalette.tsx: org refresh, toggle, Import Agent, org import, deploy template, palette refresh (6) - ProvisioningTimeout.tsx: Retry, Cancel Request, View Logs, Keep, Remove Workspace (5) - ConsoleModal.tsx: close, Copy output, Close (3) - OnboardingWizard.tsx: Skip guide, action, Next (3) - ConversationTraceModal.tsx: close ×2 (2) - WorkspaceNode.tsx: Restart banner, Extract from team (2) - CommunicationOverlay.tsx: toggle, close panel (2) - Toaster.tsx: dismiss ×2 (2) - SearchDialog.tsx: search result button (1) - TermsGate.tsx: accept (1) - ErrorBoundary.tsx: Reload (1) - BundleDropZone.tsx: import trigger (1) Total campaign (batches 1-3): 27 + 42 = 69 buttons fixed across 24 components. All 477 canvas vitest tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	32a3b84147	fix(canvas/a11y): add type="button" to MissingKeysModal, ContextMenu, CreateWorkspaceDialog tier radio WCAG 4.1.2 / bug #1669 follow-up — modal + menu buttons need explicit type="button". - MissingKeysModal.tsx: Save, Open Settings Panel, Cancel Deploy, Add Keys+Deploy (4) - ContextMenu.tsx: all menuitem buttons (1 — inner menu items loop) - CreateWorkspaceDialog.tsx: tier radio buttons in dialog (1) 56 vitest tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	e14b6d2de4	fix(canvas/a11y): add type="button" to BatchActionBar, EmptyState, SidePanel, CreateWorkspaceDialog WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button" default to type="submit", risking accidental form submission. Added type="button" to all action buttons in: - BatchActionBar.tsx: Restart All, Pause All, Delete All, Clear Selection (4) - EmptyState.tsx: template deploy buttons + Create blank (all) - SidePanel.tsx: close panel, tab switches, Restart Now (3) - CreateWorkspaceDialog.tsx: open trigger, Cancel, Create (3) Total this commit: +12 insertions / 2 deletions across 4 files. Prior commit (c5590c0c): ConfirmDialog + AuditTrailPanel + DeleteCascadeConfirmDialog (+7). Combined batch: 19 buttons fixed across 7 components. 86 vitest tests pass across all touched test files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	2ff15a38a8	fix(canvas/a11y): add type="button" to ConfirmDialog, AuditTrailPanel, DeleteCascadeConfirmDialog WCAG 4.1.2 / bug #1669 follow-up — buttons without explicit type="button" default to type="submit", which triggers accidental form submission when the button is rendered inside a <form> element. Added type="button" to all action buttons in: - ConfirmDialog.tsx: Cancel + confirm buttons (lines 123, 130) - DeleteCascadeConfirmDialog.tsx: Cancel + Delete All buttons (lines 145, 151) - AuditTrailPanel.tsx: filter buttons, refresh, load-more (lines 140, 154, 194) All 51 component tests pass (5 ConfirmDialog, 46 AuditTrailPanel+DeleteCascadeConfirmDialog). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	e355f447bb	fix(canvas/a11y): add aria-hidden to 6 decorative SVGs + aria-label to OrgTokensTab input WCAG 1.3.1 — inputs without visible text labels need aria-label. WCAG 4.1.2 — decorative SVGs inside interactive elements need aria-hidden so screen readers ignore icon content. Changes: - ErrorBoundary: warning triangle SVG — aria-hidden=true - Toolbar: 4 decorative SVGs — aria-hidden=true (Stop All square, Restart Pending arrow, Search magnifier, Help circle) - SettingsButton: gear icon SVG — aria-hidden=true (parent has aria-label) - RevealToggle: EyeIcon + EyeOffIcon SVGs — aria-hidden=true - OrgTokensTab: name input — aria-label="Organization API key label" Bonus fix: removed duplicate title/aria-label props on Restart All button. Note: ConsoleModal and DeleteCascadeConfirmDialog do not exist in current staging (aae0c81) — tab trapping fix inapplicable to this codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:40:52 +00:00
Molecule AI Core-UIUX	59feb65252	fix(canvas/a11y): add type=button to 24 buttons across DetailsTab, ConfigTab, FilesTab, MemoryTab WCAG 4.1.2 / bug #1669 follow-up — DetailsTab, ConfigTab, FilesTab, and MemoryTab had buttons without explicit type="button", causing accidental form submission in any surrounding <form> context. Changes: - DetailsTab (9 buttons): Save, Cancel (edit), Restart/Retry, Edit, View console output, peer select, Confirm Delete, Cancel (delete), Delete Workspace - ConfigTab AgentCardSection (3): Save, Cancel, Edit Agent Card - ConfigTab footer (3): Save & Restart, Save, Reload - ConfigTab textareas (2): aria-label added to Agent Card JSON editor and Raw YAML editor - FilesTab (4): Delete All, Cancel, Delete, Cancel - MemoryTab (11): Expand/Collapse, Open, Expand (collapsed state), Advanced, Refresh, Add, Save, Cancel (add form), expand entry, Delete entry, Show Total: 32 interactive elements corrected across 4 tab components. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:39:43 +00:00
Hongming Wang	46fbffb95b	fix(canvas/e2e): raise staging-setup deadline 15 min → 20 min Matches tests/e2e/test_staging_full_saas.sh's 20-min budget (#1930). Canvas E2E was still stuck at 900s (15 min), which regularly flakes on tenant cold boots in the 12-15 min range, especially on staging where workspace-server image pulls + AMI bootstrapping add 3-5 min vs prod. Concrete blocker: 2026-04-24 staging→main sync (#1981) kept failing on "tenant provision: timed out after 900s" in canvas/e2e/staging-setup.ts despite the actual sync E2E going green. Canvas-side timeout was strictly tighter than the sync-side timeout. Also raises WORKSPACE_ONLINE_TIMEOUT_MS to 20 min to cover the case where the workspace EC2 is provisioned but hermes cold-install (apt + uv + hermes-agent clone + gateway boot) takes longer than the original 10-min budget; this matches the 20-min workspace deadline in SaaS E2E. No behavior change when things are fast. Just covers the tail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 01:26:13 -07:00
molecule-ai[bot]	82d15f4d33	Merge pull request #1859 from Molecule-AI/content-marketer/phase34-launch-post-v2 docs(marketing): Phase 34 launch post v2 — governance-first + tool trace	2026-04-24 07:05:54 +00:00
Hongming Wang	0ef5dad1b1	Merge pull request #1993 from Molecule-AI/fix/auth-redirect-loop-regression-tests test(auth): add regression tests for redirect loop guards	2026-04-24 06:57:12 +00:00
Hongming Wang	8c80175cd8	fix(canvas): subtree-aware layout + org-import reliability + UX polish Five tightly-related fixes surfaced while stress-testing org-template imports (Legal Team, Molecule Company, etc.) on a running control plane: 1) Org import was silently failing — INSERT wrote `collapsed` into the `workspaces` table but that column lives on `canvas_layouts` (005_canvas_layouts.sql). Every import returned 207 with 0 rows created, which `api.post` treated as success → green "Imported" toast + empty canvas. Moved the write to canvas_layouts; updated the workspace_crud PATCH path to UPSERT there too; refreshed the test mock. Added a client-side assertion that throws on 2xx-with-`error`-body so future partial-failures surface a red toast rather than lying about success. 2) Multi-level nested layout was collision-prone: children that were themselves parents (CTO → Dev Lead → 6 engineers) got the same leaf-sized grid slot as leaf siblings and clipped into each other. Added post-order `sizeOfSubtree` + sibling-size-aware `childSlotInGrid` on both the Go server and the TS client (kept in sync). `buildNodesAndEdges` now uses subtree sizes for both parent dimensions and the rescue heuristic. `setCollapsed` on expand now reads each child's actual rendered width/height instead of the leaf-count formula — a regression test covers the CTO/Dev Lead scenario. 3) Provisioning-timeout banner was unusable during large imports: a 30-workspace tree triggered 27 simultaneous "stuck" warnings 2 minutes in (server paces + provision concurrency = 3 guarantee tail items legitimately wait longer). Scaled threshold with concurrent count (base + 45s per queue slot beyond concurrency) and added a Dismiss (×) button per banner. 4) Auto pan-and-zoom on org ready: after the last workspace flips out of `provisioning`, canvas now fitView's with a 1.2s animation, 0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. 
Without the zoom caps fitView was hitting the component's maxZoom=2 on small trees and zooming in instead of out. 5) Toolbar was visually busy: `+ N sub` count wrapped onto a second row on narrow viewports; status dot and workspace total were in separate border-delimited cells. Merged into one segment with `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to icon-only 28px buttons with tooltip + aria-label (Figma/Linear pattern). Stop All / Restart Pending keep text — they're urgent. Also: - `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers that hit intentionally-slow endpoints (org import paces 2s between siblings) don't trip the 15s default and report false aborts. - `WorkspaceNode` clamps role text to 2 lines so verbose descriptions don't unboundedly grow card height and break the grid. - `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime + 2-line role + the currentTask banner that appears during the initial-prompt phase. Tests: 930 canvas tests + full Go handler suite pass. Added regressions for (i) 207 partial-success surfacing as throw, and (ii) setCollapsed sizing with nested-parent children. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:48:29 -07:00
Molecule AI Core-FE	e9be12210f	test(auth): add regression tests for redirect loop guards AuthGate now skips session fetch for /cp/auth/* paths, and redirectToLogin guards against re-setting window.location when already on an auth path. Both guards had no test coverage — a future refactor could silently reintroduce the redirect loop. Added: - AuthGate.test.tsx: 2 cases covering /cp/auth/login and /cp/auth/signup path skipping (no fetchSession call, no redirectToLogin call, children rendered) - auth.test.ts: 2 cases covering redirectToLogin early return for /cp/auth/login and /cp/auth/signup paths Fixes: Molecule-AI/molecule-core#1541 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 06:30:35 +00:00
Hongming Wang	d53583f9c6	Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes	2026-04-23 21:04:55 -07:00
Hongming Wang	2d6ff11c4e	fix(canvas): re-sort parents-before-children after nest mutation React Flow requires parent nodes to appear before their children in the nodes array. When they don't, it logs "Parent node {id} not found. Please make sure that parent nodes are in front of their child nodes in the nodes array" and, more importantly, renders the child at canvas-absolute coords instead of parent-relative, flashing it far outside the parent. topology's buildNodesAndEdges already enforced this at hydrate, but nestNode + batchNest weren't re-sorting after mutating parentId. After its first drag, a freshly nested child often ended up at the wrong screen position because its new parent sat later in the array than the child did. Extract sortParentsBeforeChildren() into canvas-topology as a reusable DFS visit; call it at the tail of both nestNode's set() and batchNest's commit set(). 923 tests still green; no behaviour change beyond eliminating the warning and the position flash. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 21:00:40 -07:00
Hongming Wang	2a8977c946	fix(canvas): cancel-nest also shrinks the parent back Canceling the nest/extract dialog restored the child's position but left the parent card at its auto-grown size. growParentsToFitChildren fires on drag-stop to fit a then-outside child; when the drag is subsequently cancelled, the parent keeps that grown width/height forever because the grow pass is grow-only. Strip width/height from the ex-parent alongside the child position restore in cancelNest — React Flow re-measures from CSS, parent collapses back to its natural size. Same trick nestNode already uses for the un-nest path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:56:08 -07:00
Hongming Wang	09053dfdeb	fix(canvas): cancel-nest restores position; un-nest shrinks parent Two follow-up polish items for drag-and-nest: 1. Cancelling the "Extract from team?" dialog now snaps the dragged card back to where the drag started. Before, a user who dragged a child out, saw the confirm dialog, then clicked Cancel ended up with the card stranded outside the parent at its drop-point position — which also got persisted via savePosition on drag-stop. Now onNodeDragStart captures the pre-drag position + parent, and cancelNest restores both the RF node position and fires savePosition with the absolute pre-drag coords so reload matches. 2. Un-nesting now clears the ex-parent's explicit width/height in the nodes array. growParentsToFitChildren is grow-only so it could never shrink the parent back down after a child left; the card stayed at its auto-grown size with empty space. Stripping width/height lets React Flow re-measure from the card's own min-width / min-height CSS, so the parent visually shrinks to fit whatever children remain. 923 canvas tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:52:28 -07:00
Hongming Wang	512fdfd59d	fix(canvas): plain drag out of parent un-nests again Un-nest used to require holding Alt (or Cmd to force-detach). That was too conservative: when a user dragged a child clearly outside its parent's bbox, nothing happened on release, because the default branch soft-clamped back and only the Alt branch actually opened the "Extract?" confirm. Matches the exact bug the user just flagged ("I can put agents in other agent, but when I drag it out, it does not move out"). New rules: * Past the 20% hysteresis → confirm un-nest. Plain drag, no modifier. This is what most users expect (Miro / Figma behave the same way: drag outside the frame and the shape leaves it). * Inside or within 20% of the edge → soft-clamp back inside. Guards against twitchy releases that momentarily overshoot the edge by a few pixels. * Cmd / Ctrl → force un-nest regardless of overlap. Escape-hatch for when the user dragged within the hysteresis zone but really wants out. * Dropping onto a different parent → nest there (unchanged). Alt is no longer required for un-nesting. It stays unbound for now and carries no meaning unless we re-bind it later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:48:38 -07:00
Hongming Wang	f2a4b6e0d3	fix: dev-mode bypass for IP rate limiter + 429 retry on GET The 600-req/min/IP bucket is sized for SaaS where each tenant has a distinct client IP. On a local Docker setup every panel shares one IP — hydration (/workspaces + /templates + /org/templates + /approvals/pending) plus polling (A2A overlay + activity tabs + approvals + schedule + channels + audit trail) can burst past the bucket inside a minute, blanking the canvas with 429s. The user reported it after dragging workspaces — dragging itself is release-only (savePosition in onNodeDragStop), but the polling that's always running added onto startup tripped the limit. Two-layer fix: Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and discovery. SaaS production keeps the bucket. Client: api.ts auto-retries a single 429 on idempotent GET requests, waiting the server-provided Retry-After (capped at 20s). Mutations (POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying. Users on SaaS hitting a legitimate rate-limit spike get one transparent recovery instead of an immediately-blank Canvas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:44:09 -07:00
Hongming Wang	286dcbfd1e	fix(canvas,org): collapse org-imported parents on first paint Importing a 15-workspace org template dropped every child as a freely-positioned card into its parent's coordinate space. Parents with 5-10 kids had the kids spill below the parent's initial min size, producing the "ugly default" layout the user just flagged — a mess of overlapping cards the moment the import completed. Fix: every workspace in an org-template import that HAS children is inserted with `collapsed = true`. Leaf workspaces stay expanded (nothing to hide). The canvas renders a collapsed parent as a compact header-only card with its "N sub" badge — visually identical to the pre-refactor default the user asked for. Double-click on a collapsed parent now EXPANDS it (flipping `collapsed` locally + persisting via PATCH) so the user can drill in to see the subtree. Only once expanded does a second double-click zoom-to-team, matching the prior behaviour. Leaf-first creation order stays the same; the collapsed flag just means "render compact" not "hide from API". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:36:55 -07:00
Hongming Wang	507696d88a	fix(canvas,server): address review findings on `3f11df03` Five review findings from the `3f11df03` six-bug commit: 1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,ClosedInProduction} covering all three gating states for the security-sensitive dev-mode hatch the prior commit added to /registry/:id/peers. Previously shipped untested; a future refactor could have silently inverted polarity or removed the gate. New tests pin the contract: * MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless * MOLECULE_ENV=development + ADMIN_TOKEN set → require token * MOLECULE_ENV=production → require token 2. ConfigTab handleSave diffs against the RAW parsed YAML / form config instead of the DEFAULT_CONFIG-merged shape. The previous code would silently PATCH tier=1 to the DB when a user deleted the `tier:` line in raw mode (the default-merge substituted 1). Now: only fields the user actually typed participate in the diff. Type guards (typeof === "number" / "string") prevent coercion surprises on malformed YAML. 3. ConfigTab model-save failure no longer lies "Saved". The /workspaces/:id/model PATCH can reject when the runtime doesn't support the chosen model; previously we caught + console.warn'd + showed green Saved, and the user watched the model revert on next reload with no explanation. Now the save path collects a `modelSaveError` and surfaces it via setError with a partial-success message ("Other fields saved, but model update failed: …") so the user sees why. 4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch failures, distinguishing them in the error text ("Failed to load connected channels and platforms — try refreshing"). Previously only an adapters failure was visible; a channels failure left the user with an apparently-empty list and no indication the API was unreachable. 5. ChatTab panels drop the redundant aria-hidden attribute.
The `hidden`/`flex` Tailwind class already sets display:none, which removes the node from the accessibility tree on its own; the extra aria-hidden invited WAI-ARIA lint warnings if a focusable descendant ever landed inside an inactive panel. Tests: 923 canvas + full Go handler suite pass. 3 new Go tests. No behaviour change on the five prior fixes — this commit tightens their edges per the independent review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:29:44 -07:00
Hongming Wang	3f11df031c	fix: six UX bugs (peers auth, scroll, chat tabs, config persist, + visibility) Six bugs reported from a live session — all shippable in one commit: 1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint demands a workspace-scoped bearer token (validateDiscoveryCaller) which the canvas session doesn't hold. Added the same Tier-1b dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so SaaS production stays strict. Exported IsDevModeFailOpen from the middleware package for the handler layer to reuse. 2. Org Templates list unscrollable. OrgTemplatesSection was rendered in the TemplatePalette footer — a div without overflow — so when it expanded to 15+ entries the list extended past the viewport with no scroll. Moved it to the top of the flex-1 overflow-y-auto container. Tall lists now scroll naturally. 3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead of switching. HTML `hidden` attribute was being overridden by Tailwind's `flex` class (display: flex beats the attribute), so both tabpanels rendered concurrently. Swapped to a conditional Tailwind `hidden`/`flex` class so the inactive panel is display:none with proper CSS specificity. 4. Hermes Config form never persists. handleSave wrote config.yaml but name / tier / runtime / model all live on the workspace row (or the dedicated /workspaces/:id/model endpoint) — the form edited in-memory, the request returned 200, the next reload wiped everything back. Hermes + external runtimes manage their own config inside the container anyway, so writing config.yaml is a no-op for them; skip it. Always diff and PATCH the DB-backed fields that actually changed. 5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's load() used Promise.all with a silent catch — if EITHER the channels or adapters fetch failed, both setters were skipped with no error visible. 
Switched to Promise.allSettled so each endpoint settles independently, and the adapters failure now surfaces via the top-level error state. 6. Plugin registry always "No plugins in registry". Same silent catch pattern in SkillsTab.tsx — load errors for /plugins, /plugins/sources, and /workspaces/:id/plugins swallowed without logging. Replaced the empty catches with console.warn so future failures are at least visible in devtools. Tests: 923 passing (unchanged). Go handler tests pass. Server rebuilt and running with the peers-auth + collapsed-persistence fixes (pid 15875). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:18:30 -07:00
Molecule AI App & Docs Lead	3715c06e0b	fix(canvas): remove stale firstInputRef useEffect from AllKeysModal AllKeysModal already handles focus via autoFocus={index === 0} on the first input and a separate title-focus effect. The orphaned useEffect referencing firstInputRef (declared only in ProviderPickerModal) caused a TypeScript build error: "Cannot find name 'firstInputRef'". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 03:11:36 +00:00
Molecule AI Core-QA	746cb22855	fix(canvas/tests): normalize useCanvasStore mock pattern in test files Standardize the mock for useCanvasStore to always expose getState() (used by production ContextMenu to filter parent nodes). Applies the same Object.assign-wrapping pattern introduced in #1744 to: - ClaudeSettings.test.tsx - tabs.a11y.test.tsx - ContextMenu.keyboard.test.tsx (mockStore shape alignment)	2026-04-24 03:10:18 +00:00
Molecule AI Core-QA	680f1f50f2	fix(canvas/a11y): restore aria-hidden on backdrop div after cherry-pick conflict Cherry-pick from #1744 left the backdrop div without aria-hidden="true" (the outer dialog div got it instead). Re-apply aria-hidden="true" to the backdrop div so screen readers skip the clickable overlay layer. Also revert test assertion from bg-black → bg-black/70 to match the exact class applied to the backdrop div.	2026-04-24 03:10:18 +00:00
Hongming Wang	4fd7f1e84c	fix(canvas): tighten rescue + cap toast + cover paths with tests Three follow-up review findings from the `c2b2e13a` review: 1. Rescue heuristic uses pure bbox-non-overlap. The previous `position.x < 0` branch rescued any child whose parent was later dragged past it, even when the layout was clearly recoverable (e.g. relative -40, child still overlaps parent). New rule: rescue iff the child's bbox has zero overlap with the parent's bbox — self-calibrating, scales with user-resized parents, catches screenshot-case and legacy huge-positive data. 2. Toast caps failed-name list at 3 and appends "and N more". Stops a 50-node partial failure from overflowing the toast container. 3. Cycle guard on selection-roots walk in batchNest. Corrupt parentId data can't send the loop infinite now. Cheap defensive guard — one Set per selected node. Tests added (923 total, up from 918): * canvas-topology.test: 4 rescue scenarios — screenshot case (zero-overlap rescue), negative drift kept, huge-positive rescued, user-resized layout kept. * canvas.test: selection-roots filter on a 3-level chain. * workspace_crud test: PATCH {collapsed:true} runs the UPDATE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:08:14 -07:00
Hongming Wang	c2b2e13abe	fix(canvas): address code-review findings on the Canvas refactor Five issues surfaced in the review of `50b53784`. Each was either a real bug waiting to hit users or a silent failure mode. 1. Topology rescue no longer teleports user-resized children. Rescue was comparing against parentMinSize(childCount), so any child the user had placed in space the parent was resized into got snapped to the default grid on reload, undoing the layout. Now rescue fires only on obviously corrupt data: negative relative coords (legacy pre-nesting absolute positions that landed above/left of their assigned parent) or values past a MAX_PLAUSIBLE_OFFSET threshold. Children just past the initial minimum are left alone. 2. batchNest now filters to selection-roots before planning. Previously selecting both A and A's descendant B and dragging into T yanked B out of A to become a sibling under T. Users reasonably expect the A subtree to move intact. The new pass drops any selected node whose ancestor is also selected; those follow their ancestor via React Flow's parent binding. 3. batchNest surfaces partial failure via showToast. Previously silent: 2 of 5 PATCHes fail, user sees 3 cards re-parented + 2 snapped back with no explanation. Now names the failed cards. 4. confirmNest closes the nest dialog BEFORE dispatching the async store action, so a second drag can't kick off a competing batch while the first is still in flight. 5. collapsed is now persisted. The Go workspace_crud.go Update handler ignored the `collapsed` field, so user-initiated collapse round-tripped to an expanded state on next hydrate. Added the PATCH branch (`UPDATE workspaces SET collapsed = ...`) so the state survives reload. Nits cleaned: * Removed dead dragStartParentRef in useDragHandlers. * Swapped redundant `node.data as WorkspaceNodeData` casts for a named WorkspaceNode type alias. 
* Canvas.tsx SR-live region now reads n.parentId (matches MiniMap + RF's native field) instead of the mirror n.data.parentId. Tests added (918 total, up from 915): * batchNest happy path — 2-root selection fires 2 combined PATCHes carrying parent_id + x + y, not 2×N sequential round-trips. * batchNest ancestor+descendant selection — subtree stays intact. * batchNest partial failure rollback — only the rejected nodes revert; successful ones stay committed. Backend change is single-line (collapsed PATCH branch); all workspace_crud Go tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:58:44 -07:00
Molecule AI Integration Tester	dc9001835e	fix(ConfigTab.hermes.test): remove unused fireEvent import	2026-04-24 02:55:51 +00:00
Hongming Wang	50b537849a	refactor(canvas): split Canvas.tsx into hooks; parallelize batchNest Two concerns in one commit (separate files, each self-contained): ## Canvas.tsx split (from ~680 to ~250 lines) Canvas.tsx was holding drag gesture state + keyboard shortcuts + viewport wiring + JSX. Each concern now lives in its own unit under canvas/src/components/canvas/: - dragUtils.ts — pure: shouldDetach, clampChildIntoParent, DETACH_FRACTION - DropTargetBadge.tsx — the floating "Drop into: <name>" label + the dashed ghost preview at the target slot - useDragHandlers.ts — encapsulates onNodeDragStart / Drag / Stop, findDropTarget hit-test, pendingNest state, and confirmNest/cancelNest. Routes multi- select drags through batchNest automatically. - useKeyboardShortcuts — Esc, Enter, Shift+Enter, Cmd+]/[, Z — one window listener, one source of truth. - useCanvasViewport — pan-to-node + zoom-to-team CustomEvent listeners and the debounced viewport save. Canvas.tsx becomes a thin composition + JSX file. No behavioural change; the refactor is covered by the existing 915 canvas tests. ## batchNest parallelization (2N round-trips → N, all in flight) Previously nestNode fired two sequential PATCHes (parent_id then x/y) and batchNest looped nestNode sequentially. For a 5-node selection on a typical ~200ms link this was ~2s of serialized RPCs. - nestNode now combines parent_id + x + y into ONE PATCH. The Go handler (workspace_crud.go Update) already reads all three from the same body — no backend change. - batchNest rewritten: compute every re-parent plan against one snapshot, commit a single set(), then fire N PATCHes via Promise.allSettled in parallel. Per-node failures roll back only that node (others stay committed) — same semantics as the single- node path, just concurrent. - The state math in the batch path also correctly shifts descendant zIndex by depthDelta when any re-parented node has a subtree. 
## Also - canvas-topology.ts: reverted P3.12's opt-in rescue to the auto- rescue default. When a child's stored relative position would render it outside the parent bbox (the visual regression the user saw after collapse → reload — Hermes child drawn outside Claude Code Agent on first paint), the child is placed in the next default grid slot. The "Arrange Children" context command stays for bigger teams. All 915 canvas tests pass. No backend changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:43:18 -07:00
Hongming Wang	c5abed988e	fix(canvas): address review findings on playability pass Five Critical issues caught in code review of `f3423a51`. Each one broke an invariant the original commit claimed to uphold. 1. nestNode: descendants kept their old-depth zIndex after a re-parent. Now walks the dragged subtree and shifts every descendant's zIndex by the same depthDelta so "children above ancestors" survives moves between levels of the hierarchy. 2. bumpZOrder: siblings all share zIndex = depth in fresh topology, so a single +1 bump was identical for every sibling and subsequent bumps drifted zIndex unboundedly. Rewritten to sort siblings by current zIndex and swap the target with its neighbour in the bump direction — Figma-style reorder, stays within the sibling tier. 3. findDropTarget: depth-first tiebreaker lost to bumped siblings. The visually-frontmost card after Cmd+] is a shallow sibling, but the hit test picked the deepest nested card regardless. Swapped order so zIndex wins first, depth second, area third. Also pre-computes the depth map once per call (was O(n²) via repeated .find walks — will matter past ~30 workspaces). 4. arrangeChildren: saved absolute position using `slot + parent.position`, but parent.position is RELATIVE to its own parent when nested. Grandchildren's stored x/y were in the parent's local frame and reload placed them in the wrong spot. Now walks the full ancestor chain via absOf() to get the true canvas-absolute origin before PATCHing. 5. setCollapsed: naive flip of every descendant's hidden flag diverged from the topology rebuild on hydrate. Collapse A, collapse B, then expand A — C should stay hidden because B is still collapsed, but before this fix C was unhidden. Rewritten to recompute every descendant's hidden from the full ancestry chain, matching the topology pass byte-for-byte. New round-trip test asserts the two code paths produce identical node.hidden across a full lifecycle. 
Also: - Removed dead cascadeMessage constant (never rendered). - Replaced hardcoded 260/120 in zoom-to-team with exported constants. - arrangeChildren PATCH catch now logs instead of silently swallowing. - Added 70→76 tests: setCollapsed 3-chain scenarios, bumpZOrder swap semantics, edge-of-list no-op. All 915 canvas tests green. Backend untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:16:48 -07:00
Hongming Wang	f3423a513d	feat(canvas): industry-pattern playability pass (P1+P2+P3) Ships the full prioritized improvement list from the canvas research report — aligns our nesting/resize UX with Miro / FigJam / tldraw / Figma conventions. Organized by priority below. ## P1 — baseline playability * Hysteresis on drag-out detach (Miro): a child only un-nests when >=20% of its bbox is outside the parent on release. Prevents accidental un-nesting from twitchy drags. * Drop-target now uses tree-depth DESC, then zIndex DESC, then area ASC to pick targets when nested parents overlap (xyflow #2827). * Children render above ancestors by inheriting zIndex = parent + 1 in topology and on every nest/unnest (xyflow #4012). * Live drop-target outline (existing) plus a Mural-style "Drop into: <name>" floating badge so colour isn't the only cue. * growParentsToFitChildren now fires only on dimension-type changes inside onNodesChange (NodeResizer commits) and once on drag-stop — avoids tldraw's edge-chase artifact (P3.11 commit-on-release). ## P2 — polish * Whimsical-style ghost preview: dashed outline at the next default grid slot inside the drop-target parent during drag. * Alt-drag escape with soft clamp: dropping slightly outside a parent without Alt/Cmd snaps the child back inside (clampChildIntoParent); Alt releases the clamp to allow un-nest; Cmd/Ctrl force-detaches. * Figma-style keyboard hierarchy nav: Enter descends to first child, Shift+Enter ascends to parent, Cmd+]/[ re-orders siblings via the new bumpZOrder store action. * Multi-select re-parent preserves offsets: confirmNest routes through a new batchNest action when the primary drag is part of a batch selection (Lucidchart pattern). ## P3 — long-tail * Minimap now shows parent cards as filled regions with a blue stroke, so hierarchy reads at a glance without zooming. 
* Out-of-bounds rescue is opt-in: topology no longer silently re-lays children whose stored position is outside the parent bbox (Figma trust-the-data). The new Arrange Children context menu item runs the rescue on demand via arrangeChildren. * Cmd-drag force-detach regardless of hysteresis. * Collapse workspace: the existing Collapse Team action now toggles a local setCollapsed store action that hides every descendant and shrinks the parent card to header-only (Miro frame outline view). Growth pass skips collapsed parents so they don't push back out. All 910 canvas tests green. Backend untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:03:02 -07:00
Hongming Wang	d359390f83	fix(canvas): parent auto-fit sizing + rescue out-of-bounds children Two playability bugs in the new flat-cards layout: 1. On first load or fresh org import a parent had no explicit width or height, so children whose stored position sat inside their (eventual) parent's rectangle rendered visually outside the smaller default parent box. Compute a parent starting size in canvas-topology: • 2-column grid of child-default footprints + header/side padding • Grows per child count (2→1 row, 3-4→2 rows, etc.) and stamp it onto the Node's width/height so the first paint already contains every child. 2. If a child's stored relative position actually falls outside the parent's computed bounds (legacy org-imports at 0,0, pre-refactor absolute coordinates, manually-nudged rows), assign that child a deterministic default grid slot inside the parent instead. Runtime cascade: added growParentsToFitChildren to onNodesChange so when the user drags or resizes a child past the parent's current bounds, the parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit — grow-only, never shrinks under the user's manual resize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:29:04 -07:00
Hongming Wang	cc194f0b7e	refactor(canvas): flat workspace cards with React Flow native parenting Every workspace now renders as a first-class card on the canvas regardless of parent_id. The old "parent card contains mini TeamMember chips" layout is gone — if B is parented to A, B renders as a full card inside A's coordinate space using React Flow's `parentId` binding, so moving A carries B along and children have the same detail + actions as root cards. Details: - canvas-topology.ts: topologically sort parents before children (React Flow ordering requirement), compute each child's RF-native parentId + relative position on load. DB keeps absolute x/y; the abs→rel conversion happens here, reverse translation in Canvas.onNodeDragStop before savePosition PATCHes the DB. - WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks, simplify the size classes, and add NodeResizer (visible when selected) so users can drag any edge/corner to grow or shrink. Parent cards default to a larger min size so nested children have breathing room. - Canvas.tsx drop targeting rewritten: bounds-based hit test against each node's measured absolute bbox, deepest match wins. Fixes two prior bugs at once — dropping onto Claude Code with a nested same- named Hermes no longer picks the wrong node, and the target can now be a nested workspace when that's where the pointer actually released. - canvas.ts nestNode + removeNode: translate position between old and new parent's absolute origin on nest/unnest so the card doesn't jump, and re-point the RF `parentId` alongside `data.parentId` on reparent. - Tests: hidden-flag assertions replaced with parentId checks; obsolete TeamMemberChip a11y/eject tests deleted (the UI component no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:44 -07:00
Hongming Wang	8a07cf4035	fix(canvas): skip already-nested workspaces as drop targets Dragging one workspace onto another could pick a nested child as the "nearest" drop target instead of the visible parent card the user actually hovered. The effect: dropping a free-floating Hermes Agent onto a Claude Code Agent that already had a Hermes Agent nested inside showed "Move 'Hermes Agent' inside 'Hermes Agent'?" — the confirmation referenced the nested same-named child, not Claude Code. Why: getIntersectingNodes returns every overlapping node, including hidden=true children that render inside their parent's card. The parent and child share bounding boxes, so the child often "won" the nearest-distance check. Filter them out at the source: a node that's already got a parentId (or is hidden) is never a valid top-level drop target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:49:01 -07:00
Hongming Wang	5ebe6ccb33	test: regression guards for 2026-04-23 hermes + CP bug wave Three complementary regression tests for the chain of P0s fixed today. Each targets a specific bug class that reached production, and will fire loud if any of them regress. ## 1. E2E A2A assertion enhancements (tests/e2e/test_staging_full_saas.sh) The existing A2A check looked for "error\|exception" in the response text, which was too broad and missed the actual error patterns we hit. Now matches each known error class individually with a diagnostic fail message pointing at the exact bug: - "[hermes-agent error 401]" → hermes #12 (API_SERVER_KEY) - "hermes-agent unreachable" → gateway process died - "model_not_found" → hermes #13 (model prefix) - "Encrypted content is not supported" → hermes #14 (api_mode) - "Unknown provider" → bridge PROVIDER misconfig Also asserts the response contains the PONG token the prompt asked for — catches silent-truncation/echo regressions. ## 2. Hermes install.sh bridge shell harness (tools/test-hermes-bridge.sh) 4 scenarios × 16 assertions, all offline (no docker, no network): - openai-bridge-happy: OPENAI_API_KEY + openai/gpt-4o → provider=custom, model="gpt-4o" (prefix stripped), api_mode=chat_completions - operator-custom-wins: explicit HERMES_CUSTOM_* → bridge skipped - openrouter-not-touched: OPENROUTER_API_KEY → provider=openrouter, slug kept - non-prefixed-model: bare "gpt-4o" → prefix-strip is a no-op Runs in <1s, can be wired into template-hermes CI. Pins the exact config.yaml shape — any drift in derive-provider.sh or the bridge if-block breaks a test. ## 3. 
Canvas ConfigTab hermes tests (ConfigTab.hermes.test.tsx) 5 vitest cases covering the #1894 bugs: - Runtime loads from workspace metadata when config.yaml missing - "No config.yaml found" red error hidden for hermes - Hermes info banner shown instead - Langgraph workspace still sees the red error (regression-guard the other way) - config.yaml runtime wins over workspace metadata when present ## Running bash tools/test-hermes-bridge.sh # 16 assertions cd canvas && npx vitest run src/components/tabs/__tests__/ConfigTab.hermes.test.tsx # 5 cases # E2E enhancements ride on the existing staging E2E workflow ## Not yet covered (tracked in #1900) CP admin delete-tenant EC2 cascade, cp-provisioner instance_id lookup (#1738), purge audit SQL mismatch (#241), and pq prepared- statement cache collision (#242). These are in-controlplane-repo concerns — separate PR with CP-side sqlmock + integration tests. Closes items in #1900. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:45:13 -07:00
Hongming Wang	7356cf8d3a	fix(chat): clear sending spinner when any path delivers the reply Two latent bugs kept the "Processing with Claude Code..." timer ticking after the agent had already answered: 1. The A2A_RESPONSE store handler wrote into agentMessages[workspaceId] (no prefix) but ChatTab's "clear sending" effect subscribed to agentMessages["a2a:" + workspaceId]. Keys never matched — the effect was dead code from day one. Removed the dead subscription and moved the setSending(false) into the pendingAgentMsgs effect so any reply delivered via a WS push (Claude Code SDK, Hermes's send_message_to_user) also closes the spinner. 2. Added an activity-log fallback: when the platform emits a successful a2a_receive ACTIVITY_LOGGED for this workspace, clear sending and stop the timer. That covers the "runtime answered but we never saw the store message" case Claude Code exhibited tonight — the HTTP request can stay in flight while the SDK already pushed its reply. Symmetric a2a_receive error path also clears sending and surfaces the error message, so a runtime-side failure no longer hangs the UI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:43:30 -07:00
Hongming Wang	e337efe974	fix(canvas): propagate runtime through WORKSPACE_PROVISIONING event The side-panel runtime pill read "unknown" for newly-deployed workspaces because canvas-events.ts created the node from WORKSPACE_PROVISIONING payload — and the payload only carried name + tier. No refetch filled the gap during provisioning, so the user saw "RUNTIME unknown" on the card even though the DB row had the real runtime set. Includes runtime in every WORKSPACE_PROVISIONING emitter: * handlers/workspace.go — initial create * handlers/workspace_restart.go — explicit restart, auto-restart, and crash-recovery resume loop * handlers/org_import.go — multi-workspace org imports Canvas-side: canvas-events.ts reads payload.runtime when creating the node; the provisioning test asserts the pill value is populated before any refetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:17:49 -07:00
Hongming Wang	dc50a1c775	refactor(canvas): data-drive provider picker from template config.yaml The MissingKeysModal's provider list was hardcoded in deploy-preflight.ts as RUNTIME_PROVIDERS — a per-runtime map that duplicated what each template repo already declares in its config.yaml. That meant adding a new provider required changes in two places, and the UI could drift out of sync with the actual template (e.g. when a template adds a MiniMax or Kimi model, the picker wouldn't know). The single source of truth for "which env vars does this workspace need" is each template's config.yaml: * `runtime_config.models[].required_env` — per-model key list * `runtime_config.required_env` — runtime-level AND list Go /templates already returned `models`. This change: * Adds `required_env` alongside `models` on templateSummary so the canvas receives the full picture. * Rewrites deploy-preflight.ts to derive ProviderChoice[] from a template object via `providersFromTemplate(template)`: - groups `models[]` by unique required_env tuple - falls back to runtime_config.required_env when models is empty - decorates labels with model counts (e.g. "OpenRouter (14 models)") * `checkDeploySecrets(template, workspaceId?)` now takes a template object instead of a runtime string. Any-provider satisfaction still short-circuits preflight to ok=true. * MissingKeysModal receives `providers` directly; no more lookups. * TemplatePalette threads `template.models` + `template.required_env` into the preflight. Side effects: * Claude Code's dual-auth (OAuth token OR Anthropic API key) now surfaces as two picker options — its config.yaml already declared both, the UI just wasn't reading them. * Hermes picker now shows 8 provider options (Nous, OpenRouter, Anthropic, Gemini, DeepSeek, GLM, Kimi, Kilocode) instead of the hand-picked 3, matching its 35-model reality. 
Removed the legacy RUNTIME_PROVIDERS / RUNTIME_REQUIRED_KEYS / getRequiredKeys / findMissingKeys exports; MissingKeysModal.test.tsx deleted (its coverage is subsumed by the new template-driven deploy-preflight.test.ts). 58 modal-adjacent tests pass; full canvas suite 919 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:07:15 -07:00
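
The rescue rule described in commit 4fd7f1e84c ("rescue iff the child's bbox has zero overlap with the parent's bbox") can be sketched roughly as below. This is an illustrative sketch only, not the code in canvas-topology.ts: the `Box` shape and the names `overlapArea` and `needsRescue` are hypothetical, and it assumes the child's position is stored relative to the parent's top-left corner.

```typescript
// Hypothetical box shape; the field names are assumptions, not the project's types.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Area of the intersection of two axis-aligned boxes (0 when they do not overlap).
function overlapArea(a: Box, b: Box): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Rescue only when the child shares no area at all with its parent.
// childRel is expressed in the parent's local frame, so the parent sits at (0, 0).
function needsRescue(
  childRel: Box,
  parentSize: { width: number; height: number },
): boolean {
  const parent: Box = {
    x: 0,
    y: 0,
    width: parentSize.width,
    height: parentSize.height,
  };
  return overlapArea(childRel, parent) === 0;
}
```

Because the test uses the parent's current size rather than a fixed threshold, a child that has drifted slightly negative but still overlaps is kept, while a child placed far outside a user-resized parent is still caught.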

283 Commits
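
The Promise.all to Promise.allSettled switch described in commit 3f11df031c (bug 5) can be illustrated with a minimal sketch. The helper name `loadIndependently`, the `LoadResult` shape, and the fetcher parameters are hypothetical stand-ins rather than the ChannelsTab code; only `Promise.allSettled` itself is a real API.

```typescript
// Hypothetical result shape: each list falls back to empty if its fetch failed.
interface LoadResult<C, A> {
  channels: C[];
  adapters: A[];
  errors: string[];
}

// Let each fetch settle on its own so that one failure cannot suppress
// the other result, and collect failures instead of swallowing them.
async function loadIndependently<C, A>(
  fetchChannels: () => Promise<C[]>,
  fetchAdapters: () => Promise<A[]>,
): Promise<LoadResult<C, A>> {
  const [channels, adapters] = await Promise.allSettled([
    fetchChannels(),
    fetchAdapters(),
  ]);

  const errors: string[] = [];
  if (channels.status === "rejected") {
    errors.push(`channels: ${String(channels.reason)}`);
  }
  if (adapters.status === "rejected") {
    errors.push(`adapters: ${String(adapters.reason)}`);
  }

  return {
    channels: channels.status === "fulfilled" ? channels.value : [],
    adapters: adapters.status === "fulfilled" ? adapters.value : [],
    errors,
  };
}
```

A caller could apply whichever list succeeded and route a non-empty `errors` array to a visible error state, which is the behaviour the commit message describes for the adapters failure.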