The External Connect modal had tabs for Python SDK / curl / Claude Code
channel / Universal MCP. Operators using hermes / codex / openclaw as
their external runtime had no copy-paste path; they pieced together
WORKSPACE_ID + PLATFORM_URL + auth_token into config files by reading
docs.
Adds three runtime-specific snippets stamped server-side:
- **Hermes** — installs molecule-ai-workspace-runtime + the
hermes-channel-molecule plugin, exports the 4 env vars, and writes
the gateway.plugin_platforms.molecule block into ~/.hermes/config.yaml.
Same long-poll-based push semantics the Claude Code channel tab
delivers (push parity with the in-tree template-hermes adapter).
- **Codex** — wires the molecule_runtime A2A MCP server into
~/.codex/config.toml ([mcp_servers.molecule] block with env_vars
passthrough + literal env values). Outbound tools only — codex's
MCP client doesn't route arbitrary notifications/* (verified by
reading codex-rs/codex-mcp/src/connection_manager.rs); push parity
on external codex would need a separate bridge daemon, tracked
as future work. Snippet calls this out so operators know to pair
with Python SDK if they need inbound delivery.
- **OpenClaw** — installs openclaw + onboards, wires the molecule
MCP server via openclaw mcp set, starts the gateway on loopback.
Same outbound-tools-only caveat as codex; the in-tree template-
openclaw adapter implements the full sessions.steer push path,
but an external setup would need the same bridge daemon to translate
platform inbox events into sessions.steer calls. Future work.
Default open tab changed from "Claude Code" to "Universal MCP".
Universal MCP is runtime-agnostic and works as a starting point for
any operator regardless of their downstream agent runtime; runtime-
specific tabs are still one click away. Pre-2026-05-03 the modal
defaulted to Claude Code, so operators using non-Claude runtimes
opened to a tab they had to skip past.
Tab order also reorganized:
Universal MCP → Python SDK → Claude Code → Hermes → Codex → OpenClaw → curl → Fields
Each runtime-specific tab is gated on the platform supplying the
snippet (older platform builds without the field don't show empty
tabs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback: chat-bubble agent text still washed out after #2618 +
#2623. Looked at the actual rendered colors and the issue was Tailwind
Typography's `prose-invert` defaults — body text ships at zinc-300,
which lands at ~5.3:1 against bg-zinc-700. Passes AA but visibly
duller than the user bubble's crisp white-on-blue (~10:1).
Override the prose CSS variables on the agent bubble in dark mode:
- body → zinc-100 (was zinc-300)
- headings / bold → white
- inline code → zinc-100
That brings agent body text to ~13:1 against bg-zinc-700, matching the
user bubble's brightness so both sides of the conversation read at
the same crispness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same bug class as #2622 (ConfirmDialog), but on a more critical surface
— this is the top-of-page banner asking the user to approve / deny a
real workspace permission request.
1. **Deny was a no-op hover.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback before the user clicked a destructive
action. Now lifts to surface-elevated + brightens the text so the
button visibly responds.
2. **Approve hover went LIGHTER.** `bg-emerald-600 hover:bg-emerald-500`
dropped white-text contrast on hover. Reversed to emerald-700.
3. **No focus rings on either button.** Keyboard users had no way to
tell which decision was focused. Added focus-visible rings
(offset against the dark amber banner bg) — emerald for Approve,
amber for Deny so the choice is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered during code review of the #2623 hotfix audit. Same
regression class as #2618: prose-invert applied where the bubble bg
themes between light/dark, leaving markdown unreadable in one theme.
`MarkdownBody` was unconditionally `prose-invert` — fine for the
outgoing-message bubble (bg-cyan-900, dark in both themes) and the
failure bubble (bg-red-950, dark in both themes), but WRONG for the
incoming-message bubble (bg-surface-card, which themes LIGHT in light
mode). Result: light prose body text on light cream bg = invisible
markdown for incoming peer-to-peer messages in light mode.
Added an `invert: "always" | "dark-only"` prop to MarkdownBody. The
NormalMessage call sites switch on `msg.flow` so each bubble gets the
direction matching its bg's theming behavior. Failure bubble keeps
the default ("always") since red-950 stays dark.
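The switch above can be sketched as a pure helper; the function names and class strings here are illustrative assumptions, not the repo's actual MarkdownBody wiring:

```typescript
// Illustrative sketch of the invert-direction switch.
type InvertMode = "always" | "dark-only";

function proseClasses(invert: InvertMode): string {
  // "always": bubble bg is dark in both themes, so prose is always inverted.
  // "dark-only": bubble bg themes light in light mode, so invert only in dark.
  return invert === "always" ? "prose prose-invert" : "prose dark:prose-invert";
}

// Call sites switch on message flow, mirroring the NormalMessage change:
function invertFor(flow: "incoming" | "outgoing" | "failure"): InvertMode {
  return flow === "incoming" ? "dark-only" : "always";
}
```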
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression from PR #2618 (chat dark-contrast).
PR #2618 switched the agent bubble bg to `dark:bg-zinc-700` so it
visibly elevates against the dark panel — but the inner ReactMarkdown
prose div only got `prose-invert` for USER messages. Result: in dark
mode the agent's markdown text rendered with the Tailwind Typography
plugin's default dark body color on top of the new dark bg = invisible
text. User reported empty-looking gray rectangles where agent replies
should be.
Fix: apply `dark:prose-invert` to agent bubbles so prose body text
flips light alongside the bg. Light mode unchanged (default prose
colors against the warm `bg-surface-card`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues on a high-stakes surface (revoke token, delete workspace,
cascade delete):
1. **Cancel hover was a no-op.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback on hover. Now hovers to surface-elevated
with a softened border so the button visibly lifts.
2. **Confirm hovers went LIGHTER, dropping white-text contrast.**
`bg-red-600 hover:bg-red-500` made the destructive button less
readable on hover. Same for warning (amber) and primary (accent).
Reversed to hover-darker so contrast holds in both themes.
3. **No focus-visible rings on either button.** Keyboard users had no
indication of focus position (WCAG 2.4.7 fail). Added
`focus-visible:ring-2 focus-visible:ring-accent/40` on Cancel and
`focus-visible:ring-2 focus-visible:ring-offset-2 ...accent/60` on
Confirm so the focused destructive action is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User screenshot showed pale lavender user bubbles with hard-to-read white
text and a nearly-invisible agent bubble blending into the dark panel.
Root causes:
1. Tailwind v4 defaults `dark:` to `prefers-color-scheme: dark`. Our
ThemeProvider writes `data-theme="dark"` on <html> so user toggle wins
over OS — but `dark:` classes elsewhere in the codebase weren't
tracking it. Added `@custom-variant dark` to re-bind the variant.
2. `bg-accent` themes lighter in dark mode (--color-accent: #6883e8),
dropping white-text contrast to ~3:1 (fails WCAG AA). Switched user
bubble to solid blue-600/500 so it stays ~5:1 in both modes.
3. `bg-surface-card` (#1a1d23) was only ~7% lighter than the panel bg
(#0e1014), making agent bubbles disappear. Bumped to zinc-700 in
dark; light mode keeps the warm surface-card tint.
4. System (error) bubble's /10 overlay was nearly invisible; raised to
/25 in dark with stronger border + ink for readability.
Sub-tab + textarea polish included: low-contrast `text-ink-soft` →
`text-ink-mid`, focus-visible rings on tabs, dark variants on textarea.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Top-of-canvas Toolbar had multiple low-contrast surfaces in light theme:
Action buttons (Stop All, Restart Pending):
- bg-red-950/50 + bg-amber-950/40 → bg-bad/10 + bg-warm/10 with bg-bad/40
+ bg-warm/40 borders. Dark-tinted backgrounds with /40-/50 alpha render
as nearly invisible smudges on warm-paper; semantic tokens at /10 give
a clear pale-bad / pale-warm tint that scales correctly in dark mode.
- Both gain focus-visible:ring-2 focus-visible:ring-{bad,warm}/40.
Toggle button (A2A edges):
- Active state: bg-blue-950/50 → bg-accent/15 (themes correctly).
- Inactive state: bg-surface-card/50 + text-ink-soft → solid bg-surface-card
+ text-ink-mid; hover bumps to text-ink. Drops the redundant
"hover:bg-surface-card/50" identity hover.
Icon buttons (Audit, Search, Help):
- Same pattern as toggle inactive: solid bg-surface-card + text-ink-mid +
text-ink hover, with focus-visible:ring-2 focus-visible:ring-accent/40.
Workspace count + bullet separator:
- text-ink-soft (3.5:1 on warm-paper) → text-ink-mid (7:1).
WS connection status:
- "Live": text-ink-soft → text-ink-mid (paired with the green dot).
- "Reconnecting": text-ink-soft → text-warm (semantic match for amber dot).
- "Offline": text-ink-soft → text-bad (semantic match for red dot).
Status text now reinforces the dot color instead of disappearing on
light surfaces.
Help popover:
- Close button: text-ink-soft → text-ink-mid + focus-visible:underline.
- HelpRow body text: text-ink-soft → text-ink-mid (was 3.5:1 on the
bg-surface-sunken/45 popover row — failed AA for body text).
Chat bubble fixes (canvas/src/components/tabs/ChatTab.tsx):
- User bubble: bg-accent-strong/30 + text-blue-100 → bg-accent + text-white
(translucent dark-blue overlay on warm-paper surface read as pale lavender
with near-invisible light-blue text — a real WCAG AA failure on the
highest-traffic surface in canvas).
- System/error bubble: bg-red-900/30 + text-red-200 → bg-bad/10 + text-bad,
using semantic tokens so dark-mode adapts automatically.
- Agent bubble: drop /80 + /30 opacity modifiers; solid bg-surface-card +
text-ink + border-line gives consistent contrast in both themes.
- prose-invert was unconditional, so markdown text on agent/system bubbles
rendered as light text on a light surface in light mode. Make it apply
only on the user bubble (the only dark surface in this component).
- Timestamp: text-ink-soft is too pale on warm-paper; use text-ink-mid for
agent/system, white/70 for user (visible on the now-solid blue bg).
Sub-tab bar fixes (canvas/src/components/SidePanel.tsx):
- Right-edge fade was hardcoded `from-zinc-950` — that paints a dark vertical
strip on the right edge of the tab bar in light mode. Switch to
`from-surface` so the gradient blends into whichever theme is active.
- Inactive tab text: text-ink-soft (~3.5:1 on warm-paper) → text-ink-mid
(~7:1). Active tab background: drop the /40 opacity so the selection is
unambiguous on either surface.
No semantic-token additions; all changes use existing warm-paper tokens
that already work in both themes.
Three loading-state divs were missing the role/aria pattern that
TemplatePalette.tsx and EmptyState.tsx already follow, so screen
readers got no signal that the page was waiting:
- canvas/src/app/page.tsx — full-screen "Loading canvas..." while
the websocket hydrates. First paint of the entire app.
- canvas/src/components/settings/TokensTab.tsx — "Loading tokens..."
- canvas/src/components/settings/OrgTokensTab.tsx — "Loading keys..."
Add role="status" + aria-live="polite" to the wrapping div so
assistive tech announces the wait and the eventual transition.
Visual rendering unchanged.
The tier system in CreateWorkspaceDialog and design-tokens has been
T1 Sandboxed / T2 Standard / T3 Privileged / T4 Full Access, but two
chrome surfaces still showed the older 3-tier mapping with T3 as
"Full Access":
- Legend (bottom-left chrome on every canvas page) listed only T1/T2/T3
and called T3 "Full Access". On a SaaS tenant the actual workspace
badges render T4 (in amber/warm) — there was no T4 entry in the
legend at all, so the user sees an undocumented orange badge.
- ConfigTab tier dropdown (per-workspace settings → Sandboxing) had no
T4 option at all and called T3 "Full Access". So an existing T4
workspace would show "T3 — Full Access" as the selected option,
silently downgrading the displayed tier on the settings panel.
- tenant.ts isSaaSTenant() doc comment claimed SaaS workspaces are
"inherently T3 Full Access" — wrong on both the number and the lock
rationale (SaaS hides T1/T2/T3, not just T1/T2).
Fix:
- Legend now imports TIER_CONFIG and renders all four tiers
(Sandboxed/Standard/Privileged/Full Access) using the same color
swatches as the badges on workspace cards. Eliminates the previous
drift where Legend's hardcoded sky/violet/warm chips didn't match
the gray/sky/violet/amber actually rendered on nodes.
- ConfigTab adds the missing T4 — Full Access option and renames T3
to Privileged.
- tenant.ts comment updated to match the picker's actual hide list.
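The drift-proof Legend shape can be sketched as a single mapping over one shared record; the TIER_CONFIG shape and swatch classes here are illustrative stand-ins for the repo's tokens:

```typescript
// Minimal sketch: derive Legend entries from TIER_CONFIG instead of
// hardcoding a second copy of the tier list.
const TIER_CONFIG = {
  1: { label: "Sandboxed", swatch: "bg-gray-500" },
  2: { label: "Standard", swatch: "bg-sky-500" },
  3: { label: "Privileged", swatch: "bg-violet-500" },
  4: { label: "Full Access", swatch: "bg-amber-500" },
} as const;

function legendEntries(): { tier: string; label: string; swatch: string }[] {
  // Same record drives the badges on workspace cards, so the Legend
  // can no longer drift from what nodes actually render.
  return Object.entries(TIER_CONFIG).map(([tier, cfg]) => ({
    tier: `T${tier}`,
    label: cfg.label,
    swatch: cfg.swatch,
  }));
}
```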
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2555 (Tailwind v4 + warm-paper) migrated all canvas chrome (toolbar,
side panel, modal layer) to semantic tokens, but missed the React Flow
viewport's `colorMode="dark"` literal — and two paired hardcoded dark
literals on the Background dot color and MiniMap mask. Net result on
prod: the user picked light mode, the toolbar flipped warm-paper, but
the canvas backplate, edges, dots, controls, and minimap stayed black —
visibly half-themed.
Three coordinated fixes inside the canvas viewport:
- ReactFlow `colorMode={resolvedTheme}` so the library's own dark/light
styles flip with the user's choice.
- Background dot color picks the line-soft tone in light mode (zinc-800
was invisible-on-cream).
- MiniMap maskColor warm-tints the off-viewport dim so the unselected
region doesn't render as a hard black bar over warm-paper.
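The three bindings can be sketched together; React Flow's `colorMode` prop does accept `"light" | "dark"`, but the color values below are illustrative stand-ins for the repo's tokens:

```typescript
// Sketch of the theme-driven viewport props.
type Theme = "light" | "dark";

function viewportProps(resolvedTheme: Theme) {
  return {
    colorMode: resolvedTheme, // ReactFlow flips its own dark/light styles
    // line-soft tone in light mode (zinc-800 was invisible on cream):
    dotColor: resolvedTheme === "light" ? "#d6d3cd" : "#27272a",
    // warm-tinted off-viewport dim instead of a hard black bar:
    maskColor:
      resolvedTheme === "light" ? "rgba(245, 240, 232, 0.7)" : "rgba(0, 0, 0, 0.6)",
  };
}
```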
Verification:
- `npx tsc --noEmit` clean
- `npx vitest run` 188/188 pass
- (will browser-verify post-redeploy on hongming.moleculesai.app)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review of #2555 caught two contrast regressions left
by the bulk perl pass:
1. text-white → text-ink mass-substitution silently broke destructive
and primary buttons. text-ink resolves to #15181c (warm-paper
near-black) in light mode — dark text on bg-red-600 / bg-amber-600
/ bg-emerald-600 / bg-blue-600 / bg-accent / bg-accent-strong /
bg-good / bg-bad fails WCAG contrast and looks broken. Per-line
pass flips text-ink → text-white only when a saturated bg utility
is present; tinted-state pills (bg-red-950/50 etc.) keep their
intentionally-retained text-* literals.
2. Original mapping table was missing bg-zinc-600 (most-used
hover-state literal for cancel buttons — caused them to JUMP from
warm cream resting state to dark zinc on hover in light mode) and
text-zinc-700/800/900 (separator dots and decorative dim text
invisible on warm-paper light bg). Extended mapping fills these
gaps with bg-surface-card / text-ink-soft.
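The per-line pass from fix 1 can be sketched as a heuristic over each class string; the saturated-bg pattern below is an illustrative subset, not the full list the pass used:

```typescript
// Sketch: flip text-ink -> text-white only when a saturated bg utility
// is present on the same element; tinted pills (bg-red-950/50 etc.)
// keep their intentionally-retained literals.
const SATURATED_BG =
  /\bbg-(?:red|amber|emerald|blue)-600\b|\bbg-(?:accent(?:-strong)?|good|bad)\b(?!\/)/;

function fixLine(className: string): string {
  // (?!-) keeps text-ink-soft / text-ink-mid untouched.
  if (!/\btext-ink(?!-)/.test(className)) return className;
  if (!SATURATED_BG.test(className)) return className;
  return className.replace(/\btext-ink(?!-)/, "text-white");
}
```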
Also: drop stale tailwind.config.ts reference from components.json
(file deleted by the v3→v4 migration); switch baseColor zinc →
neutral and enable cssVariables since v4 uses CSS-driven tokens.
Future shadcn-cli invocations would have failed or written malformed
components without this.
27 sites in 27 files affected by #1, ~20 sites in 20 files by #2.
1214/1214 unit tests still pass; build still clean.
Findings courtesy of multi-model review per code-review-and-quality
skill — different blind spots catch different bugs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `npm ci` failed because the previous lock was generated on macOS
arm64, which omits the Linux-specific optional deps that
@tailwindcss/postcss → lightningcss-linux-x64-gnu transitively need
(@emnapi/runtime, @emnapi/core).
Re-ran `npm install --include=optional` so the lock includes every
platform variant of lightningcss + the @emnapi packages they pull in.
Runner (Linux x64) now has what it needs; local macOS install still
fine (npm picks the matching binary at install time).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2545 self-review findings.
(1) originalModel was set from wsMetadataModel alone. On a hermes/pre-#240
workspace where MODEL_PROVIDER was never written but YAML has
runtime_config.model: "something", originalModel="" while the form
rendered "something" — handleSave's diff fired /model PUT on every
unrelated save (tier change → workspace auto-restart). Snapshot from
the actual rendered model in BOTH loadConfig branches so the diff
stays scoped to user-initiated changes.
(2) The store-flush test asserted the call happened but didn't pin
success-gating. A future refactor wrapping the PATCH in try/catch and
unconditionally calling updateNodeData would have shipped green and
left the badge lying about server-rejected writes. New test pins the
PATCH-rejects-no-flush invariant.
(3) Hermes-edge regression test for (1).
All 1214 canvas tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three drift bugs in ConfigTab + ProviderModelSelector. Same root pattern:
the form's display, the diff baseline, and the canvas store all read or
write from different copies of the same data, so what the user sees and
what the runtime actually uses can diverge silently.
(1) currentModelId read runtime_config.model first; loadConfig overrode
only top-level config.model. With template YAML `runtime_config.model:
sonnet` and live MODEL_PROVIDER=`MiniMax-M2`, the form rendered
"Claude Code subscription / Claude Sonnet (OAuth)" while the container
env (and chat) used MiniMax-M2. Fix: loadConfig propagates
wsMetadataModel into BOTH places.
(2) handleSave's nextModel-vs-oldModel diff compared the form value to
the YAML default. After (1) mirrors wsMetadataModel into the form's
runtime_config.model for display, that diff was always non-zero on
no-op saves and would fire /model PUT — which auto-restarts. New
originalModel state tracks the loaded MODEL_PROVIDER and is the diff
baseline.
(3) handleSave PATCHed the workspace row but never pushed the same
fields into useCanvasStore.updateNodeData. User picked T3, hit Save &
Restart, DB updated to tier=3, header pill kept showing T2 until full
hydrate. Fix: mirror dbPatch into the store.
Bonus: ProviderModelSelector.handleProviderChange used to auto-default
the model to next.models[0] (alphabetically first) when switching
providers. User picked the MiniMax provider intending MiniMax-M2.7;
the form silently set MiniMax-M2 (first in the bucket) and the
workspace deployed with the wrong model. Now empty-default for
multi-model providers, force explicit pick — Save/Deploy already gate
on model.trim() === "".
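Fix 2 and the bonus reduce to two small invariants; the function names are illustrative, the real logic lives in ConfigTab / ProviderModelSelector:

```typescript
// Sketch: diff against the loaded MODEL_PROVIDER, and never
// auto-default among multiple models.
function shouldFireModelPut(originalModel: string, formModel: string): boolean {
  // originalModel snapshots what GET returned at load time, so a
  // no-op Save (e.g. tier-only change) never triggers the
  // auto-restarting /model PUT.
  return formModel !== originalModel;
}

function defaultModelFor(models: string[]): string {
  // Single-model providers auto-pick; multi-model providers force an
  // explicit choice (Save/Deploy already gate on an empty model).
  return models.length === 1 ? models[0] : "";
}
```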
Three new tests in ConfigTab.provider.test.tsx pin (1)/(2)/(3); two
existing ProviderModelSelector tests updated to reflect the no-silent-
default behaviour, with a new single-model-auto-pick test for the
0-vs-many boundary. 1212/1212 canvas tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared <ProviderModelSelector> component was authored on disk but
never landed — three deploy/configure surfaces still rendered the
legacy free-text "MODEL slug" input + provider-radio list. Tasks #239
and #243 closed at "component exists" rather than "user-visible
behavior changed", and the integration sat in a working-tree stash
that was never committed.
This PR is the missing integration:
- canvas/src/components/ProviderModelSelector.tsx (new, 509 lines):
single-source-of-truth Provider→Model cascade. Builds a catalog
from `template.models[].required_env` (groups by sorted+joined env
names so two MiniMax models with the same auth land in one
provider), exposes vendor detection helper + back-derivation. No
per-template hardcoding — fully driven by the upstream payload.
- canvas/src/components/MissingKeysModal.tsx: replaces the inline
`<input type="text">` + `<fieldset>` of provider radios with one
`<ProviderModelSelector>`. Same external contract
(`onKeysAdded(model)`), so callers in useTemplateDeploy don't move.
- canvas/src/components/tabs/ConfigTab.tsx: replaces ad-hoc Model
text input + Provider radio with the same selector, fixing the
display-vs-storage drift class that #190 first patched.
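The catalog grouping can be sketched as a small pure function; the ModelSpec shape below is an illustrative assumption matching the commit's description:

```typescript
// Sketch: bucket models by their sorted+joined required_env names, so
// two MiniMax models sharing MINIMAX_API_KEY land under one provider.
interface ModelSpec {
  id: string;
  required_env: string[];
}

function buildCatalog(models: ModelSpec[]): Map<string, ModelSpec[]> {
  const catalog = new Map<string, ModelSpec[]>();
  for (const m of models) {
    const key = [...m.required_env].sort().join("+"); // provider group key
    const bucket = catalog.get(key) ?? [];
    bucket.push(m);
    catalog.set(key, bucket);
  }
  return catalog;
}
```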
Tests
=====
- ProviderModelSelector.test.tsx (new, 269 lines): cascade behavior,
vendor auto-snap, back-derivation from saved config.
- MissingKeysModal.cascade.test.tsx: rewritten to assert dropdown
shape (was asserting the legacy text-input shape).
- ConfigTab.hermes.test.tsx + ConfigTab.provider.test.tsx: updated
for the new selector shape.
- 1208/1208 canvas tests pass locally.
User-visible fix: clicking any deploy/configure surface from the
sidebar now shows the cascade UX (Provider dropdown first, Model
dropdown filtered) instead of the legacy free-text MODEL slug.
Closes the integration gap behind #239 + #243. Builds on merged
runtime PRs #2538 (universal MODEL_PROVIDER) + #32 + #38 (per-vendor
audit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TemplatePalette deploy modal (MissingKeysModal → ProviderPickerModal)
let the model field and provider radio drift apart. When a hermes
template defaulted the model to "MiniMax-M2.7-highspeed" but the radio
defaulted to providers[0] (Anthropic), the env-var input below asked
for ANTHROPIC_API_KEY. A user pasting their MINIMAX_API_KEY there (or
just dismissing the dialog) ended up with a workspace whose
runtime_config.model=MiniMax + ANTHROPIC_API_KEY env — the hermes
adapter then crashed during boot before /registry/register, surfacing
as WORKSPACE_PROVISION_FAILED 12 minutes later.
Caught 2026-05-02 on hongming/Hermes Agent (workspace 95ed3ff2-…
ended with: "container started but never called /registry/register").
Sibling of the ConfigTab cascade fix in PR #2516 (task #236) — same
pattern, different surface. Plumbs the template's full ModelSpec[]
(with required_env per model) into the picker. When the typed model
matches a registry entry, snap the radio so the env-var fields
underneath match what the model actually needs.
Free-text models (typed slug not in the registry) and models with no
required_env (local/self-hosted endpoints) leave the radio alone — the
user can still pick a provider manually. Backwards-compat: callers
that don't pass `models` get the pre-cascade behavior, pinned by a
regression test.
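The snap rule can be sketched as one lookup; types and the function name are illustrative assumptions:

```typescript
// Sketch: when the typed model matches a registry entry that declares
// required_env, snap the provider radio to it so the env-var fields
// underneath match what the model needs; otherwise leave the radio alone.
interface ModelSpec {
  id: string;
  required_env: string[];
}

function snapProviderFor(model: string, registry: ModelSpec[]): ModelSpec | null {
  const hit = registry.find((m) => m.id === model);
  // Free-text slugs and no-required_env models (local/self-hosted
  // endpoints) don't snap; the user picks a provider manually.
  if (!hit || hit.required_env.length === 0) return null;
  return hit;
}
```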
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR #2509/#2510. The defensive v1-detection branches in
extract_attached_files (Python) and extractFilesFromTask (TypeScript)
were merged with comments claiming they fix a "v0→v1 silent-drop"
bug that surfaced as the 2026-05-01 hongming "no text content"
incident. Live test disproved that hypothesis: a2a-sdk's JSON-RPC
layer validates inbound requests against the v0 Pydantic union, so
v1 shapes are rejected at the request boundary — the v1 detection
branch is unreachable on the JSON-RPC ingress path. The actual root
cause of the hongming incident was the missing /workspace chown
fixed by CP PR #381 + test #382.
Update the comments to honestly describe these branches as
defensive future-proofing (kept against an eventual SDK schema
migration or in-process callers that construct Parts directly from
protobuf), not as fixes for an observed bug. Also trims
ChatTab.tsx's outbound-shape comment block from ~21 lines to a
3-line pointer to the SDK union.
Comment-only change. No behavior change. 86 workspace tests + 91
canvas tests still pass.
The previous PR (#2509) flipped canvas outbound file parts to the v1
flat shape `{url, filename, mediaType}` based on a hypothesis that
a2a-sdk's JSON-RPC parser silently dropped v0 `{kind:"file", file:{...}}`
shapes. Live test shows the opposite: a2a-sdk's JSON-RPC layer
validates against the v0 Pydantic discriminated union (TextPart |
FilePart | DataPart), so v1 flat shape is rejected with:
Invalid Request:
params.message.parts.0.TextPart.text — Field required
params.message.parts.0.FilePart.file — Field required
params.message.parts.0.DataPart.data — Field required
The actual root cause of the user-visible "Error: message contained
no text content" was the missing `/workspace` chown (CP PR #381 +
test pin #382), not a wire-shape mismatch. Verified end-to-end by
sending a v0 image-only message after PR #381 + workspace re-provision
— agent receives the file, reads its bytes, and replies normally.
Reverting only the canvas outbound shape. Defensive v1-tolerance
stays in:
- workspace/executor_helpers.py — extract_attached_files still
accepts v1 protobuf parts in case a future client emits them or
a future SDK release flips internal representation. Harmless on
the v0 hot path.
- canvas/message-parser.ts — extractFilesFromTask still tolerates
v1 shape on incoming agent responses. Some agents may emit v1
when their internal serializer round-trips through protobuf.
Tests stay green (91 canvas, 86 workspace).
Image-only chats surface "Error: message contained no text content"
because canvas posts v0 `{kind:"file", file:{uri,name,mimeType}}` shapes
that the workspace runtime's a2a-sdk v1 protobuf parser silently drops:
v1 `Part` has fields `[text, raw, url, data, metadata, filename,
media_type]` and `ignore_unknown_fields=True` discards `kind`+`file`,
producing a fully-empty Part. With no text and no extracted file
attachments, the executor's "no text content" guard fires.
Three coordinated changes close the gap:
1. canvas/ChatTab.tsx — outbound file parts now carry the v1 flat
shape `{url, filename, mediaType}` so the v1 protobuf parser
populates Part fields instead of dropping them.
2. workspace/executor_helpers.py — extract_attached_files learns the
v1 detection branch (non-empty `part.url` + `filename` +
`media_type`) alongside the existing v0 RootModel and flat-file
shapes. Defends every runtime that mounts the OSS wheel against
the same drop, including any pre-fix client still on the wire.
3. canvas/message-parser.ts — extractFilesFromTask tolerates the v1
shape on incoming agent responses too, so file chips render in
chat history regardless of which Part shape the runtime emits.
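The dual-shape tolerance from changes 2 and 3 can be sketched as one extractor; field names follow the commit text, the function name is illustrative:

```typescript
// Sketch: accept both the v0 nested shape {kind:"file", file:{...}}
// and the v1 flat shape {url, filename, mediaType}.
interface Attached {
  url: string;
  name: string;
  mimeType: string;
}

function extractFile(part: any): Attached | null {
  if (part?.kind === "file" && part.file?.uri) {
    // v0: {kind:"file", file:{uri, name, mimeType}}
    return { url: part.file.uri, name: part.file.name ?? "", mimeType: part.file.mimeType ?? "" };
  }
  if (typeof part?.url === "string" && part.url && part.filename) {
    // v1 flat: {url, filename, mediaType}
    return { url: part.url, name: part.filename, mimeType: part.mediaType ?? "" };
  }
  return null; // text/data parts, or a fully-empty dropped Part
}
```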
Test pins:
- workspace/tests/test_executor_helpers.py:
+ v1 protobuf shape extraction
+ empty-Part defense (v0→v1 silent-drop fall-through returns [])
- canvas message-parser test:
+ v1 protobuf flat parts
+ filename fallback to URL basename for v1
Previously the picker modal opened only when preflight failed OR the
template offered ≥2 provider options. Single-provider templates with
saved keys (claude-code, langgraph) deployed silently using the
template's compiled-in default model — denying the user a final
chance to override before an EC2 boots and burns billing on the
wrong tier.
The picker UI already supports the "all-keys-saved single-provider"
case as a confirm-only prompt (provider radio is hidden, model input
is pre-filled with template.model), so flipping shouldShowPicker to
unconditional is a one-line change with the picker UX absorbing it.
Test plan
- Existing "single-provider skips picker when preflight.ok" regression
guard inverted to assert picker always opens.
- Three happy-path tests refactored to drive through the picker via
a new deployThroughPicker helper instead of expecting an immediate
POST.
- POST-failure tests likewise refactored — the failure now surfaces
through the picker click-through path, not the direct deploy()
call.
- 15/15 tests pass; deploy-preflight.test.ts unchanged + 20/20.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of #2460 found two issues:
1. Critical: Override button in ProviderPickerModal called
/settings/secrets when no workspaceId, overwriting the GLOBAL
secret used by every workspace. The only consumers of this
modal today (TemplatePalette, EmptyState via useTemplateDeploy)
never pass workspaceId, so Override was always destructive.
Removed entirely — the picker still solves the user-reported
bug (always-ask + reuse saved keys); per-workspace key override
can be a separate PR that plumbs secrets through POST /workspaces.
2. Optional: /settings/secrets was being fetched twice — once
inside checkDeploySecrets (silently) and again in the hook to
populate configuredKeys. Surfaced configuredKeys on
PreflightResult so the hook re-uses the existing fetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clicking a hermes template tile silently deployed when global env
covered the API key, producing "No LLM provider configured" 500
because the workspace booted with no explicit model slug — the
adapter fell back to its compiled-in default which 401s on the
user's actual provider key.
Fix: in useTemplateDeploy, open the picker whenever the template
declares ≥2 provider options, even when preflight.ok=true. The
modal renders pre-saved keys as Saved (with an Override link) and
adds a model input pre-filled from the template's default. Single-
provider templates (claude-code, langgraph) still skip the picker
since there's nothing to choose.
POST /workspaces now includes the picker's model slug so hermes-
style routing reads the prefix at install time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the data-driven pattern PR #2454 set in ConfigTab: read
runtime_config.providers from /templates and filter the modal's
provider <select> to that subset. Same source of truth, three fewer
hardcoded copies of the provider list.
Behavior:
- Template declares providers → dropdown shows only those.
- Template ships no providers field → fall back to full HERMES_PROVIDERS
catalog (back-compat for older templates / self-hosted setups).
- Declared list has no overlap with our static metadata → fall back to
full catalog so the form can't lock the operator out.
- hermesProvider snaps back to the first available pick when its
current value falls out of the filtered list.
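The three branches plus the snap-back can be sketched directly; HERMES_PROVIDERS below is an illustrative stand-in for the full static catalog:

```typescript
// Sketch of the provider-filter fallback chain.
const HERMES_PROVIDERS = ["anthropic", "minimax", "nous", "openrouter"];

function availableProviders(declared: string[] | undefined): string[] {
  // No providers field -> full catalog (back-compat).
  if (!declared || declared.length === 0) return HERMES_PROVIDERS;
  const overlap = declared.filter((p) => HERMES_PROVIDERS.includes(p));
  // Unknown-only list -> full catalog so the form can't lock the operator out.
  return overlap.length > 0 ? overlap : HERMES_PROVIDERS;
}

function snapSelection(current: string, available: string[]): string {
  // hermesProvider snaps to the first available pick when filtered out.
  return available.includes(current) ? current : available[0];
}
```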
Tests: 3 new pinning the filter, no-providers-field fallback, and
the unknown-providers fallback. All 27 CreateWorkspaceDialog tests
pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Option B PR-5. Canvas Config tab now exposes a Provider override input
that's adapter-driven from each runtime's template — no hardcoded
provider list in the canvas. PUT /workspaces/:id/provider on Save
when dirty; auto-restart suppression to avoid double-restart with
the model handler's own restart.
The dropdown's suggestion list comes from /templates →
runtime_config.providers (the field added in
molecule-ai-workspace-template-hermes PR #31). For templates that
haven't migrated to the explicit providers list yet, suggestions
derive from model[].id slug prefixes — still adapter-driven, just
inferred. This keeps existing templates working while platform team
migrates them one at a time.
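The inference fallback can be sketched as a prefix scan; the function name matches the commit, the behavior details are an assumption based on its description:

```typescript
// Sketch: when a template has no explicit runtime_config.providers,
// derive suggestions from model[].id slug prefixes (the part before "/").
function deriveProvidersFromModels(modelIds: string[]): string[] {
  const seen = new Set<string>();
  for (const id of modelIds) {
    const slash = id.indexOf("/");
    if (slash > 0) seen.add(id.slice(0, slash)); // "minimax/MiniMax-M2" -> "minimax"
  }
  return [...seen];
}
```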
workspace-server changes:
- Add Providers []string field to templateSummary JSON
- Parse runtime_config.providers in /templates handler
- 2 new tests pin the surfacing + omitempty behavior
canvas changes:
- Remove hardcoded PROVIDER_SUGGESTIONS constant
- Add provider/originalProvider state + PUT-on-save logic
- Add deriveProvidersFromModels() fallback helper
- Wire RuntimeOption.providers from /templates response
- 8 new tests pin the behavior end-to-end
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Model dropdown's onChange writes to config.runtime_config.model
whenever a runtime is set (hermes, claude-code, etc.), and only
falls back to top-level config.model when no runtime is selected.
But handleSave used to diff the new value against top-level
nextSource.model only — so for any runtime-bearing workspace, the
PUT /workspaces/:id/model never fired and MODEL_PROVIDER never
landed in workspace_secrets.
Symptom (2026-04-30, hongmingwang Hermes Agent
32993ee7-840e-4c02-8ca8-cb9d75d112a5):
- User picks minimax/MiniMax-M2.7-highspeed from the dropdown
- Hits Save & Restart
- Save reports success; restart fires
- The new EC2 boots with HERMES_DEFAULT_MODEL empty
- install.sh defaults to nousresearch/hermes-4-70b
- hermes-agent errors "No LLM provider configured" on every chat
turn because no NOUS_API_KEY / OPENROUTER_API_KEY is set
- Reload Config tab → model field reverts to whatever
GET /workspaces/:id/model returns (i.e. empty / template default)
handleSave now reads the effective model from runtime_config.model
first and falls back to top-level model for legacy no-runtime
workspaces. Same change for the old-value diff so a no-op Save
still skips the PUT.
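The resolution rule reduces to a small pure function (config shape assumed from the description, not the actual canvas types):

```typescript
interface WorkspaceConfig {
  model?: string;
  runtime_config?: { runtime?: string; model?: string };
}

// runtime_config.model is authoritative whenever a runtime is set; the
// top-level model is only the legacy no-runtime fallback. handleSave must
// diff THIS value on both sides, or the PUT /model never fires for
// runtime-bearing workspaces.
function effectiveModel(cfg: WorkspaceConfig): string | undefined {
  if (cfg.runtime_config?.runtime) return cfg.runtime_config.model;
  return cfg.model;
}
```

The save path then compares `effectiveModel(next)` against `effectiveModel(prev)` and skips the PUT when they match.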
Tests pin both branches: PUTs /model when the dropdown changed
runtime_config.model on a hermes workspace; does NOT PUT when
the value is unchanged from what GET /model returned.
Critical:
- ExternalConnectModal.tsx: filledUniversalMcp substitution searched
for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now
MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit
876c0bfc). Operators copy-pasting the MCP tab would have gotten a
literal "<paste from create response>" instead of the token. Fix
the substitution to match the new placeholder name.
Important:
- mcp_cli._platform_register: 401/403 from initial register now hard-
exits with code 3 + an actionable stderr message pointing the
operator at the canvas Tokens tab. Pre-fix: warning log + continue,
which made a bad-token startup silently fail (heartbeat 401's
forever, every tool call also 401's, no clear surfacing in the
operator's MCP client). 500/503 still log + continue (transient
platform blips shouldn't abort the MCP loop).
- a2a_mcp_server.cli_main docstring: removed stale claim that this is
the wheel's console-script entry-point target. The actual target is
mcp_cli.main since 2026-04-30. Wheel-smoke pins both names so the
functionality was correct, but the doc was lying.
Test coverage: 3 new mcp_cli tests:
- register 401 exits code=3 + stderr mentions canvas Tokens tab
- register 403 (C18 hijack rejection) takes same path
- register 500/503 does NOT exit — only auth errors hard-fail
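The status-code policy, as a pure sketch. The real logic lives in the Python mcp_cli; the TypeScript below is illustrative only, and all names are assumptions:

```typescript
type RegisterOutcome =
  | { kind: "ok" }
  | { kind: "exit"; code: 3; hint: string } // bad credentials: fail fast
  | { kind: "continue" };                   // transient platform blip

function classifyRegister(status: number): RegisterOutcome {
  if (status >= 200 && status < 300) return { kind: "ok" };
  if (status === 401 || status === 403)
    return {
      kind: "exit",
      code: 3,
      hint: "token rejected: mint a fresh one from the canvas Tokens tab",
    };
  // 5xx (500/503 etc.): log and keep the MCP loop alive so a transient
  // platform error doesn't abort an otherwise-healthy session.
  return { kind: "continue" };
}
```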
Findings deferred to follow-up (acceptable per review rubric):
- Code dedup across mcp_cli / heartbeat.py / molecule_agent SDK
- Pooled httpx.Client for connection reuse
- Heartbeat exponential backoff
- Token-resolution ordering parity (env-first vs file-first)
between mcp_cli.main and platform_auth.get_token
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The canvas tab snippet for the Universal MCP path was written before
this PR added the built-in register + heartbeat thread. Earlier wording
described it as "outbound-only — pair with the Claude Code or Python SDK
tab for heartbeat + inbound messages" — that's stale. molecule-mcp now
handles register + heartbeat itself; the only thing it doesn't yet do is
inbound A2A delivery.
Updated:
- externalUniversalMcpTemplate header comment + body — describes
standalone behavior, points operators at SDK/channel only when they
need INBOUND (not heartbeat).
- Drops the now-redundant curl-register step from the snippet — the
binary registers itself on startup.
- Canvas modal label likewise updated.
No runtime / behavior change; pure docs polish so a copy-pasting
operator's mental model matches what the binary actually does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Connect your external agent" dialog already covered Claude Code,
Python SDK, curl, and raw fields. This adds a Universal MCP tab that
documents the new `molecule-mcp` console script — the runtime-agnostic
baseline shipped by PR #2413's workspace-runtime changes.
Surface area:
- New `externalUniversalMcpTemplate` constant in workspace-server.
Three-step snippet: pip install runtime → one-shot register via curl
→ wire molecule-mcp into agent's MCP config (Claude Code example,
notes that hermes/codex/etc. take the same env-var contract).
- Workspace create response now includes `universal_mcp_snippet`
alongside the existing curl/python/channel snippets.
- Canvas modal renders the tab when `universal_mcp_snippet` is
present; backward-compatible with older platform builds (tab hides
when empty).
Origin/WAF coverage (the user explicitly asked for this):
- The runtime wheel handles Origin automatically (this PR's earlier
commit on platform_auth.auth_headers).
- The curl tab now sets `Origin: {{PLATFORM_URL}}` preemptively
  with an explanatory comment; `/registry/register` is currently
  WAF-allowed without it, but setting it now keeps the snippet working
  if WAF rules expand. The comment also explains why
  `/workspaces/*` paths return an empty 404 without Origin — the
  exact failure mode I hit while smoke-testing this PR live.
- The MCP snippet's footer notes that the wheel auto-handles
Origin so operators don't think about it.
End-to-end verification (against live tenant
hongmingwang.moleculesai.app, freshly registered workspace):
- get_workspace_info → full JSON
- list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)"
- recall_memory → "No memories found."
all returned by the molecule-mcp binary speaking MCP stdio to
this Claude Code session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace-server has GET /buildinfo (PR #2398) —
`curl https://<slug>.moleculesai.app/buildinfo` returns the live git
SHA. Canvas had no
parallel: debugging "is this the deployed code?" required reading
Vercel's UI or response headers (deployment ID, not git SHA).
Add canvas /api/buildinfo returning {git_sha, git_ref, vercel_env}
sourced from VERCEL_GIT_COMMIT_SHA / _REF / VERCEL_ENV — Vercel injects
these at build time from the deploying commit. Outside Vercel (local
`next dev`, harness) all three are unset and the endpoint returns
`git_sha: "dev"`, the same sentinel workspace-server uses pre-ldflags-
injection.
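The env mapping reduces to a tiny pure function (route wiring omitted; the "dev" fallback for `git_ref` and `vercel_env` is an assumption here, since the description only pins the `git_sha` sentinel):

```typescript
// Vercel injects these vars at build time from the deploying commit;
// locally (next dev, harness) they are all unset.
function buildInfo(env: Record<string, string | undefined>) {
  return {
    // "dev" matches workspace-server's pre-ldflags sentinel so both
    // surfaces report the same thing outside a real deploy.
    git_sha: env.VERCEL_GIT_COMMIT_SHA ?? "dev",
    git_ref: env.VERCEL_GIT_COMMIT_REF ?? "dev",
    vercel_env: env.VERCEL_ENV ?? "dev",
  };
}
```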
Now both surfaces speak the same vocabulary:
curl https://<slug>.moleculesai.app/buildinfo
curl https://canvas.moleculesai.app/api/buildinfo
3 tests cover dev-fallback, Vercel-injected SHA pass-through, and JSON
content type.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setup wrote .playwright-staging-state.json at the END (step 7), only
after org create + provision-wait + TLS + workspace create + workspace-
online all succeeded. If setup crashed at steps 1-6, the org existed in
CP but the state file did not, so Playwright's globalTeardown bailed
out ("nothing to tear down") and the workflow safety-net pattern-swept
every e2e-canvas-<today>-* org to compensate. That sweep deleted
concurrent runs' live tenants — including their CF DNS records —
causing victims' next fetch to die with `getaddrinfo ENOTFOUND`.
Race observed 2026-04-30 on PR #2264 staging→main: three real-test
runs killed each other mid-test, blocking 68 commits of staging→main
promotion.
Fix: write the state file as setup's first action, right after slug
generation, before any CP call. Now:
- Crash before slug gen → no state file, no orphan to clean
- Crash during steps 1-6 → state file has slug; teardown deletes
it (DELETE 404s if org never created)
- Setup completes → state file has full state; teardown
deletes the slug
The workflow safety-net no longer pattern-sweeps; it reads the state
file and deletes only the recorded slug. Concurrent canvas-E2E runs no
longer poison each other.
Verified by:
- tsc --noEmit on staging-setup.ts + staging-teardown.ts
- YAML lint on e2e-staging-canvas.yml
- Code review: state file write moved to line 113 (post-makeSlug,
pre-CP) with the original line-249 write retained as a "promote
to full state" overwrite at the end
When the platform's create-external-workspace response includes
`claude_code_channel_snippet` (added in this same PR's first commit),
the modal surfaces it as the **first** tab — defaulting to it for new
external workspaces because polling-based + no-tunnel is the lowest-
friction path. Falls back to Python tab when the field is absent
(older platform builds).
Type addition is optional (`claude_code_channel_snippet?: string`)
so the canvas keeps building against pre-#2304 platform responses
during the soak window.
Auth-token stamping mirrors existing python/curl behavior — the
.env's `MOLECULE_WORKSPACE_TOKENS=<paste auth_token from create
response>` placeholder gets filled in client-side so the copy-paste
block is truly ready to run.
Also adds the missing 'use client' directive — the file uses useState
+ useCallback but didn't have the Next.js client-component marker.
Pre-commit caught it; existing absence was a latent bug that would
surface as an SSR hook error if any path rendered this component
during server rendering.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidates the remaining safe-to-merge dependabot PRs from the
2026-04-28 wave into one consumable PR. Replaces three earlier
single-bump PRs (#2245, #2230, #2231) which were closed in favor of
this single batch — same pattern as #2235.
GitHub Actions majors (SHA-pinned per org convention):
github/codeql-action v3 → v4.35.2 (#2228)
actions/setup-node v4 → v6.4.0 (#2218)
actions/upload-artifact v4 → v7.0.1 (#2216)
actions/setup-python v5 → v6.2.0 (#2214)
npm dev deps (canvas/, lockfile regenerated in node:22-bookworm
container so @emnapi/* and other Linux-only optional deps are
properly resolved — Mac-native `npm install` strips them, which
caused the earlier #2235 batch to drop these two):
@types/node ^22 → ^25.6 (#2231)
jsdom ^25 → ^29.1 (#2230)
Why each is safe
setup-node v4 → v6 / setup-python v5 → v6:
Every consumer call pins node-version / python-version
explicitly. v5 / v6 changed defaults but pinned consumers
are unaffected. Confirmed via grep across .github/workflows/
— all setup-node call sites pin '20' or '22', all
setup-python call sites pin '3.11'.
codeql-action v3 → v4.35.2:
Used as init/autobuild/analyze sub-actions in codeql.yml.
v4 bundles a newer CodeQL CLI; ubuntu-latest auto-updates
so functional behavior is unchanged. The deprecated
CODEQL_ACTION_CLEANUP_TRAP_CACHES env var (per v4.35.2
release notes) is undocumented and we don't set it.
upload-artifact v4 → v7.0.1:
v6 introduced Node.js 24 runtime requiring Actions Runner
>= 2.327.1. All upload-artifact users (codeql.yml,
e2e-staging-canvas.yml) run on `ubuntu-latest` (GitHub-
hosted), which auto-updates the runner agent. Self-hosted
runners are NOT used for these jobs.
@types/node 22 → 25 / jsdom 25 → 29:
Both are dev-only — @types/node is type definitions,
jsdom backs vitest's DOM environment. Tests pass:
79 files / 1154 tests in node:22-bookworm container.
Verified locally (Linux container so the lockfile reflects what
CI's `npm ci` will install):
- cd canvas && npm install --include=optional → 169 packages
- npm test → 1154/1154 pass
- npm ci → clean install succeeds
- npm run build → Next.js prerendering succeeds
Closes when this lands (the 3 individual auto-merge PRs from earlier
were closed):
#2228, #2218, #2216, #2214, #2231, #2230
NOT included (CI failing on dependabot's own run — major framework
bumps that need code-side migration tasks, not safe auto-bumps):
#2233 next 15 → 16
#2232 tailwindcss 3 → 4
#2226 typescript 5 → 6
Both AgentCommsPanel and ChatTab's activity-feed opened raw
`new WebSocket(WS_URL)` instances per mount, with no onclose handler
and no reconnect logic. When the underlying connection dropped — idle
timeout, browser background-tab throttle, network jitter — the per-
panel sockets stayed dead until the panel re-mounted (refresh or
sub-tab unmount/remount). Live agent-comms bubbles and live activity
feed lines silently went missing in the gap, manifesting as "the
delegation didn't show up until I refreshed."
The global ReconnectingSocket in store/socket.ts already owns
reconnect, exponential backoff, health-check, and HTTP fallback poll.
Routing component subscribers through it gives every consumer those
guarantees for free, with one TCP connection per tab instead of N.
Three new pieces:
- store/socket-events.ts: tiny pub/sub bus. emitSocketEvent fans out
every decoded WSMessage to the listener Set; subscribeSocketEvents
returns an unsubscribe. A throwing listener is logged and isolated
so it can't break siblings.
- store/socket.ts: ws.onmessage now calls emitSocketEvent(msg) right
after applyEvent(msg), so the store's derived state and component
subscribers stay in lockstep on every event arrival.
- hooks/useSocketEvent.ts: React hook that registers exactly once
per mount, capturing the latest handler in a ref so the closure
sees current state/props without re-subscribing on every render.
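The bus contract fits in a few lines. A minimal sketch, assuming a simplified WSMessage shape (the real store/socket-events.ts may differ in detail):

```typescript
type WSMessage = { type: string; [k: string]: unknown };
type Listener = (msg: WSMessage) => void;

const listeners = new Set<Listener>();

function subscribeSocketEvents(fn: Listener): () => void {
  listeners.add(fn);
  return () => { listeners.delete(fn); }; // unsubscribe handle
}

function emitSocketEvent(msg: WSMessage): void {
  for (const fn of listeners) {
    try {
      fn(msg);
    } catch (err) {
      // A throwing listener is logged and isolated so it can't break siblings.
      console.error("socket-events listener threw", err);
    }
  }
}
```

Set semantics give duplicate-subscribe dedup and insertion-order fan-out for free.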
Refactored sites:
- AgentCommsPanel: replaced its WebSocket-in-useEffect block with
useSocketEvent. Same parsing logic; the panel no longer opens its
own connection.
- ChatTab activity feed: split the previous useEffect in two — one
seeds the activity log when `sending` flips, the other subscribes
unconditionally and gates work on `sending` inside the handler.
Hooks can't be conditional, so the gate has to live in the body
rather than around the effect.
The ws-close graceful-close helper is no longer needed in either
site; the global socket owns its own teardown.
Tests: 6 new tests for the bus contract (single delivery, fan-out
order, unsubscribe, throwing-listener isolation, no-subscriber emit,
duplicate-subscribe Set semantics). All 27 existing socket tests
still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third-pass review caught a fourth WS path I missed. The original fix +
the stale-callback follow-up patched 3 sites that release the in-flight
guards (pendingAgentMsgs effect, HTTP .then() success, HTTP .catch()
success), but the ACTIVITY_LOGGED handler at lines 410-419 also clears
`sending` + `sendingFromAPIRef` when the platform logs the workspace's
a2a_receive ok/error. It cleared only two of the three guards — the
exact same bug class as the original. If THIS path wins the race
(a2a_receive
activity logged before pendingAgentMsgs delivers the reply text),
sendInFlightRef stays stuck true and the next sendMessage() silently
no-ops at line 464.
Fix: route both branches (ok and error) through releaseSendGuards()
so all four sites are now uniform.
Updated the helper's docstring to explicitly list all four sites and
warn that any future "I saw the reply" path that only clears the
natural pair (sending + sendingFromAPIRef) will silently re-introduce
the freeze. The disabled-button logic can't see sendInFlightRef so
the visible state diverges from the synchronous re-entry guard
otherwise.
This is exactly the drift `releaseSendGuards()` was supposed to
prevent — the helper landed in the prior commit but the activity-log
site wasn't migrated to use it. Fixing now closes the gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review on PR #2185 surfaced a latent race the original fix exposed:
the WS-clears-guards path now releases sendInFlightRef immediately, which
means a user can fire msg #2 between WS-arrival and HTTP-arrival for
msg #1. Without coordination, msg #1's late .then() sees
sendingFromAPIRef=true (set by msg #2's send), enters the main body,
and runs setSending(false) + appendMessageDeduped against msg #1's
response body — clobbering msg #2's in-flight UI state.
This race is realistic for claude-code SDK: the comment at line 294-298
already calls WS the "authoritative reply arrived" signal, and the user
typically reads-then-types before the trailing HTTP completes. Without
the original Send-button freeze "protecting" the race, it surfaces.
Two changes:
1. Token-keyed callbacks. sendTokenRef bumps on every sendMessage
entry; .then()/.catch() capture the token in closure and bail
without touching any flags if a newer send has superseded them.
The newer send owns the in-flight guards.
2. releaseSendGuards() helper. The guard-clearing trio
(setSending, sendingFromAPIRef, sendInFlightRef) now lives in one
useCallback so the WS handler, .then() success, and .catch()
success can't drift apart. A future contributor dropping one of
the three would silently re-introduce either the post-WS Send
freeze or the stale-callback clobber.
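A non-React sketch of both changes together. In ChatTab these live in refs and a useCallback; the plain variables and function names here only mirror the description:

```typescript
let sendToken = 0;
let sending = false;       // drives the visible disabled state
let sendingFromAPI = false;
let sendInFlight = false;  // synchronous re-entry guard the button can't see

function releaseSendGuards(): void {
  // The single place all three guards are cleared; every "reply arrived"
  // path must route through here or the Send button freezes.
  sending = false;
  sendingFromAPI = false;
  sendInFlight = false;
}

function beginSend(): number | null {
  if (sendInFlight) return null;        // re-entry: silently no-op
  sending = sendingFromAPI = sendInFlight = true;
  return ++sendToken;                   // this send now owns the guards
}

function wsDelivered(): void {
  releaseSendGuards();                  // WS is the authoritative reply signal
}

function httpCompleted(token: number): void {
  // Late .then()/.catch(): bail without touching flags if a newer send
  // has superseded this one (the newer send owns the guards).
  if (token !== sendToken) return;
  releaseSendGuards();
}
```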
Skipped a unit test for this regression — ChatTab has no __tests__
file and a mount test would need WS + zustand + api mocks. The fix
is 4 logical lines (token capture + 2 guard checks) and the manual
test covers it. Follow-up to add a focused mount test when ChatTab
gets its first __tests__ file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Send button + Enter both silently no-op'd after the first agent reply
on runtimes that deliver via WebSocket (claude-code SDK does this per
the comment at ChatTab.tsx:294-298). The visible disabled-state checks
(sending, uploading, agentReachable) were all clean — the freeze came
from a third synchronous reentry guard the button can't see:
if (sendInFlightRef.current) return; // ChatTab.tsx:438
The ref was set true at the start of sendMessage() and only cleared in
.then() / .catch() of the HTTP fall-through and the upload-failure
branch. The WS-push handler in the pendingAgentMsgs effect cleared
`sending` and `sendingFromAPIRef` but left `sendInFlightRef` stuck
true. The HTTP .then() then early-returned at the dedup check (line
513) without touching the ref — only the .catch() early-return path
did. Net result: refresh fixed it because the ref reset on remount.
Two-line fix:
- WS handler: also clear sendInFlightRef when the push delivers the
reply (primary fix; no race window where the ref is stuck while
the user can already type)
- .then() early-return: mirror .catch()'s cleanup as defense in
depth, so neither delivery order leaks the ref
While here: A2AEdge.test.tsx fixture was typed `as never` to dodge
EdgeProps' discriminated-union complaint, which broke spreading at
the call sites with TS2698 ("Spread types may only be created from
object types"). Replaced with `as unknown as ComponentProps<typeof
A2AEdge>` — preserves the original "skip restating every optional
field" intent and keeps a spreadable type.
All 10 A2AEdge tests pass; tsc --noEmit is clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first commit on this branch left the lockfile inconsistent for
Node 20's npm 10:
npm error `npm ci` can only install packages when your package.json
and package-lock.json are in sync. Please update your lock file...
npm error Missing: @emnapi/runtime@1.10.0 from lock file
npm error Missing: @emnapi/core@1.10.0 from lock file
Root cause: my local install ran on Node 24 / npm 11, which doesn't
write peer-optional transitive entries (@img/sharp-* declares
@emnapi/runtime as peerOptional). The Canvas tabs E2E job uses Node 20
/ npm 10, which DOES expect those entries and rejected the lockfile
with EUSAGE.
Regenerated the lockfile under Node 20.19.4 (matches the lowest CI
node version, lockfile is forward-compatible with 22 and 24). 6 new
@emnapi/* entries added; postcss stays at 8.5.12 (the original goal
of this branch).
Verification:
- `nvm use 20 && npm ci` clean
- 1148/1148 vitest pass under Node 20
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the medium-severity dependabot alert on canvas/package-lock.json.
Upstream advisory GHSA-qx2v-qp2m-jg93: "PostCSS has XSS via Unescaped
</style> in its CSS Stringify Output" — fixed in 8.5.10. We pull
8.5.12 since it's already published in the ^8.5.10 line.
package.json's caret range bumps from ^8.4.0 to ^8.5.12 — the raised floor
prevents a future install from re-pinning below the safe version. The
8.x major-line constraint is preserved, so no breaking-change risk.
Verification: full canvas vitest suite passes (1148/1148 across
78 files).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. classNames.ts was at 17%
in the baseline; this PR brings it to full coverage.
16 cases across 3 helpers:
**appendClass (6):**
- undefined / empty existing → just `cls`
- single-class → "a b" join
- DEDUP: existing already contains `cls` → existing unchanged.
This is the load-bearing reason classNames.ts exists. Pre-helper
the call sites inlined `${existing} ${cls}` with no dedup, so a
tick that fired the same class twice produced "a a" and React
Flow's className-equality diff saw it as a change every render.
- whitespace normalization (multi-space, leading/trailing)
**removeClass (7):**
- undefined / empty existing → ""
- removes named class
- exact match only ("spawn" must NOT match "spawn-fast")
- removing the only class → ""
- no-op when class absent
- whitespace normalization
**scheduleNodeClassRemoval (3):**
- after delayMs: calls set() with className-removed on target node;
OTHER nodes untouched (the per-id pruning is the contract — pin
it so a future refactor that maps over all nodes doesn't silently
strip classes from siblings)
- does NOT fire before the delay elapses (vi.useFakeTimers + advance)
- SSR safety: when window is undefined, function is a no-op
(neither get nor set fires)
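The two string helpers under test can be sketched from the cases above (behavior pinned by the list; the real source is store/classNames.ts and may differ in implementation):

```typescript
function appendClass(existing: string | undefined, cls: string): string {
  const parts = (existing ?? "").split(/\s+/).filter(Boolean);
  // Dedup is the load-bearing guarantee: appending a class that is
  // already present must return the string unchanged, never "a a".
  if (!parts.includes(cls)) parts.push(cls);
  return parts.join(" ");
}

function removeClass(existing: string | undefined, cls: string): string {
  // Exact-token match only: removing "spawn" must not touch "spawn-fast".
  return (existing ?? "")
    .split(/\s+/)
    .filter(p => p && p !== cls)
    .join(" ");
}
```

Splitting on `/\s+/` and rejoining also gives the whitespace normalization the tests pin.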
## Note on test environment
Added `// @vitest-environment jsdom` directive — the file's
default `node` environment leaves `window` undefined, which would
make the SSR-guard happy-path test pass for the wrong reason
(every test would short-circuit). With jsdom, the SSR test
explicitly stubs `window` to undefined to exercise the guard.
## Test plan
- [x] All 16 cases pass locally (~1.1s with jsdom env spin-up)
- [x] No SUT changes
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (#2149)
- [x] store/classNames.ts (this PR)
- [ ] store/canvas.ts (73% — biggest absolute gap; bigger surface,
separate cycle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. canvas-actions.ts was at 25%
in the baseline run from #2147; this PR brings the file's two
helpers to full coverage.
5 cases:
**markAllWorkspacesNeedRestart (3):**
- calls updateNodeData on every node with `{needsRestart: true}`
- no-op when the canvas has zero workspaces
- preserves call ordering — matters because the toolbar's
Restart Pending pill observes per-node data changes
incrementally; a refactor that shuffled iteration order would
silently change which workspaces flash first
**markWorkspaceNeedsRestart (2):**
- targeted call: updateNodeData fires exactly once on the named id
- defensive: regardless of how many other workspaces exist in the
store, only the target workspace gets updated. Pre-this-test, a
refactor that accidentally wired this function through the
per-node iteration path of markAll would silently mark every
workspace — pinning the cardinality here catches that.
## Mock strategy
Standard pattern for canvas store: mock useCanvasStore as both the
selector function AND a getState()-bearing object. updateNodeData
is a vi.fn() spy so the test asserts on calls + args directly.
## Test plan
- [x] All 5 cases pass locally (~132ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation + script (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (this PR)
- [ ] Remaining low-coverage targets: store/classNames.ts (17%),
store/canvas.ts (73% — largest absolute gap by lines)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes two of the 0%-coverage files surfaced by the baseline run in
PR #2147 (vitest coverage instrumentation). Both files are tiny
utility helpers with high-touch read paths.
## utils.cn (8 cases)
Wraps `twMerge(clsx(inputs))` — every conditionally-styled component
flows through here. The load-bearing case is the **last-wins
Tailwind dedup**: `cn("p-2", "p-4")` → "p-4". A regression that lost
twMerge would silently double-apply utilities (cosmetically broken,
breaks `:where()` rules + theme overrides).
Cases:
- single class unchanged
- multiple positional classes joined
- array input flattening (clsx)
- object syntax with truthy/falsy keys
- last-wins dedup on conflicting Tailwind utilities (the
regression-locked guarantee)
- non-conflicting utilities both survive (p-2 + m-4)
- mixed input shapes (string + array + object + string)
- nullish / empty inputs don't throw
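twMerge's real conflict resolution is far richer than anything worth inlining here; this simplified stand-in only resolves clashes by utility prefix ("p-2" vs "p-4") to illustrate the last-wins guarantee those cases lock:

```typescript
// NOT tailwind-merge: a toy that keys classes by their prefix before the
// last dash and lets later entries overwrite earlier ones.
function lastWinsByPrefix(classes: string[]): string {
  const byKey = new Map<string, string>();
  for (const c of classes) {
    const i = c.lastIndexOf("-");
    const key = i > 0 ? c.slice(0, i) : c; // "p-4" -> "p"; "flex" -> "flex"
    byKey.set(key, c);                     // last occurrence wins
  }
  return [...byKey.values()].join(" ");
}
```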
## runtime-names.runtimeDisplayName (4 it.each cases + 3 it())
Friendly-name lookup that surfaces the workspace runtime in the chat
indicator, details tab, and a few component labels.
Cases:
- known runtimes map to display strings
(claude-code → Claude Code, langgraph → LangGraph, etc.)
- unknown runtime falls back to input string verbatim
(a NEW runtime not yet in the lookup still renders something
operator-debuggable rather than a generic placeholder)
- empty string falls back to "agent" (final default)
- case-sensitivity pinned: "Claude-Code" / "LANGGRAPH" miss the
lookup. The upstream slug is already normalized lowercase, so a
future refactor that lowercases input "for safety" would
silently change behavior — pinning the contract here.
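The lookup contract, sketched. The table below contains only the entries named in the cases; the real map surely has more:

```typescript
const RUNTIME_NAMES: Record<string, string> = {
  "claude-code": "Claude Code",
  langgraph: "LangGraph",
};

function runtimeDisplayName(runtime: string): string {
  if (!runtime) return "agent"; // final default for empty input
  // Case-sensitive on purpose: upstream slugs are already normalized
  // lowercase, and an unknown slug should render verbatim (something
  // operator-debuggable) rather than be "fixed" or replaced.
  return RUNTIME_NAMES[runtime] ?? runtime;
}
```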
## Test plan
- [x] All 17 cases pass locally (~129ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: coverage instrumentation + script (#2147)
- [x] 0%-file gaps utils.ts + runtime-names.ts (this PR)
- [ ] More 0%/low-coverage files: lib/canvas-actions.ts (25%),
store/classNames.ts (17%) — separate PRs
- [ ] Step 3b: thresholds + CI gate once baseline catches up
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes step 1+2 of #1815. Step 3 (CI gate + threshold) is split into
a follow-up because today's baseline is ~46% lines / ~45% statements,
not the 70% the issue's draft thresholds assumed.
## What this lands
- `canvas/vitest.config.ts` — `coverage` block with v8 provider,
reporters: text (terminal) / html (./coverage/index.html) /
json-summary (machine-readable for tooling). NO threshold —
pure observability.
- `canvas/package.json` — adds `test:coverage` script
(`vitest run --coverage`); existing `test` script is unchanged so
the default workflow is identical.
- `canvas/package-lock.json` — adds @vitest/coverage-v8@^4.1.5 (the
v8 provider Vitest uses for native coverage).
## Why no threshold yet
Issue draft threshold was 70%/70%/65%/70% (lines/funcs/branches/stmts).
Local baseline today:
```
Statements : 45.19% (3248/7186)
Branches : 39.87% (2034/5101)
Functions : 40.99% (724/1766)
Lines : 46.36% (2905/6265)
```
Turning on a 70% gate today would either fail CI immediately or get
papered over with an ad-hoc exclude list. Better path: land
observability now, run coverage in PR review for any new code
(via the new script), gate later when the baseline catches up.
## Heatmap (from local run, top gaps)
- `src/lib/runtime-names.ts` — 0% (untouched by tests)
- `src/lib/utils.ts` — 0%
- `src/lib/canvas-actions.ts` — 25%
- `src/store/classNames.ts` — 17%
- `src/store/canvas.ts` — 73% (already-tested but the largest absolute
gap by lines)
Each is a concrete follow-up issue / PR target.
## Test plan
- [x] `npx vitest run --coverage` runs cleanly locally (~10s) and
produces `./coverage/index.html` + a `coverage-summary.json`
- [x] Existing `npm run test` workflow unchanged — instrumentation
only activates with `--coverage` flag
- [x] No production-code changes — pure tooling addition
## Follow-ups (each tracked separately; this PR keeps minimal scope)
- Step 3a — write tests for the 0% files above (~tiny each)
- Step 3b — once baseline ≥ thresholds, add `thresholds` block to
vitest.config.ts + a `npm run test:coverage` step in
`.github/workflows/ci.yml`'s Canvas job
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the fourth and final item from #2071 — but at a slightly
different layer than the issue listed: tests `dragUtils.ts` (the
74-LOC pure-ish geometry helpers) instead of the full 296-LOC
`useDragHandlers` hook. Rationale below.
15 cases across 2 buckets:
**shouldDetach (8):**
- child fully inside parent → false
- child drifted slightly past edge but under DETACH_FRACTION → false
- child past 20% threshold on X → true (un-nest)
- child past 20% threshold on Y → true (un-nest)
- missing child node → true (conservative fallback per source comment)
- missing parent node → true (same)
- measured size absent → falls back to React Flow's 220x120 defaults
(mirrors initial-mount race where measurement hasn't run yet)
- DETACH_FRACTION constant pinned at 0.2 (Miro/tldraw convention)
**clampChildIntoParent (7):**
- child already inside bounds → no-op (no setState — proven by
reference equality on mockState.nodes)
- drifted past top-left → clamps to (0, 0)
- drifted past bottom-right → clamps to (parentW - childW, parentH - childH)
- per-axis independence: X past edge + Y inside → only X clamps
- child not in store → early return, no setState
- child internalNode missing → early return, no setState
- multi-node store: clamping one node MUST NOT touch siblings
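The geometry reduces to two pure functions. A sketch with the React Flow store plumbing stripped; child coordinates are taken as relative to the parent, and "past 20%" is measured against the child's own dimension (an assumption consistent with the cases above, not a quote of the source):

```typescript
interface Size { w: number; h: number }
interface Pos { x: number; y: number }

const DETACH_FRACTION = 0.2;               // Miro/tldraw convention
const DEFAULT_W = 220, DEFAULT_H = 120;    // React Flow pre-measurement defaults

function shouldDetach(
  child: (Pos & Partial<Size>) | undefined, // position relative to parent
  parent: Size | undefined,
): boolean {
  if (!child || !parent) return true;      // conservative fallback
  const w = child.w ?? DEFAULT_W;
  const h = child.h ?? DEFAULT_H;
  const overX = Math.max(0, -child.x, child.x + w - parent.w);
  const overY = Math.max(0, -child.y, child.y + h - parent.h);
  return overX > w * DETACH_FRACTION || overY > h * DETACH_FRACTION;
}

// Snap a drifted child back inside the parent, per-axis.
function clampChildIntoParent(child: Pos & Size, parent: Size): Pos {
  return {
    x: Math.min(Math.max(child.x, 0), parent.w - child.w),
    y: Math.min(Math.max(child.y, 0), parent.h - child.h),
  };
}
```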
## Why dragUtils, not the full useDragHandlers hook
The hook (296 LOC) orchestrates React Flow drag events + Zustand
mutations. Testing it would need heavyweight `useReactFlow` +
internal-node + `setDragOverNode` / `nestNode` / `batchNest` /
`isDescendant` mocks just to drive event handlers — and the
*decisions* the hook makes all delegate to these two helpers:
- `shouldDetach` decides "is this a real un-nest?"
- `clampChildIntoParent` snaps the child back when the user drifted
slightly past the edge without holding Alt/Cmd
Pinning these locks the hot path the user feels. The hook's
remaining surface (modifier-key snapshotting, drop-target
broadcasting, commit-on-release grow pass) is plumbing — worth
testing as a follow-up if it ever regresses, but lower
correctness leverage per LOC of test setup.
## #2071 status after this PR
- [x] useTemplateDeploy (#2121)
- [x] A2AEdge (#2143)
- [x] OrgCancelButton (#2145)
- [x] dragUtils geometry helpers (this PR)
- [ ] Full useDragHandlers hook orchestration — explicit deferral
with rationale above
## Test plan
- [x] All 15 cases pass locally (`vitest run dragUtils.test.ts` — 131ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the third item from #2071 (Canvas test gaps follow-up). Builds
on the A2AEdge tests in PR #2143.
10 cases across 4 buckets:
**Render (2):**
- Default pill with `Cancel (N)` text + correct ARIA label
- Confirm dialog NOT visible until pill click
**Pill click (3):**
- Click flips to confirming view + stops propagation (so React Flow
doesn't interpret the click as a node selection)
- Confirm copy pluralizes correctly: count=1 → "Delete 1 workspace?",
count>1 → "Delete N workspaces?". Negative assertion guards against
the wrong form regressing in either direction.
**No / cancel-confirm (1):**
- Click No → returns to pill, no API call, no store mutation
**Yes / cascade-delete (4):**
- Happy path: beginDelete locks the WHOLE subtree (root + children,
NOT unrelated workspace) → api.del("/workspaces/<id>?confirm=true")
→ optimistic store filter strips subtree, keeps unrelated → success
toast → endDelete in finally
- WS-event race: WS_REMOVED handler clears the root mid-flight. The
bail-out branch (`!postDeleteState.nodes.some(n => n.id === rootId)`)
must NOT then run a second optimistic filter. Pre-fix the post-await
subtree walk would miss any orphaned descendants whose parentId got
reparented upward by handleCanvasEvent — pinned now.
- Error path: api.del rejects → endDelete UNDOes the lock + error
toast surfaces the message → subtree STAYS in the store so the user
can retry / interact with the still-deploying nodes
- Non-Error rejection (e.g. string thrown directly): toast surfaces
the canned "Cancel failed" fallback instead of attempting `.message`
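The subtree walk behind "beginDelete locks the WHOLE subtree" can be sketched over the store's node array (node shape simplified to id/parentId; function name is illustrative):

```typescript
interface WsNode { id: string; parentId?: string }

function collectSubtree(nodes: WsNode[], rootId: string): Set<string> {
  const ids = new Set([rootId]);
  // Children may appear before their parents in the store array, so
  // iterate until no new descendant is found (handles any ordering).
  let grew = true;
  while (grew) {
    grew = false;
    for (const n of nodes) {
      if (n.parentId && ids.has(n.parentId) && !ids.has(n.id)) {
        ids.add(n.id);
        grew = true;
      }
    }
  }
  return ids;
}
```

The optimistic removal then filters the store to nodes whose id is not in this set, leaving unrelated workspaces untouched.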
## Mocking
- `@/lib/api`, `@/components/Toaster`: simple spy mocks
- `@/store/canvas`: object that satisfies BOTH the selector pattern
(`useCanvasStore(s => s.x)`) AND `getState()` / `setState()` since
the cascade-delete handler walks the subtree via `getState()` and
mutates via `setState()` for the optimistic removal. `vi.hoisted`
preserves referential identity so the mock fns wired into the
state object are observed by every consumer.
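A sketch of the dual-shape mock, assuming a zustand-style store (names and shapes are illustrative, not the test file's literal code):

```typescript
// A store mock that satisfies both the hook-style selector call
// (useStore(s => s.x)) and the getState()/setState() imperative API.
function createStoreMock<S extends object>(initial: S) {
  let state = initial;
  const useStore = Object.assign(
    <R>(selector: (s: S) => R): R => selector(state),
    {
      getState: () => state,
      setState: (partial: Partial<S>) => {
        state = { ...state, ...partial };
      },
    },
  );
  return useStore;
}
```

Because the same closure backs both access paths, a `setState` performed by the cascade-delete handler is immediately visible to the next selector call, which is what the optimistic-removal assertions rely on.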
## Test plan
- [x] All 10 cases pass locally (`vitest run OrgCancelButton.test.tsx` — ~990ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## #2071 progress after this PR
- [x] useTemplateDeploy (PR #2121)
- [x] A2AEdge (PR #2143)
- [x] OrgCancelButton (this PR)
- [ ] useDragHandlers — separate PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the second item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the custom React Flow edge that renders
delegation counts between workspaces and routes a click into the
source workspace's Activity feed.
10 cases across 2 buckets:
**Render (6):**
- Empty label → BaseEdge only, NO portaled HTML pill (the most
common state for cold edges; pill must not render-through-empty)
- Non-empty label → pill renders with the exact label text
- isHot=true → violet accent classes; blue accent NOT present
- isHot=false → blue accent classes
- ARIA pluralization: count=1 → "1 delegation from …" (singular)
- ARIA pluralization: count=7 → "7 delegations from …" (plural)
**Click behaviour (4):**
- Click → selectNode(source)
- FRESH selection (selectedNodeId != source) → also setPanelTab("activity")
- RE-click of already-selected source → setPanelTab MUST NOT fire
(this is the regression-locked guarantee — preserves the user's
current tab when they intentionally moved to Chat / Memory while
inspecting the same peer)
- stopPropagation: parent onClick must NOT see the event (otherwise
the canvas Pane's clear-selection handler would fire and undo the
edge's own selectNode call)
## Mocking strategy
- `@xyflow/react`: BaseEdge → <g data-testid>, EdgeLabelRenderer →
inline pass-through (no portal), getBezierPath → fixed [path, x, y].
Lets the test render the component without a ReactFlow provider.
- `@/store/canvas`: vi.hoisted-shared mock state with selectNode +
setPanelTab spies and a mutable selectedNodeId. The store's
getState() returns the same object so the click handler's
`useCanvasStore.getState().selectedNodeId` lookup works.
Pattern matches the existing `A2ATopologyOverlay.test.tsx` setup
in the same module.
## Test plan
- [x] All 10 cases pass locally (`vitest run A2AEdge.test.tsx` — ~1.3s)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## Remaining #2071 items
- OrgCancelButton tests
- useDragHandlers tests
Each is a separate PR.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer bot flagged: ChatTab.tsx imported extractResponseText but
no longer used it after the loop body moved to historyHydration.ts
(the helper imports it directly). Drop from the named import to
unblock merge. extractFilesFromTask remains used at line 515 for the
WS A2A_RESPONSE handler's reply-files extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer follow-up to PR #2134 (Optional finding). The history loader
walked text on the user branch but never extracted file parts — so a
chat reload after a session where the user dragged in a file rendered
the text bubble but lost the download chip. Symmetric to the agent
branch which already handles this via extractFilesFromTask.
Wire shape from ChatTab's outbound POST:
request_body = {params: {message: {parts: [
{kind: "text", text: "..."},
{kind: "file", file: {uri, name, mimeType?, size?}}
]}}}
extractFilesFromTask walks `task.parts`, so we feed it `params.message`
(the inner object that has the parts array). Three new tests:
- hydrates file attachments from request_body
- emits an attachments-only bubble when text is empty (drag-drop
without caption — pre-fix the empty userText short-circuited and
the row was dropped entirely)
- internal-self predicate suppresses the row even with attachments
(defence-in-depth for future internal triggers)
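The hydration step above can be sketched as follows (a hedged mirror of the `parts` walk described above; the real `extractFilesFromTask` lives in the canvas and may differ in detail):

```typescript
interface FilePart {
  kind: "file";
  file: { uri: string; name: string; mimeType?: string; size?: number };
}
interface TextPart {
  kind: "text";
  text: string;
}
type Part = FilePart | TextPart;

// Walk an object carrying a parts[] array and pull out the file payloads.
function extractFilesFromTask(task: { parts?: Part[] }): FilePart["file"][] {
  return (task.parts ?? [])
    .filter((p): p is FilePart => p.kind === "file")
    .map((p) => p.file);
}

// Hydration feeds the INNER message object, not the whole request body,
// because that is where the parts array lives on the wire.
function filesFromRequestBody(body: {
  params?: { message?: { parts?: Part[] } };
}): FilePart["file"][] {
  return extractFilesFromTask(body.params?.message ?? {});
}
```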
Stacked on #2134; this branch's parent commit is its tip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pushed back: the timestamp bug should have been caught by E2E.
Right — my earlier coverage tested the server contract (notify endpoint,
WS broadcast filter) but never the chat-history HYDRATION path. Without
a unit test that froze the wall clock and asserted timestamps came from
created_at, a future refactor could re-introduce the same bug.
This commit:
1. Extracts the per-row → ChatMessage[] mapping out of the closure
inside loadMessagesFromDB into chat/historyHydration.ts. Pure
function, no React dependency, easy to test.
2. Adds 12 vitest cases in __tests__/historyHydration.test.ts covering:
- Timestamp regression (3 tests, with system time frozen to 2030 so
a regression starts producing "2030-…" timestamps and the assertion
fails unmistakably). The third test mirrors the user's screenshot:
two rows with distinct created_at must produce distinct timestamps.
- User-message extraction (text, internal-self filter, null body)
- Agent-message extraction (text, error→system role, file attachments,
null body, body with neither text nor files)
- End-to-end: a single row with both request and response emits
two messages with the same timestamp (the canonical canvas-source
row pattern)
3. The new file-attachment test caught a SECOND latent bug — the helper
was passing `response_body.result ?? response_body` to
extractFilesFromTask. For the notify-with-attachments shape
`{result: "<text>", parts: [...]}` that expression evaluates to the
STRING "<text>", so the helper silently
returns []. So a chat reload after an agent attached a file would
lose the chips. Fixed by only unwrapping `result` when it's an
object (the task-shape) and falling through to response_body
otherwise (the notify shape).
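The fixed unwrap logic can be sketched as (names are illustrative; the source describes only the object-vs-string guard):

```typescript
type ResponseBody = { result?: unknown; parts?: unknown[] } & Record<string, unknown>;

// Unwrap `result` only when it is an object (the task shape);
// for the notify shape {result: "<text>", parts: [...]} the parts
// live on the top-level body, so fall through to the body itself.
function unwrapForFileExtraction(responseBody: ResponseBody): object {
  const r = responseBody.result;
  return typeof r === "object" && r !== null ? r : responseBody;
}
```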
ChatTab now imports the helper and the loop body becomes one line:
`messages.push(...activityRowToMessages(a, isInternalSelfMessage))`.
Verification:
- 12/12 historyHydration tests pass
- 1072/1072 full canvas vitest pass (was 1060 before, +12)
- tsc --noEmit clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged that all historical user bubbles render with the same
"now" clock after a chat reload — both messages in the screenshot
showed 9:01:58 PM despite being sent hours apart.
ChatTab.tsx:142 minted user messages with createMessage(...) which
calls new Date().toISOString() — fine for a freshly-typed message,
wrong for hydrated history. Every reload re-stamped all user bubbles
to the render moment, collapsing the visible chronology. The agent
path on line 157 already overrides with a.created_at; mirror that.
One-line fix (spread + override timestamp) plus a comment explaining
why the override is load-bearing so the next refactor doesn't drop it.
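The shape of the fix, as a minimal sketch (the real `createMessage` and row types differ; this only illustrates the spread-plus-override pattern):

```typescript
interface ChatMessage {
  role: string;
  text: string;
  timestamp: string;
}

// createMessage stamps "now" -- correct for a freshly-typed message.
function createMessage(role: string, text: string): ChatMessage {
  return { role, text, timestamp: new Date().toISOString() };
}

// Hydration must override with the row's persisted created_at, or every
// reload collapses all user bubbles onto the render moment.
function hydrateUserMessage(row: { text: string; created_at: string }): ChatMessage {
  return { ...createMessage("user", row.text), timestamp: row.created_at };
}
```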
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:
1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
no workspace container needed. 14 assertions covering: notify text-only
round-trip, notify-with-attachments persists parts[].kind=file in the
shape extractFilesFromTask reads, per-element validation rejects empty
uri/name (regression for the missing gin `dive` bug), and a real
/chat/uploads → /notify URI round-trip when a container is up.
2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
WebSocket-side filtering that drops malformed attachments, allows
attachments-only bubbles, ignores non-array payloads, and no-ops on
pure-empty events.
3. Persisted response_body shape test (message-parser.test.ts +1) — pins
the {result, parts} contract the chat history loader hydrates on
reload, so refreshing after an agent attachment restores both caption
and download chips.
Also wires the new shell E2E into e2e-api.yml so the contract regresses
in CI rather than only in manual runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged two paper cuts in Agent Comms after the grouping PR:
"Delegating to f6f3a023-ab3c-4a69-b101-976028a4a7ec" reads as gibberish
because it's a UUID, and the chat is "one way" with only outbound bubbles
even though peers are clearly responding.
Both fixes are in toCommMessage's delegation branch:
1. Pull text from the actual payload, not the platform's audit-log summary.
- delegate row → request_body.task (the task text the agent sent).
Fallback when missing: "Delegating to <resolved-peer-name>" — never
the raw UUID.
- delegate_result row → response_body.response_preview / .text (the
peer's actual reply). Fallback paths render human-readable status
for queued / failed cases ("Queued — Peer Agent is busy on a prior
task...") instead of platform jargon.
2. delegate_result rows render flow="in" — even though source_id=us
(the platform writes the row on our side), the conversational
direction is peer → us. The chat now shows alternating bubbles
(out: "Build me 10 landing pages" → in: "Done — ZIP at /tmp/...")
instead of one-sided "→ To X" wall.
The WS push handler in this same file now populates request_body /
response_body from the DELEGATION_SENT / DELEGATION_COMPLETE event
payloads (task_preview, response_preview), so live-pushed bubbles use
the same text-extraction path as the GET-on-mount.
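The delegation branch's text selection can be sketched as (hedged: field names follow the description above; fallback wording is illustrative):

```typescript
interface DelegationRow {
  method: "delegate" | "delegate_result";
  status?: string;
  target_id: string;
  request_body?: { task?: string };
  response_body?: { response_preview?: string; text?: string };
}

// Prefer the actual payload text; fall back to human-readable labels.
// Never surface a raw UUID.
function delegationText(
  row: DelegationRow,
  resolveName: (id: string) => string,
): string {
  if (row.method === "delegate") {
    return row.request_body?.task ?? `Delegating to ${resolveName(row.target_id)}`;
  }
  const reply = row.response_body?.response_preview ?? row.response_body?.text;
  if (reply) return reply;
  if (row.status === "queued") {
    return `Queued: ${resolveName(row.target_id)} is busy on a prior task`;
  }
  return `Delegation ${row.status ?? "pending"}`;
}
```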
Tests:
- 4 new in toCommMessage's delegation branch:
- delegate row prefers request_body.task over summary
- delegate row falls back to name-resolved label when task missing
- delegate_result row is INBOUND (flow="in")
- delegate_result queued shows human-readable wait message including
the resolved peer name
- Replaces the previous "delegate row maps text from summary" tests
which encoded the (now-undesirable) platform-summary-as-text behavior.
- All 15 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chronological-only view became an unreadable tangle once Director + N peers
exchange more than a few rounds. New layout: a sub-tab bar at the
top of the panel, with "All" pinned leftmost and one tab per peer
(name + count). Selecting a peer filters the thread to that one
DD↔X conversation; "All" preserves the previous chronological view
as the default.
Tab ordering follows Slack/Linear DM-list convention: most-recent
activity descending, so active conversations rise to the top
without the user scrolling. Counts in parens match Slack's unread
hint pattern (no separate read/unread state — the count is total
in this conversation, computed from the same in-memory message
list the panel already maintains).
Pure-helper extraction: peer-summary derivation lives in
`buildPeerSummary(messages)` so the sort + count logic is unit-
testable without rendering the panel. 5 new tests cover: count
aggregation, most-recent-first ordering, lastTs as max-not-last,
empty input, name-stability when the same peerId carries different
names across messages.
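A sketch of the helper under the contract the five tests describe (assumption flagged: which name wins under name-instability isn't stated in this summary; the sketch keeps the first-seen name):

```typescript
interface CommMessage {
  peerId: string;
  peerName: string;
  ts: number;
}
interface PeerSummary {
  peerId: string;
  name: string;
  count: number;
  lastTs: number;
}

// Aggregate per-peer counts and most-recent activity, then sort
// descending by lastTs so active conversations rise to the top.
function buildPeerSummary(messages: CommMessage[]): PeerSummary[] {
  const byPeer = new Map<string, PeerSummary>();
  for (const m of messages) {
    const entry =
      byPeer.get(m.peerId) ??
      { peerId: m.peerId, name: m.peerName, count: 0, lastTs: 0 };
    entry.count += 1;
    entry.lastTs = Math.max(entry.lastTs, m.ts); // max, not last-in-array
    byPeer.set(m.peerId, entry);
  }
  return [...byPeer.values()].sort((a, b) => b.lastTs - a.lastTs);
}
```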
Keyboard: ArrowLeft/Right cycle peer tabs (matches the existing
My Chat / Agent Comms tab pattern in ChatTab). Auto-prune: if the
selected peer has zero messages after a setMessages update (rare,
e.g. dedupe drops the last bubble), fall back to "All" so the
viewer doesn't see an empty thread.
Frontend-only — no platform / runtime / DB changes. The existing
`peerId` / `peerName` fields on CommMessage already carry every
piece of data the new UI needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Critical bugs caught in code review of the agent→user attachments PR:
1. **Empty-URI attachments slipped past validation.** Gin's
go-playground/validator does NOT iterate slice elements without
`dive` — verified zero `dive` usage anywhere in workspace-server —
so the inner `binding:"required"` tags on NotifyAttachment.URI/Name
were never enforced. `attachments: [{"uri":"","name":""}]` would
pass validation, broadcast empty-URI chips that render blank in
canvas, AND persist them in activity_logs for every page reload to
re-render. Added explicit per-element validation in Notify (returns
400 with `attachment[i]: uri and name are required`) plus
defence-in-depth in the canvas filter (rejects empty strings, not
just non-strings).
3-case regression test pins the rejection.
2. **Hardcoded application/octet-stream stripped real mime types.**
`_upload_chat_files` always passed octet-stream as the multipart
Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")`
FIRST and only falls back to extension-sniffing when the header is
empty, so every agent-attached file lost its real type forever —
broke the canvas's MIME-based icon/preview logic. Now sniff via
`mimetypes.guess_type(path)` and only fall back to octet-stream
when sniffing returns None.
Plus three Required nits:
- `sqlmockArgMatcher` was misleading — the closure always returned
true after capture, identical to `sqlmock.AnyArg()` semantics, but
named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)`
so the intent (capture for post-call inspection, not validate via
driver-callback) is unambiguous.
- Test asserted notify call by `await_args_list[1]` index — fragile
to any future _upload_chat_files refactor that adds a pre-flight
POST. Now filter call list by URL suffix `/notify` and assert
exactly one match.
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases)
covering empty-uri, empty-name, both-empty so the Critical fix
stays defended.
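The canvas-side defence-in-depth half of Critical fix 1 can be sketched as (hedged: the real filter lives in canvas-events.ts; this only illustrates the empty-string tightening):

```typescript
interface Attachment {
  uri: string;
  name: string;
  mimeType?: string;
  size?: number;
}

// Reject non-objects, non-string fields, AND empty strings. The
// empty-string case is exactly what slipped past server-side
// validation while `dive` was missing.
function isValidAttachment(a: unknown): a is Attachment {
  if (typeof a !== "object" || a === null) return false;
  const { uri, name } = a as Record<string, unknown>;
  return (
    typeof uri === "string" && uri.length > 0 &&
    typeof name === "string" && name.length > 0
  );
}
```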
Deferred to follow-up:
- ORDER BY tiebreaker for same-millisecond notifies — pre-existing
risk, not regression.
- Streaming multipart upload — bounded by the platform's 50MB total
cap so RAM ceiling is fixed; switch to streaming if cap rises.
- Symlink rejection — agent UID can already read whatever its
filesystem perms allow via the shell tool; rejecting symlinks
doesn't materially shrink the attack surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip"
in plain text instead of attaching a download chip — the runtime literally
had no API for outbound file attachments. The canvas + platform's
chat-uploads infrastructure already supported the inbound (user → agent)
direction (commit 94d9331c); this PR wires the outbound side.
End-to-end shape:
agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"])
↓ runtime
POST /workspaces/<self>/chat/uploads (multipart)
↓ platform
/workspace/.molecule/chat-uploads/<uuid>-build.zip
→ returns {uri: workspace:/...build.zip, name, mimeType, size}
↓ runtime
POST /workspaces/<self>/notify
{message: "Done!", attachments: [{uri, name, mimeType, size}]}
↓ platform
Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs
with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]}
↓ canvas
WS push: canvas-events.ts adds attachments to agentMessages queue
Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[]
Either path → ChatTab renders download chip via existing path
Files changed:
workspace-server/internal/handlers/activity.go
- NotifyAttachment struct {URI, Name, MimeType, Size}
- Notify body accepts attachments[], broadcasts in payload,
persists as response_body.parts[].kind="file"
canvas/src/store/canvas-events.ts
- AGENT_MESSAGE handler reads payload.attachments, type-validates
each entry, attaches to agentMessages queue
- Skips empty events (was: skipped only when content empty)
workspace/a2a_tools.py
- tool_send_message_to_user(message, attachments=[paths])
- New _upload_chat_files helper: opens each path, multipart POSTs
to /chat/uploads, returns the platform's metadata
- Fail-fast on missing file / upload error — never sends a notify
with a half-rendered attachment chip
workspace/a2a_mcp_server.py
- inputSchema declares attachments param so claude-code SDK
surfaces it to the model
- Defensive filter on the dispatch path (drops non-string entries
if the model sends a malformed payload)
Tests:
- 4 new Python: success path, missing file, upload 5xx, no-attach
backwards compat
- 1 new Go: Notify-with-attachments persists parts[] in
response_body so chat reload reconstructs the chip
Why /tmp paths work even though they're outside the canvas's allowed
roots: the runtime tool reads the bytes locally and re-uploads through
/chat/uploads, which lands the file under /workspace (an allowed root).
The agent can specify any readable path.
Does NOT include: agent → agent file transfer. Different design problem
(cross-workspace download auth: peer would need a credential to call
sender's /chat/download). Tracked as a follow-up under task #114.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Addresses github-code-quality finding on PR #2064:
> Comparison between inconvertible types
> Variable 'info' cannot be of type null, but it is compared to
> an expression of type null.
By line 75, `info` has been narrowed to non-null via the
`if (!info) return null;` guard at line 56 — so `open={info !== null}`
always evaluates to `true`. Switch to JSX shorthand `open` for
clarity and to silence the static check.
Behaviorally identical; the modal still opens whenever the parent
renders this component (which only happens with non-null info).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Critical follow-up to PR #2126's review. Two real bugs:
1. **Runtime QUEUED never resolved.** Platform's drain stitch updates
the platform's delegate_result row when a queued delegation finally
completes, but never pushes back to the runtime. The LLM polling
check_delegation_status saw status="queued" forever — combined with
the new docstring guidance ("queued → wait, peer will reply"), the
model would wait indefinitely on a state that never resolves.
Strictly worse than pre-PR behavior where it would have at least
bypassed.
2. **Live updates dead code.** delegation.go writes activity rows by
direct INSERT INTO activity_logs, bypassing the LogActivity helper
that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's
ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial
GET worked, live updates did not.
Fix:
(1) Runtime side, workspace/builtin_tools/delegation.py:
- New `_refresh_queued_from_platform(task_id)` async helper that
pulls /workspaces/<self>/delegations and finds the platform-side
delegate_result row for our task_id.
- check_delegation_status calls _refresh when local status is
QUEUED, so the LLM's poll itself drives state convergence.
- Best-effort: GET failure leaves local state untouched, next
poll retries.
- Docstring updated to reflect the actual behavior ("polls
transparently — keep polling and you'll see the flip").
- 4 new tests cover: QUEUED → completed via refresh; QUEUED →
failed via refresh; refresh keeps QUEUED when platform hasn't
resolved; refresh swallows network errors safely.
(2) Canvas side, AgentCommsPanel.tsx WS push handler:
- Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE
/ DELEGATION_FAILED in addition to ACTIVITY_LOGGED.
- Each event's payload synthesized into an ActivityEntry shape
so toCommMessage's existing delegation branch maps it. Status
derived: STATUS uses payload.status, COMPLETE → "completed",
FAILED → "failed", SENT → "pending".
- The ACTIVITY_LOGGED branch keeps the "delegation" type accepted
as a no-op-today / future-proof path: if delegation handlers
are ever refactored to call LogActivity, this lights up
automatically without another canvas change.
Doesn't change: the docstring guidance ("queued → wait, don't bypass")
is now actually load-bearing because the refresh path will deliver
the eventual outcome. Without the refresh, the guidance was a trap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs that compounded into the "Director does the work itself" UX:
1. workspace/builtin_tools/delegation.py: _execute_delegation only
handled HTTP 200 in the response branch. When the peer's a2a-proxy
returned HTTP 202 + {queued: true} (single-SDK-session bottleneck
on the peer), the loop fell through. Two iterations later the
`if "error" in result` check tried to access an unbound `result`,
the goroutine ended quietly, and the delegation stayed at FAILED
with error="None". The LLM checking status saw "failed" + the
platform's "Delegation queued — target at capacity" log line in
chat context, concluded the peer was permanently unavailable, and
bypassed delegation to do the work itself.
Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED,
marks the local delegation as QUEUED, mirrors to the platform,
and returns cleanly without retrying. The retry loop is for
transient transport errors — queueing is a real ack, not a failure
to retry against (retrying would just re-queue the same task).
check_delegation_status docstring extended with explicit per-status
guidance: pending/in_progress → wait, queued → wait (peer busy on
prior task, reply WILL arrive), completed → use result, failed →
real error in error field; only fall back on failed, never queued.
2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped
every delegation row because it whitelisted only a2a_send /
a2a_receive. activity_type='delegation' rows (written by the
platform's /delegate handler with method='delegate' or
'delegate_result') never reached toCommMessage. User saw "No
agent-to-agent communications yet" while 6+ delegations existed
in the DB.
  Fix: include "delegation" in both the initial filter and the
WS push filter, plus a delegation branch in toCommMessage that
maps the row as outbound (always — platform proxies on our behalf)
and uses summary as the primary text source.
Tests:
- 3 new Python tests cover the 202+queued path: status becomes
QUEUED not FAILED; no retry on queued (counted by URL match
against the A2A target since the mock is shared across all
AsyncClient calls); bare 202 without {queued:true} still
falls through to the existing retry-then-FAILED path.
- 3 new TS tests cover the delegation mapper: 'delegate' row
maps as outbound to target with summary text; queued
'delegate_result' preserves status='queued' (load-bearing for
the LLM's wait-vs-bypass decision); missing target_id returns
null instead of rendering a ghost.
Does NOT solve: the underlying single-SDK-session bottleneck that
causes peers to queue in the first place. Tracked as task #102
(parallel SDK sessions per workspace) — real architectural work.
This PR makes the runtime handle the queueing correctly so the LLM
doesn't bail out, and makes the delegations visible in Agent Comms
so operators can see what's happening.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the first item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the shared template-deploy hook that
both TemplatePalette (sidebar) and EmptyState (welcome grid) drive.
10 cases across 4 buckets:
**Happy path (4):**
- preflight ok → POST /workspaces → onDeployed fires with new id
- caller-supplied canvasCoords flows into the POST body
- default coords fall in [100,500) × [100,400) when canvasCoords omitted
- template.runtime is preferred over the resolveRuntime fallback
(locks the deduped-fallback table contract added in #2061)
**Preflight failures (2):**
- network throw sets error AND clears `deploying` (regression test
for the "stranded button" bug called out in the SUT's inline
comment — drop the try block and you'll fail this test)
- not-ok-with-missing-keys opens the modal without firing POST
**Modal lifecycle (2):**
- 'keys added' click retries POST without re-running preflight
(verifies the executeDeploy / deploy split — preflight call count
stays at 1, POST count goes to 1)
- 'cancel' click closes modal without firing POST
**POST failures (2):**
- Error rejection surfaces the message
- non-Error rejection surfaces the "Deploy failed" fallback
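The default-coords behaviour the happy-path bucket asserts can be sketched as (a hypothetical reduction; the hook's real code may differ, and `rand` is injected here purely so the bounds are testable):

```typescript
interface Coords {
  x: number;
  y: number;
}

// Caller-supplied coords win; otherwise fall in [100,500) x [100,400).
function resolveCoords(
  canvasCoords?: Coords,
  rand: () => number = Math.random,
): Coords {
  return canvasCoords ?? {
    x: 100 + rand() * 400,
    y: 100 + rand() * 300,
  };
}
```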
Mocks `@/lib/api`, `@/lib/deploy-preflight`, and `@/components/MissingKeysModal`
(stand-in component exposes the two callbacks as test-id buttons —
the real radix modal is irrelevant to this hook's behavior). Test
file follows the `vi.hoisted` + import-after-mocks pattern from
`canvas/src/app/__tests__/orgs-page.test.tsx`.
## Test plan
- [x] All 10 cases pass locally (`vitest run useTemplateDeploy.test.tsx`)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
Follow-ups for the rest of #2071 (separate PRs):
- A2AEdge rendering + click-to-select-source
- OrgCancelButton cancel flow + optimistic state
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was breaking
Two distinct failure modes in `.github/workflows/secret-scan.yml`,
both visible after PR #2115 / #2117 hit the merge queue:
1. **`merge_group` events**: the script reads `github.event.before /
after` to determine BASE/HEAD. Those properties only exist on
`push` events. On `merge_group` events both came back empty, the
script fell through to "no BASE → scan entire tree" mode, and
false-positived on `canvas/src/lib/validation/__tests__/secret-formats.test.ts`
which contains a `ghp_xxxx…` literal as a masking-function fixture.
(Run 24966890424 — exit 1, "matched: ghp_[A-Za-z0-9]{36,}".)
2. **`push` events with shallow clone**: `fetch-depth: 2` doesn't
always cover BASE across true merge commits. When BASE is in the
payload but absent from the local object DB, `git diff` errors
out with `fatal: bad object <sha>` and the job exits 128.
(Run 24966796278 — push at 20:53Z merging #2115.)
## Fixes
- Add a dedicated fetch step for `merge_group.base_sha` (mirrors
the existing pull_request base fetch) so the diff base is in the
object DB before `git diff` runs.
- Move event-specific SHAs into a step `env:` block so the script
uses a clean `case` over `${{ github.event_name }}` instead of
a single `if pull_request / else push` that left merge_group on
the empty branch.
- Add an on-demand fetch for the push-event BASE when it isn't in
the shallow clone, plus a `git cat-file -e` guard before the
diff so we fall through cleanly to the "scan entire tree" path
if the fetch fails (correct, just slower) instead of exiting 128.
## Defense-in-depth
`secret-formats.test.ts` had two literal continuous-string fixtures
(`'ghp_xxxx…'`, `'github_pat_xxxx…'`). The ghp_ one matched the
secret-scan regex. Switched both to the `'prefix_' + 'x'.repeat(N)`
pattern already used elsewhere in the same file — runtime value is
the same, but the literal source text no longer matches the regex
even if the BASE detection ever falls back to tree-scan mode again.
## Test plan
- [x] No remaining regex matches in the secret-formats.test.ts source
- [x] YAML structure preserved
- [ ] CI passes on this PR's pull_request scan (was already passing)
- [ ] CI passes on this PR's merge_group scan (the new path)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #2110 (which generalised pruneStaleKeys to Map<string, T>).
Identified by the simplify reviewer on that PR as the only other
in-tree caller of the same shape: `for (const id of map.keys()) { if
(!liveIds.has(id)) map.delete(id); }`.
Net: -3 lines, one less hand-rolled GC loop. No behaviour change —
the helper does exactly what the inline block did.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify pass on top of #2069 fix:
- Export FALLBACK_POLL_MS from canvas/src/store/socket.ts and import
it as TOMBSTONE_TTL_MS in deleteTombstones.ts. Single source of
truth — tuning one without the other would silently re-open the
hydrate-races-delete window. Required-fix per simplify reviewer.
- Compress deleteTombstones.ts docstring from 30 lines to 10 — keep
the "what + why module-level"; drop the long-form problem
description (issue #2069 carries it).
- Compress canvas.ts call-site comments at removeSubtree (4 lines →
2) and hydrate (2 lines → 2 but tighter).
- Don't reassign the workspaces parameter inside hydrate — use a
const `live` and thread it through the two downstream calls
(computeAutoLayout, buildNodesAndEdges). Same effect, no lint
smell.
- Trim the canvas.test.ts integration-test preamble.
No behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2069. removeSubtree dropped a parent + descendants locally
after DELETE returned 200, but a GET /workspaces request that was
IN-FLIGHT before the DELETE completed could land AFTER and hydrate
the store with a stale snapshot — re-introducing the deleted nodes
on the canvas until the next 10s fallback poll corrected it.
New module canvas/src/store/deleteTombstones.ts holds a transient
process-lifetime Map<id, deletedAt>. removeSubtree calls
markDeleted(removedIds); hydrate calls wasRecentlyDeleted(id) to
filter the incoming workspaces. TTL is 10s — matches the WS-fallback
poll cadence so a single round-trip is covered, after which a
legitimately re-imported id flows through normally.
GC happens lazily at every read AND at write time so the map stays
bounded — no separate timer / interval / unmount plumbing.
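The module's core can be sketched as (a minimal sketch of the described behaviour; `now` is injectable here so the TTL boundary is testable, which the real module may do differently):

```typescript
const TOMBSTONE_TTL_MS = 10_000; // matches the WS-fallback poll cadence

const tombstones = new Map<string, number>(); // id -> deletedAt

// Lazy GC at every read AND write keeps the map bounded without timers.
function gc(now: number): void {
  for (const [id, deletedAt] of tombstones) {
    if (now - deletedAt > TOMBSTONE_TTL_MS) tombstones.delete(id);
  }
}

function markDeleted(ids: Iterable<string>, now = Date.now()): void {
  gc(now);
  for (const id of ids) tombstones.set(id, now);
}

function wasRecentlyDeleted(id: string, now = Date.now()): boolean {
  gc(now);
  return tombstones.has(id);
}
```

hydrate then simply filters incoming workspaces through `wasRecentlyDeleted` before building nodes.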
Tests:
- canvas/src/store/__tests__/deleteTombstones.test.ts: 7 cases
covering immediate flag, never-marked, TTL boundary (9999ms vs
10001ms), GC-on-read, GC-on-write, re-mark resets timestamp,
iterable input.
- canvas/src/store/__tests__/canvas.test.ts: end-to-end "hydrate
cannot resurrect ids that removeSubtree just dropped (#2069)"
exercises the full chain at the store level.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify pass on top of #2070 fix:
- Rename pruneStaleSubtreeIds → pruneStaleKeys, generalize to
Map<string, T> so the same shape can absorb other keyed-by-node-id
caches (ProvisioningTimeout.tsx tracking map is the obvious next
caller — left as a follow-up to keep this PR scoped).
- Trim the helper docstring to remove implementation-detail rot
(O(map_size), cadence claims). The ref-block comment carries the
rationale where it actually matters (at the call site).
- Add identity-preservation test: survivors must keep their original
Set reference. Guards against a future "rebuild instead of delete"
regression that would silently invalidate downstream === checks.
No behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2070. The Map<rootId, Set<nodeId>> in useCanvasViewport.ts
accumulated entries indefinitely — adds on every successful auto-fit,
never deletes when a root left state.nodes (cascade delete or manual
remove). Operationally invisible until thousands of imports, but the
fix is cheap.
Adds pruneStaleSubtreeIds(map, liveNodeIds) — a pure helper exported
alongside the existing shouldFitGrowing helper, called at the top of
runFit before any read or write to the map. Bounds the map to "roots
present right now" instead of "every root ever auto-fitted in this
session." O(map_size) per fit; runs only at user-driven cadence.
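The helper's shape, as a sketch of the described contract (survivor entries keep their original Set references; deletion happens in place rather than rebuilding the map):

```typescript
// Drop any cached subtree whose root is no longer a live node.
function pruneStaleSubtreeIds(
  map: Map<string, Set<string>>,
  liveNodeIds: Set<string>,
): void {
  for (const rootId of [...map.keys()]) {
    if (!liveNodeIds.has(rootId)) map.delete(rootId);
  }
}
```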
Tests in __tests__/useCanvasViewport.test.ts cover the four cases:
delete-some / no-op / clear-all / never-add.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify pass on top of the canary fix:
- Drop the three CP commit SHAs from comments — issue #2090 covers
the audit trail, SHAs would rot.
- Pull the inline `900` into TLS_TIMEOUT_SEC=$((15 * 60)) so the
bash mirrors the TS side (15 min) at a glance.
- TENANT_HOST extraction now strips http(s) AND any port suffix, so
getent doesn't silently fail on a ws://host:443 style URL.
- sed-redact Authorization/Cookie out of the curl -v dump, defensive
against future callers adding an auth header to this probe.
Pure cleanup; no behaviour change to the happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Canary #2090 has been red for 6 consecutive runs over 4+ hours, all
timing out at the TLS-readiness step exactly at the 10-min cap. Time
window correlates with three CP commits that landed today/yesterday
and changed EC2 boot behaviour:
- molecule-controlplane@a3eb8be — fix(ec2): force fresh clone of /opt/adapter
- molecule-controlplane@ed70405 — feat(sweep): wire up healthcheck loop
- molecule-controlplane@4ab339e — fix(provisioner): aggregate cleanup errors
Two changes here, both surgical:
1. Bump the bash-side TLS deadline from 600s to 900s, and the canvas TS
mirror from 10m to 15m. Stays below the 20-min provision envelope
(so a genuinely-stuck tenant still fails loud at the earlier
provision step instead of masquerading as TLS).
2. On TLS-timeout, dump a diagnostic burst before exiting:
- getent hosts $TENANT_HOST (DNS resolution state)
- curl -kv $TENANT_URL/health (TLS handshake + HTTP layer)
The previous failure log was just "no 2xx in N min" with no signal
for which layer was actually broken. After this, the next timeout
tells us whether DNS, TLS handshake, or HTTP layer is the culprit
so the CP root cause can be isolated without speculation.
This is the unblock; a separate molecule-controlplane issue tracks the
underlying regression suspicion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three files conflicted with staging changes that landed while this PR
sat open. Resolved each by combining both intents (not picking one side):
- a2a_proxy.go: keep the branch's idle-timeout signature
(workspaceID parameter + comment) AND apply staging's #1483 SSRF
defense-in-depth check at the top of dispatchA2A. Type-assert
h.broadcaster (now an EventEmitter interface per staging) back to
*Broadcaster for applyIdleTimeout's SubscribeSSE call; falls through
to no-op when the assertion fails (test-mock case).
- a2a_proxy_test.go: keep both new test suites — branch's
TestApplyIdleTimeout_* (3 cases for the idle-timeout helper) AND
staging's TestDispatchA2A_RejectsUnsafeURL (#1483 regression). Updated
the staging test's dispatchA2A call to pass the workspaceID arg
introduced by the branch's signature change.
- workspace_crud.go: combine both Delete-cleanup intents:
* Branch's cleanupCtx detachment (WithoutCancel + 30s) so canvas
hang-up doesn't cancel mid-Docker-call (the container-leak fix)
* Branch's stopAndRemove helper that skips RemoveVolume when Stop
fails (orphan sweeper handles)
* Staging's #1843 stopErrs aggregation so Stop failures bubble up
as 500 to the client (the EC2 orphan-instance prevention)
Both concerns satisfied: cleanup runs to completion past canvas
hangup AND failed Stop calls surface to caller.
Build clean, all platform tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Closes a 4+ cycle Canvas tabs E2E flake pattern that's been blocking
staging→main PRs since 2026-04-24+ (#2096, #2094, #2055, #2079, ...).
Root cause: TLS_TIMEOUT_MS=180s (3 min) is too tight for the layered
realities of staging tenant TLS readiness:
1. Cloudflare DNS propagation through the edge (1-2 min typical)
2. Tenant CF Tunnel registering the new hostname (1-2 min)
3. CF edge ACME cert provisioning + cache (1-3 min)
Each layer can add 1-3 min on its own under heavy staging load — the
realistic worst case is well past the 3-min cap.
Provision and workspace-online timeouts were already raised to 20 min
(staging-setup.ts:42-46 history). The TLS gate was the remaining
under-budgeted step. Bumping to 10 min keeps it inside the 20-min
PROVISION envelope so a genuinely-stuck tenant still fails loud at
the earlier provision step rather than masquerading as a TLS issue.
Both call sites raised together:
- canvas/e2e/staging-setup.ts: TLS_TIMEOUT_MS = 10 * 60 * 1000
- tests/e2e/test_staging_full_saas.sh: TLS_DEADLINE += 600
Each carries an inline rationale comment so the next reviewer sees
the layer-by-layer decomposition without re-reading the issue thread.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the canvas-side loop on #2054. Phases 1+2 plumbed
provision_timeout_ms from template manifest → workspace API →
canvas socket → node-data → ProvisioningTimeout resolver. The
template-hermes manifest declares provision_timeout_seconds: 720
(filed as a separate template-repo PR). With that flow live, the
canvas-side hardcoded RUNTIME_PROFILES.hermes entry is redundant.
Removed:
- RUNTIME_PROFILES.hermes (was 720000ms hardcoded in canvas/src/lib/runtimeProfiles.ts)
Doc updates:
- RUNTIME_PROFILES jsdoc explains the map is now empty by design —
new runtimes that need a non-default cold-boot threshold should
declare runtime_config.provision_timeout_seconds in their template
manifest, NOT add an entry here.
Tests updated (3):
- "returns hermes override when runtime = hermes" → "hermes returns
default — value moved server-side post-#2054 phase 3". Asserts
RUNTIME_PROFILES.hermes is undefined.
- The two server-override tests now compare against
DEFAULT_RUNTIME_PROFILE since hermes no longer has a profile entry.
19/19 pass locally. The end-state for hermes:
workspace-server reads template manifest at request time →
workspace API includes provision_timeout_ms: 720000 →
canvas hydrate populates node.data.provisionTimeoutMs →
ProvisioningTimeout resolver picks it up via overrides.
Same effective threshold (720s), now declarative and one-edit-point
per runtime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
simplify-review note: the |/,-delimited node string is brittle if a
future string-typed field is added without sanitization. Document
which fields are user-typed (name — already sanitized) vs primitive
(id is UUID, runtime is a slug, provisionTimeoutMs is numeric) so
the next field-add doesn't accidentally introduce an injection
vector for the splitter.
Skipped (false-positive review finding): the agent flagged the
prop > runtime-profile order as inconsistent with the docstring,
but the docstring explicitly lists the prop at #2 (between node and
runtime-profile) — matches both the implementation AND the original
behavior pre-#2054 (the prop was 'timeoutMs ?? runtime-profile').
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of moving runtime UX knobs server-side. Builds the canvas
foundation: a workspace can carry its own provision_timeout_ms
(sourced server-side from a template manifest in a follow-up PR),
and ProvisioningTimeout's resolver respects it per-node.
Today the resolver had Props-level timeoutMs that applied to ALL
nodes — fine for tests but wrong for production where one batch
could mix runtimes (hermes 12-min cold boot alongside docker 2-min).
The runtime profile fallback already handles per-runtime defaults;
this PR adds the per-WORKSPACE override layer above that.
Resolution priority (most specific wins):
1. node.provisionTimeoutMs — server-declared per-workspace
override (this PR's new field)
2. timeoutMs prop — single-threshold test override
3. runtime profile in @/lib/runtimeProfiles
4. DEFAULT_RUNTIME_PROFILE
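The priority chain collapses to a nullish-coalescing ladder. A hedged sketch with illustrative names and values (the real resolver and profile entries live in @/lib/runtimeProfiles; the defaults shown are assumptions):

```typescript
// Illustrative values only; the real numbers come from the runtime profiles.
const DEFAULT_TIMEOUT_MS = 120_000;
const RUNTIME_PROFILES: Record<string, number> = { hermes: 720_000 };

// Resolution order, most specific first: server-declared per-workspace
// override, then the test-only prop, then the runtime profile, then default.
function resolveProvisionTimeout(
  nodeTimeoutMs: number | undefined, // node.provisionTimeoutMs (this PR)
  propTimeoutMs: number | undefined, // timeoutMs prop (test override)
  runtime: string,
): number {
  return (
    nodeTimeoutMs ??
    propTimeoutMs ??
    RUNTIME_PROFILES[runtime] ??
    DEFAULT_TIMEOUT_MS
  );
}
```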
Changes:
- WorkspaceData (socket): add optional provision_timeout_ms
- WorkspaceNodeData: add optional provisionTimeoutMs
- canvas-topology hydrate: thread the field through to node.data
- ProvisioningTimeout: extend the serialized-string node iteration
to carry provisionTimeoutMs (4-field positional split); pass as
the second arg to provisionTimeoutForRuntime
- 3 new tests in ProvisioningTimeout.test.tsx covering hydrate
threading, null fall-through, and resolver priority
Phase 2 (separate PR, blocked on workspace-server template-config
loader): workspace-server reads provision_timeout_seconds from
template config.yaml at provision time, includes
provision_timeout_ms in the workspace API/socket response. Phase 3
(template-repo PR): template-hermes config.yaml declares
provision_timeout_seconds: 720; canvas RUNTIME_PROFILES.hermes
becomes redundant and can be removed.
19/19 tests pass (3 new + 16 existing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported the canvas threw a generic "API GET /workspaces: 500
{auth check failed}" error when local Postgres + Redis were both
down. Two problems:
1. The error code (500) and message ("auth check failed") said
nothing useful. The actual condition was "platform can't reach
its datastore to validate your token" — a Service Unavailable
class, not Internal Server Error.
2. The canvas had no way to distinguish infra-down from a real
auth bug, so it rendered the raw API string in the same
generic-error overlay it uses for everything.
Fix in two layers:
Server (wsauth_middleware.go):
- New abortAuthLookupError helper centralises all three sites
that previously returned `500 {"error":"auth check failed"}`
when HasAnyLiveTokenGlobal or orgtoken.Validate hit a DB error.
- Now returns 503 + structured body
`{"error": "...", "code": "platform_unavailable"}`. 503 is
the correct semantic ("retry shortly, infra is unavailable")
and the code field is the contract the canvas reads.
- Body deliberately excludes the underlying DB error string —
production hostnames / connection-string fragments must not
leak into a user-visible error toast.
Canvas (api.ts):
- New PlatformUnavailableError class. api.ts inspects 503
responses for the platform_unavailable code and throws the
typed error instead of the generic "API GET /…: 503 …"
message. Generic 503s (upstream-busy, etc.) keep the legacy
path so existing busy-retry UX isn't disrupted.
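The classification contract can be sketched as a pure function (names and shapes are illustrative; the real api.ts logic operates on fetch Response objects and differs in detail):

```typescript
class PlatformUnavailableError extends Error {}

// Only a 503 whose JSON body carries code "platform_unavailable" becomes
// the typed error; generic 503s and non-JSON bodies keep the legacy path.
function classifyApiError(status: number, bodyText: string): Error {
  if (status === 503) {
    try {
      const body = JSON.parse(bodyText);
      if (body?.code === "platform_unavailable") {
        return new PlatformUnavailableError(body.error ?? "platform unavailable");
      }
    } catch {
      // non-JSON body: fall through to the generic error below
    }
  }
  return new Error(`API error: ${status}`);
}
```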
Canvas (page.tsx):
- New PlatformDownDiagnostic component renders when the
initial hydration catches PlatformUnavailableError.
Surfaces the actual condition with operator-actionable
copy ("brew services start postgresql@14 / redis") +
pointer to the platform log + a Reload button.
Tests:
- Go: TestAdminAuth_DatastoreError_Returns503PlatformUnavailable
pins the response shape (status, code field, no DB-error leak)
- Canvas: 5 tests for PlatformUnavailableError classification —
typed throw on 503+code match, generic-Error fallback for
503-without-code (upstream busy), 500 stays generic, non-JSON
body falls back to generic.
1015 canvas tests + full Go middleware suite pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual cause-fix for the staging-tabs E2E saga (#2073/#2074/#2075).
Old behaviour: ANY 401 from any fetch on a SaaS tenant subdomain
called redirectToLogin → window.location.href = AuthKit. This is
wrong. Plenty of 401s don't mean "session is dead":
- workspace-scoped endpoints (/workspaces/:id/peers, /plugins)
require a workspace-scoped token, not the tenant admin bearer
- resource-permission mismatches (user has tenant access but not
this specific workspace)
- misconfigured proxies returning 401 spuriously
A single transient one of those yanked authenticated users back to
AuthKit. Same bug yanked the staging-tabs E2E off the tenant origin
mid-test for 6+ hours tonight, leading to the cascade of test-side
mocks (#2073/#2074/#2075) that worked around the symptom without
fixing the cause.
This PR fixes it at the source. The new logic:
- 401 on /cp/auth/* path → that IS the canonical session-dead
signal → redirect (unchanged)
- 401 on any other path with slug present → probe /cp/auth/me:
probe 401 → session genuinely dead → redirect
probe 200 → session fine, endpoint refused this token →
throw a real Error, caller renders error state
probe network err → assume session-fine (conservative) →
throw real Error
- slug empty (localhost / LAN / reserved subdomain) → throw
without redirect (unchanged)
The probe adds one extra fetch on a 401, only when slug is set
and the path isn't already auth-scoped. That's rare and
worthwhile — a transient probe round-trip is cheap; an unwanted
auth redirect is a UX disaster.
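The decision table above can be expressed as a pure function. A hypothetical sketch (probeStatus stands in for the result of the /cp/auth/me probe; the real api.ts interleaves this with the fetch itself):

```typescript
type Action = "redirect" | "throw";

// Pure decision function for the 401 policy: redirect only when the 401
// is the canonical auth-path signal or the probe confirms a dead session.
function actionFor401(
  path: string,
  slug: string,
  probeStatus: number | "network-error" | null,
): Action {
  if (path.startsWith("/cp/auth/")) return "redirect"; // canonical signal
  if (!slug) return "throw"; // localhost / LAN / reserved subdomain
  if (probeStatus === 401) return "redirect"; // session genuinely dead
  // probe 200 or network error: conservatively assume the session is fine
  return "throw";
}
```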
Tests:
- api-401.test.ts rewritten with the full matrix:
* /cp/auth/me 401 → redirect (no probe, that IS the signal)
* non-auth 401 + probe 401 → redirect
* non-auth 401 + probe 200 → throw, no redirect ← the fix
* non-auth 401 + probe network err → throw, no redirect
* empty slug paths (localhost/LAN/reserved) → throw, no probe
- 43 tests in canvas/src/lib/__tests__/api*.test.ts all pass
- tsc clean
The staging-tabs E2E spec's universal-401 route handler stays as
defense-in-depth (silences resource-load console noise + guards
against panels without try/catch), but the comment now describes
its role honestly: api.ts is the primary fix, the route is the
safety net.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After #2074, the staging-tabs spec stopped failing on the auth-redirect
locator timeout (good — the broadened 401-mock works) but started
failing on a different aggregate check:
Error: unexpected console errors:
Failed to load resource: the server responded with a status of 404
Failed to load resource: the server responded with a status of 404
Failed to load resource: the server responded with a status of 404
Browser console messages for resource-load failures omit the URL,
so the message is uninformative on its own — we can't filter
selectively (e.g. "is this a missing-CSS noise or a real broken
endpoint?"). The previous filter list (sentry/vercel/WebSocket/
favicon/molecule-icon) catches specific known-noisy strings but
this generic "Failed to load resource" doesn't contain any of them.
Two changes:
1. Add page.on('requestfailed') plus a page.on('response') listener
that logs any response with status >= 400, capturing the URL of
every failed request. Logs to test stdout (visible in the workflow
log) — leaves a breadcrumb so a real bug isn't completely hidden
when we filter the generic message.
2. Add "Failed to load resource" to the filter list. With (1) in
place we still see the URLs for diagnosis; the generic console
message is just noise.
Real JS exceptions (panel crash, undefined access, etc.) come with
a file path and stack trace and aren't matched by either filter,
so the gate still catches actual bugs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#2073 caught workspace-scoped 401s but missed non-workspace paths.
SkillsTab.tsx alone fetches /plugins and /plugins/sources, both
outside the /workspaces/<id>/* tree. Either of those 401s with the
tenant admin bearer in SaaS mode → canvas/src/lib/api.ts:62-74
redirects to AuthKit → page navigates away mid-test → next locator
times out.
Same failure signature observed at 16:03Z post-#2073 merge:
e2e/staging-tabs.spec.ts:45:7 › tab: skills
TimeoutError: locator.scrollIntoViewIfNeeded: Timeout 5000ms
- navigated to "https://scenic-pumpkin-83.authkit.app/?..."
Broaden the route to "**" with `request.resourceType() !== "fetch"`
short-circuit (preserves HTML/JS/CSS pass-through) and a
/cp/auth/me skip (the dedicated mock above wins). Same 401 →
empty-body conversion logic; just a wider net.
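The widened route's skip logic reduces to a small predicate. A hypothetical sketch (names illustrative; the real handler works on Playwright Route/Request objects):

```typescript
// Decide whether the broadened "**" route should convert a 401 for this
// request: only fetch-type requests, and never /cp/auth/me (its dedicated
// mock wins).
function shouldMock401(resourceType: string, url: string): boolean {
  if (resourceType !== "fetch") return false; // HTML/JS/CSS pass through
  if (url.includes("/cp/auth/me")) return false; // dedicated mock wins
  return true;
}
```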
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle review of pieces 1/2/3 surfaced two critical issues plus a
handful of required + optional fixes. All addressed.
Critical:
1. Migration 043 was missing 'paused' and 'hibernated' from the
workspace_status enum. Both are real production statuses written
by workspace_restart.go (lines 283 and 406), introduced by
migration 029_workspace_hibernation. The original `USING
status::workspace_status` cast would have errored mid-transaction
on any production DB containing those values. Added both. Also
added `SET LOCAL lock_timeout = '5s'` so the migration aborts
instead of stalling the workspace fleet behind a slow SELECT.
2. The chat activity-feed window kept only 8 lines, and a single
multi-tool turn (Read 5 files + Grep + Bash + Edit + delegate)
easily flushed older context before the user could read it.
Extracted appendActivityLine to chat/activityLog.ts with a
20-line window AND consecutive-duplicate collapse (same tool
on the same target twice in a row is noise, not new progress).
5 unit tests pin the behavior.
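A minimal sketch of the window-plus-collapse semantics (the 20-line window is from the fix; the shape is illustrative, and the real helper is in chat/activityLog.ts):

```typescript
const ACTIVITY_WINDOW = 20;

// Append a line to the activity feed: drop consecutive duplicates (same
// tool on the same target twice in a row is noise), then keep only the
// newest ACTIVITY_WINDOW lines.
function appendActivityLine(log: string[], line: string): string[] {
  if (log.length > 0 && log[log.length - 1] === line) return log; // duplicate
  const next = [...log, line];
  return next.length > ACTIVITY_WINDOW ? next.slice(-ACTIVITY_WINDOW) : next;
}
```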
Required:
3. The SDK wedge flag was sticky-only — a single transient
Control-request-timeout from a flaky network blip locked the
workspace into degraded for the whole process lifetime, even
when the next query() would have succeeded. Added
_clear_sdk_wedge_on_success(), called from _run_query's success
path. The next heartbeat after a working query reports
runtime_state empty and the platform recovers the workspace to
online without a manual restart. New regression test.
4. _report_tool_use now sets target_id = WORKSPACE_ID for self-
actions, matching the convention other self-logged activity
rows use. DB consumers joining on target_id see a well-defined
value instead of NULL.
Optional taken:
5. Tightened _WEDGE_ERROR_PATTERNS from "control request timeout"
to "control request timeout: initialize" — suffix-anchored so a
future SDK error on an in-flight tool-call control message
doesn't get misclassified as the unrecoverable post-init wedge.
6. Dropped the redundant "context canceled" substring fallback in
isUpstreamBusyError. errors.Is(err, context.Canceled) is the
typed check; the substring would also match healthy client-side
aborts, which we don't want classified as upstream-busy.
Verified: 1010 canvas tests + 64 Python tests + full Go suite pass;
migration applies cleanly on dev DB with all 8 enum values; reverse
migration restores TEXT.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two halves of the same UX win — the user wants to see what Claude is
doing while a chat reply is in flight instead of staring at "0s" for
minutes.
Workspace side (claude_sdk_executor.py):
- The executor's _run_query message loop already iterated the SDK
stream for AssistantMessage.TextBlock content. Now also detects
ToolUseBlock / ServerToolUseBlock entries (by class name, since
the conftest stub doesn't define them) and fires-and-forgets a
POST /workspaces/:id/activity row of type agent_log per tool use.
- _summarize_tool_use maps the common tools (Read, Write, Edit,
Bash, Glob, Grep, WebFetch, WebSearch, Task, TodoWrite) to a
one-line summary with the file path / pattern / command, falling
back to "🛠 <tool>(…)" for anything else. Truncated at 200 chars.
- Posts directly to /workspaces/:id/activity rather than going
through a2a_tools.report_activity, which would also push a
/registry/heartbeat current_task and double-log as a TASK_UPDATED
line in the same chat feed.
- All failures swallowed silently — telemetry must not break
the conversation.
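The summarizer's shape, sketched in TypeScript for illustration (the real helper is Python, and the tool mapping shown is an assumed subset of the list above):

```typescript
// Map a tool use to a one-line summary; unknown tools fall back to a
// generic label, and everything is truncated at 200 chars.
function summarizeToolUse(tool: string, input: Record<string, string>): string {
  const summaries: Record<string, (i: Record<string, string>) => string> = {
    Read: (i) => `📄 Read ${i.file_path}`,
    Bash: (i) => `⚡ Bash: ${i.command}`,
    Grep: (i) => `🔍 Grep ${i.pattern}`,
  };
  const line = (summaries[tool] ?? (() => `🛠 ${tool}(…)`))(input);
  return line.length > 200 ? line.slice(0, 200) : line;
}
```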
Canvas side (ChatTab.tsx):
- The existing ACTIVITY_LOGGED handler streams a2a_send /
a2a_receive / task_update events into a sliding-window
activityLog state. Two issues fixed:
1. No `msg.workspace_id === workspaceId` filter — a sibling
workspace's a2a_send was leaking into the wrong chat
panel as "→ Delegating to X...". Added an early return.
2. No agent_log render branch. Added one that renders the
summary verbatim (the workspace already prefixed its
own emoji icon, so no double-icon).
- Existing 8-line sliding window keeps the UI scoped; older
progress lines naturally roll off as new ones arrive.
Result: when DD is delegating to Visual Designer + reading
config files + running Bash to lint, the spinner area shows:
📄 Read /configs/system-prompt.md
⚡ Bash: pnpm test
→ Delegating to Visual Designer...
← Visual Designer responded (47s)
instead of bare "0s · Processing with Claude Code..." for minutes.
63 Python tests + 58 canvas chat tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The staging-tabs E2E has been failing for 6+ hours on the same
locator timeout — diagnosed earlier today as the canvas's
lib/api.ts:62-74 redirect-on-401 path firing mid-test:
e2e/staging-tabs.spec.ts:45:7 › tab: skills
TimeoutError: locator.scrollIntoViewIfNeeded: Timeout 5000ms
- navigated to "https://scenic-pumpkin-83.authkit.app/?..."
Several side-panel tabs (Peers, Skills, Channels, Memory, Audit,
and anything workspace-scoped) hit endpoints under
`/workspaces/<id>/*` that require a workspace-scoped token, NOT
the tenant admin bearer the test uses. The endpoints respond 401
in SaaS mode. canvas/src/lib/api.ts:62-74 reacts to ANY 401 by
setting `window.location.href` to AuthKit — yanking the page off
the tenant origin mid-test.
The test comment at line 18 already acknowledged the 401 class
("Peers tab: 401 without workspace-scoped token") but assumed
those would surface as "errored content" rather than a hard
navigation. The redirect logic in api.ts was added later and
breaks the assumption.
Fix: add a Playwright route handler that catches any 401 from
`/workspaces/<id>/*` paths and replaces with `200 + empty body`.
Body shape is best-effort by URL — list endpoints (paths not
ending in a UUID-shaped segment) get `[]`, single-resource
endpoints get `{}`. Both are valid JSON and well-written panels
render an empty state for either rather than crashing.
The two route patterns (`/workspaces/...` and `/cp/auth/me`)
don't overlap — the existing `/cp/auth/me` mock continues to
gate AuthGate's session check independently.
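The list-vs-single-resource heuristic, as a hypothetical sketch:

```typescript
// Best-effort body shape by URL: paths whose final segment is UUID-shaped
// get a single-resource `{}`; everything else is treated as a list and
// gets `[]`. Both are valid JSON that well-written panels render as empty.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function emptyBodyFor(path: string): string {
  const last = path.replace(/\/+$/, "").split("/").pop() ?? "";
  return UUID_RE.test(last) ? "{}" : "[]";
}
```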
Verification:
- Type-check passes (tsc clean for the spec; pre-existing errors
in unrelated test files unchanged)
- Can't run staging E2E locally without CP admin token; CI will
exercise the real path against the freshly-provisioned tenant
- E2E Staging SaaS (full lifecycle) is currently green at 08:07Z,
confirming the underlying staging infra works — the failures
have been narrowly in this Playwright-tabs spec
Targets staging per molecule-core convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three required fixes from the bundle review of 391e1872:
1. workspace/a2a_client.py: substring `type_name in msg` could miss
the diagnostic prefix when an exception's message embedded a
different class name mid-string (e.g. `OSError("see ConnectionError
below")` → printed as plain msg, type lost). Switched to a
prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so
the type label is always added when not already at the start of
the message.
2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded
TEXT on the platform (handlers/activity.go does not validate
length). A buggy or hostile peer could stream arbitrarily large
error messages into the caller's activity log. Cap at 4096 chars
at the producer — comfortably above any real exception traceback,
well below an obvious-DoS threshold.
3. New regression test for JSON-RPC `code=0` — pins the
`code is not None` semantics so the code is preserved in the
detail rather than collapsing into the no-code path. Code=0 is
not valid per the spec, but a malformed peer can still emit it
and we want it visible for diagnosis.
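The prefix-anchored check from (1), rendered as a hypothetical TypeScript sketch (the real code is Python):

```typescript
// Add the exception-type label only when the message doesn't already start
// with it. A substring check would wrongly skip the label when a different
// class name appears mid-message, e.g. "see ConnectionError below".
function labelError(typeName: string, msg: string): string {
  return msg.startsWith(`${typeName}:`) || msg === typeName
    ? msg
    : `${typeName}: ${msg}`;
}
```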
Plus one optional taken: extracted the A2A-error → hint mapping into
canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies
(AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had
already drifted — Activity tab gained `not found`/`offline` cases the
chat panel never picked up, AgentCommsPanel handled empty-input
explicitly while Activity didn't. The shared module is the merged
superset, with 10 unit tests pinning each named pattern + the
"most specific first" ordering (Claude SDK wedge wins over generic
timeout).
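A hedged sketch of the most-specific-first ordering (patterns and hint strings here are illustrative, not the actual merged superset):

```typescript
// Ordered list: most specific pattern first, so the SDK init wedge wins
// over the generic timeout match.
const HINT_PATTERNS: Array<[RegExp, string]> = [
  [/control request timeout: initialize/i, "Claude SDK init wedge: restart the workspace"],
  [/timeout/i, "Peer may be busy or stuck"],
  [/connection reset/i, "Transient network blip; check logs"],
  [/not found/i, "Target workspace not found"],
];

function inferA2AErrorHint(detail: string): string | null {
  if (!detail) return null;
  for (const [pattern, hint] of HINT_PATTERNS) {
    if (pattern.test(detail)) return hint;
  }
  return null;
}
```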
Skipped (per analysis):
- Unicode-naive 120-char slice — Python str[:N] slices on code
points, not bytes. Safe.
- Nested [A2A_ERROR] confusion — non-issue per reviewer; outer
prefix winning still produces a structured render.
- MessagePreview + JsonBlock dual render on errors — intentional
drilldown; raw JSON is below the fold for operators who need it.
- console.warn dedup — refetches don't happen per-event so spam
risk is low.
- str(data)[:200] materialization — A2A response bodies aren't
typically MB-sized.
Verified: 1005 canvas tests pass (10 new hint tests); 10 Python
send_a2a_message tests pass (1 new for code=0); tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] "
(prefix + nothing) for failed delegations. Operator had no signal
to act on — no exception type, no target, no hint about what went
wrong, no next step. Fix is in three layers.
1. workspace/a2a_client.py — every error path now produces an
actionable detail string:
- except branch: some httpx exceptions (RemoteProtocolError,
ConnectionReset variants) stringify to "". Pre-fix the catch
was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back
to `<TypeName> (no message — likely connection reset or silent
timeout)` and always appends `[target=<url>]` for traceability
in chained delegations.
- JSON-RPC error branch: previously dropped error.code on the
floor and printed "unknown" when message was missing. Now
surfaces both, including the well-defined "JSON-RPC error
with no message (code=N)" path.
- "neither result nor error" branch: pre-fix returned
str(payload) which the canvas rendered as a successful
response block. Now tagged as A2A_ERROR with a payload
snippet so downstream UI routes through the error path.
2. workspace/a2a_tools.py — tool_delegate_task now passes
error_detail (the stripped error message) through to the
activity-log POST. The platform's activity_logs.error_detail
column is the canvas's red error chip source; populating it
makes the failure visible in the row header without the user
having to expand into raw response_body JSON. The summary line
also gets a 120-char prefix of the cause so the collapsed row
reads "React Engineer failed: ConnectionResetError: ... [target=...]"
instead of "React Engineer failed".
3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview
now detects [A2A_ERROR]-prefixed bodies and renders a
structured error block (red chip, stripped detail, cause hint)
instead of the previous gray text-block that showed the literal
"[A2A_ERROR]" string. inferA2AErrorHint mirrors the patterns
from AgentCommsPanel.inferCauseHint so the same symptom reads
the same way in both surfaces (Claude SDK init wedge → restart
workspace; timeout → busy/stuck; connection-reset → transient
blip then check logs).
Tests: 9 send_a2a_message tests pass (including a new regression
test for the empty-stringifying-exception case that the user
reported); 995 canvas tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported symptom: canvas edges show "1 call · just now" between two
agents, but the Agent Comms tab for the source workspace renders
"No agent-to-agent communications yet" — even though
GET /workspaces/<id>/activity?source=agent&limit=50 returns a2a_send
+ a2a_receive rows.
Confirmed via curl that the API does return the rows the panel
should map. The panel's load handler was the suspect, but it had:
.catch(() => setLoading(false))
which swallowed every failure path — network errors, JSON parse,
ANY throw inside the .then body — without leaving a single trace in
the console. The panel just sat on its empty state and gave the user
zero signal to act on. (And by extension, gave us nothing to debug
remotely either.)
Two changes:
1. Wrap the per-row `toCommMessage` call in a try/catch so one
malformed activity row (unexpected request_body shape, etc.)
doesn't throw out of the for-loop and skip the
setMessages(msgs) line. Previously the panel would silently
drop the entire batch when ANY row failed to parse.
2. Replace the bare `.catch(() => setLoading(false))` with a
logging variant. Now a future "panel stuck empty" report comes
with `AgentCommsPanel: load activity failed <err>` or
`AgentCommsPanel: failed to map activity row {...}` in the
console — diagnosable instead of opaque.
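The per-row defensive mapping in (1), as a hypothetical sketch:

```typescript
// Map activity rows defensively: one malformed row is logged and skipped
// instead of throwing out of the loop and dropping the entire batch.
function mapRows<T, U>(rows: T[], toCommMessage: (r: T) => U): U[] {
  const msgs: U[] = [];
  for (const row of rows) {
    try {
      msgs.push(toCommMessage(row));
    } catch (err) {
      console.warn("AgentCommsPanel: failed to map activity row", row, err);
    }
  }
  return msgs;
}
```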
Behavior on the happy path is unchanged (5 existing tests still
pass; tsc clean). This is purely defensive: it makes the failure
path visible so the next stuck-empty report can be root-caused
instead of guessed at.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle-level review caught an implicit coupling in useCanvasViewport
between two distinct fit effects:
- settle fit: 1200ms one-shot when provisioning transitions to zero
(deploy just finished — settle on the whole org once)
- tracking fit: 500ms debounced per molecule:fit-deploying-org event
(track the org's bounds as children land during the deploy)
Both effects shared a single autoFitTimerRef, so each one's
clearTimeout call could silently cancel the other's pending fit.
Today's behavior happened to land in the right order out of luck —
the tracking handler fires per-arrival during the deploy, then the
settle effect arms after the last child completes. But nothing in
the code enforces that ordering; a future refactor that, say,
fires the settle effect from the same event sequence as the
tracking timer (mid-deploy status flicker) would silently drop the
settle fit because the tracking timer's clearTimeout ran last.
Splitting into settleFitTimerRef + trackingFitTimerRef makes the
two effects fully independent. Cleanup clears both. Tests still pass
(995/995); the refactor is mechanical.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three review-driven fixes plus regression coverage for the bugs
landed in 176b703d / deedb5ef:
1. clearTimeout the prior reload handle before scheduling a new one in
both installFromSource and handleUninstall. Two installs within the
PLUGIN_RELOAD_DELAY_MS window (15s) used to queue two
loadInstalled() calls; the unmount cleanup only cleared the latest
handle, and the second reconciliation could overwrite a still-
correct optimistic state with a stale snapshot mid-restart.
2. Drop `setInstalledLoaded(true)` from the optimistic block. That
flag's contract is "the initial GET has succeeded at least once" —
it gates the auto-expand-registry effect. A user installing a
custom-source plugin BEFORE the initial fetch returned would flip
the gate prematurely, the auto-expand would never fire, and a
followup loadInstalled racing with the optimistic write could
overwrite our entry with [] mid-restart.
3. Don't force `supported_on_runtime: true` on the optimistic record.
The "inert on this runtime" badge in the row renders on the value
`=== false`. Forcing true would hide the badge for 15s if the user
installed a plugin that doesn't actually support the workspace's
runtime; the real value lands at refetch. Leaving the field
undefined keeps the badge neutral until reconciliation arrives.
Plus a behavioral test (SkillsTab.install.test.tsx) that asserts:
- the install POST URL contains the workspaceId (not "undefined")
- the row's "Install" button is replaced by the green "Installed"
tag synchronously after POST resolves, without advancing any
timer — locks in the optimistic-update contract so a future
refactor can't silently regress it.
995 canvas tests pass (2 new); tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After clicking Install, the button reverted from "Installing..." → "Install"
the moment the POST returned, then sat there for ~15s before the green
"Installed" tag appeared. The 15s gap is PLUGIN_RELOAD_DELAY_MS — we
delay the GET /workspaces/:id/plugins refetch to wait for the workspace
to restart (the listing handler returns [] while the container is
restarting because findRunningContainer comes up empty).
Uninstall already does optimistic local-state mutation (line 244 prior
to this commit) so the green tag → install button transition is
instant. Install was the inconsistent half — push the registry entry
into `installed` immediately after POST returns 200 and let the
delayed refetch reconcile.
The optimistic record uses the registry entry's metadata (name,
version, description, tags, runtimes, skills) and sets
supported_on_runtime=true. If reconciliation later disagrees (server
filter, install actually failed at the runtime layer), the refetch
overwrites the local record. Worst case is a brief 15s window where
we show "Installed" for a plugin that won't load — same window the
user previously experienced as "stuck on Install button" — but flipped
to the correct expected state.
Custom-source installs (github://, etc.) don't have a registry entry
to use, so they keep the old behavior of waiting for the refetch. Most
users install from the registry list in the UI.
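The optimistic write itself is small. A minimal sketch with hypothetical, trimmed record shapes (the real store records carry more metadata):

```typescript
// Hypothetical, trimmed shapes; the real records also carry version,
// description, tags, runtimes, and skills.
interface RegistryEntry { name: string; version: string }
interface InstalledPlugin extends RegistryEntry { supported_on_runtime?: boolean }

// Push the registry entry into `installed` as soon as the POST returns
// 200; the delayed refetch reconciles ~15s later and overwrites this
// record if the install actually failed at the runtime layer.
function optimisticInstall(
  installed: InstalledPlugin[],
  entry: RegistryEntry,
): InstalledPlugin[] {
  if (installed.some((p) => p.name === entry.name)) return installed;
  // This commit sets supported_on_runtime=true; the review follow-up
  // elsewhere in this log leaves it undefined so the runtime badge
  // stays neutral until reconciliation arrives.
  return [...installed, { ...entry, supported_on_runtime: true }];
}
```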
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SkillsTab read `data.id` from its props and used the value to build
two API URLs:
POST /workspaces/${data.id}/plugins
DELETE /workspaces/${data.id}/plugins/${pluginName}
But `data` is the React Flow node.data blob (WorkspaceNodeData) —
the workspace id lives on `node.id`, NOT on `node.data`. WorkspaceNodeData
extends `Record<string, unknown>`, which makes `data.id` type-check
silently as `unknown` instead of erroring. So every install/uninstall
hit `/workspaces/undefined/plugins`, the server's not-found path
returned 503 "workspace container not running" (misleading — the real
issue was the bogus URL), and the user got a confusing toast.
Every other tab in SidePanel takes `workspaceId={selectedNodeId}` as
an explicit prop. SkillsTab was the lone outlier, presumably because
"data has all the fields I need" is the obvious-looking shortcut that
TypeScript can't catch through the index-signature interface.
Fix: make `workspaceId` an explicit prop on SkillsTab, drop the
`data.id` reads, thread the prop from SidePanel like the other tabs.
Test fixture updated to pass it.
Verified: 993 canvas tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review surfaced two required documentation fixes and
one growth-correctness gap. All addressed here.
Auto-fit gate (useCanvasViewport):
The previous "subtree-grew-by-count" check missed the delete-then-add
case: subtree of 6 → delete one → 5 → a different child arrives → 6
again. A length-only comparison reads no growth and the fit is
skipped, leaving the new node off-screen. Switched to an id-set
membership snapshot so any brand-new id forces the fit even when the
count is unchanged.
The gate logic is now extracted as a pure exported function
`shouldFitGrowing(currentIds, prevIds, userPannedAt, lastAutoFitAt)`
so the regression-prone decision can be unit-tested in isolation
without standing up React Flow + DOM event refs. 8 cases cover:
first-fit, empty-prior, brand-new id, status-update with user pan,
no-pan-ever, pan-before-last-fit, delete-then-add same length, and
shrink-only with user pan.
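Under those eight cases the gate can be sketched as follows. This is a hedged reconstruction from the commit's described semantics, not the in-tree function, which may order its checks differently:

```typescript
// Pure gate: should the viewport auto-fit on this store update?
// -Infinity means "never panned" / "never auto-fit".
function shouldFitGrowing(
  currentIds: Set<string>,
  prevIds: Set<string> | undefined,
  userPannedAt: number,
  lastAutoFitAt: number,
): boolean {
  // First fit / empty prior snapshot: always fit.
  if (!prevIds || prevIds.size === 0) return true;
  // Any brand-new id forces the fit, even when the count is unchanged
  // (the delete-then-add case a length-only comparison misses).
  for (const id of currentIds) {
    if (!prevIds.has(id)) return true;
  }
  // No growth (status updates on already-positioned nodes): respect a
  // pan only if it happened after the last auto-fit.
  return userPannedAt <= lastAutoFitAt;
}
```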
Parser parity (dotenv.go + next.config.ts):
Existing-env semantics were undocumented in both parsers. Both now
note that an explicitly-set empty string (`KEY=` from the parent
shell) counts as "set" — the file value does NOT backfill — matching
the Go (`os.LookupEnv`) and Node (`process.env[k] !== undefined`)
primitives.
`export ` prefix uses a literal space; `export\tFOO=bar` is
intentionally rejected. Added the same comment in both parsers
to lock in this parity invariant since the commit message claims
"if one parser changes, the other has to."
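The existing-env-wins rule is a one-conditional merge. A sketch of the Node side (names illustrative; the real code lives in next.config.ts):

```typescript
// Merge parsed .env vars into the live environment. An explicitly-set
// empty string counts as "set": `env[k] !== undefined` is the test,
// mirroring Go's os.LookupEnv, so `KEY=` from the parent shell is
// never backfilled by the file value.
function mergeDotEnv(
  env: Record<string, string | undefined>,
  fileVars: Record<string, string>,
): void {
  for (const [k, v] of Object.entries(fileVars)) {
    if (env[k] !== undefined) continue; // existing env wins, even ""
    env[k] = v;
  }
}
```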
Skipped (per analysis):
- Drag-pan respect for left-click drag-pan during deploy. The
growth-check safety net means any pan gets overridden on the
next arrival anyway, which is the desired behavior for the
"watch the org deploy" use case. After deploy completes, no
more fit-deploying-org events fire so drag-pan works freely.
- Map cleanup for lastFitSubtreeIdsRef. Per-tab session, UUID
keys, tiny entries — not worth the cleanup hook.
993 canvas tests pass (8 new); Go dotenv tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: org import zoomed to fit the parent + first child, then froze
at that framing while the remaining children kept materialising
off-screen. The user had to manually pan/zoom to see the new arrivals.
Two stacked bugs in useCanvasViewport's deploy-time auto-fit:
1. The user-pan-respect gate stamps userPannedAtRef on EVERY
pointerdown that lands inside .react-flow__pane. That fires for
ordinary clicks (deselect, click-near-a-card, modal-close-bubble
from the import dialog) — not just for actual pan gestures. One
accidental pre-import click was enough to lock out every fit for
the rest of the deploy. Wheel is the canonical unambiguous
pan/zoom signal; drop pointerdown.
2. Even with a real pan during deploy, when more children land the
org's bounds grow and the user has lost context — the new
arrivals are off-screen and the deploy is the primary thing they
want to watch right now. The guard had no growth awareness, so
one pan cancelled all follow-up fits unconditionally. Now we
track the subtree size at the last fit (per root), and if the
current subtree is larger we force the fit through regardless of
the user-pan timestamp. When the subtree size hasn't changed
(status updates on already-positioned nodes), the user-pan
respect still applies — so post-deploy exploration isn't
yanked back.
The Map keyed by root id supports back-to-back imports of different
orgs without one's growth count blocking the other's first fit.
985 canvas tests pass; tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: spawn animation missing on org import. Workspaces appeared in
their final positions all at once instead of materialising one-by-one.
Root cause: the WS pill said "Reconnecting" forever because the canvas
was trying to connect to ws://localhost:3000/ws — its own port, where
Next.js dev doesn't serve a WebSocket — instead of the platform's
ws://localhost:8080/ws.
Why: deriveWsBaseUrl() falls back to window.location when
NEXT_PUBLIC_WS_URL is unset. Next.js auto-loads .env from the project
root only — and the canonical NEXT_PUBLIC_WS_URL /
NEXT_PUBLIC_PLATFORM_URL live in the monorepo root .env, alongside the
Go platform's MOLECULE_ENV / DATABASE_URL. Without an extra
canvas/.env.local copy (which would still be a per-developer manual
step), the canvas dev server starts blind to those vars.
Fix: next.config.ts now walks upward from __dirname looking for the
monorepo root (same workspace-server/go.mod sentinel the platform's
dotenv loader uses) and merges the root .env into process.env BEFORE
Next.js compiles. Existing env wins over file values, so docker
runs / CI / explicit exports still dominate.
The parser is a TypeScript mirror of workspace-server/cmd/server/
dotenv.go's parseDotEnvLine — same rules (export prefix, quotes,
inline comments, BOM) so a single .env line behaves identically across
both processes. If one parser changes, the other has to.
Production unaffected: `output: "standalone"` bakes resolved env into
the build, the workspace-server sentinel isn't shipped in deploy
artifacts, and the existing-env-wins rule means container env
dominates anywhere this file is consulted at runtime.
Verified: canvas dev startup log now shows
"[next.config] loaded 49 vars from /Users/.../molecule-core/.env";
served bundle has the correct ws://localhost:8080/ws URL; WS pill
flips to "Connected" after a hard refresh and per-workspace spawn
animations fire on the next org import as expected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review surfaced three required fixes and one cheap
optional one. All addressed here.
dotenv parser:
- `export FOO=bar` was parsed as key `"export FOO"` (with embedded
space) and silently os.Setenv'd, so a developer pasting from a
direnv `.envrc` would get junk vars. Now strips the prefix.
- Quoted values weren't unwrapped: `FOO="hello world"` produced value
`"hello world"` with literal quotes. Now strips one matched pair of
surrounding `"` or `'`. Inside a quoted value `#` is part of the
value, not a comment marker (matches godotenv convention).
- UTF-8 BOM at file start (Windows editors) would have produced a
first key like U+FEFF + "FOO". Now stripped via TrimPrefix.
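Those three rules (export prefix, quote unwrapping with literal `#` inside quotes, BOM stripping) define the shared line grammar. A TypeScript sketch of a single-line parser consistent with them; the real implementations are dotenv.go and the next.config.ts mirror, and may differ on edge cases:

```typescript
// Parse one .env line into [key, value], or null for blank/comment/
// malformed lines. Sketch only; hedged against the commit's rules.
function parseDotEnvLine(raw: string): [string, string] | null {
  // Strip UTF-8 BOM (Windows editors) and CR (CRLF files), then trim.
  let line = raw.replace(/^\uFEFF/, "").replace(/\r$/, "").trim();
  if (line === "" || line.startsWith("#")) return null;
  // `export ` prefix uses a literal space; `export\tFOO=bar` falls
  // through and is rejected by the whitespace-in-key check below.
  if (line.startsWith("export ")) line = line.slice("export ".length).trim();
  const eq = line.indexOf("=");
  if (eq <= 0) return null;
  const key = line.slice(0, eq).trim();
  if (key === "" || /\s/.test(key)) return null;
  let value = line.slice(eq + 1).trim();
  const q = value[0];
  if ((q === '"' || q === "'") && value.length >= 2 && value.endsWith(q)) {
    value = value.slice(1, -1); // one matched pair; '#' inside is literal
  } else {
    const hash = value.indexOf(" #");
    if (hash >= 0) value = value.slice(0, hash).trim(); // inline comment
  }
  return [key, value];
}
```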
dotenv loader:
- findDotEnv()'s upward walk would happily pick up `~/.env` or a
sibling-repo `.env` if the binary was run from `~/Documents/other-
project/`. Real foot-gun on shared dev boxes. Now gated on a
monorepo sentinel: the candidate directory must contain
`workspace-server/go.mod`. Falls through to "no .env found" (=
pre-fix behavior) when the sentinel is absent.
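The sentinel-gated upward walk is the same shape in both the Go loader and the next.config.ts mirror. A TypeScript sketch (function name matches the commit; internals are an assumption):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walk upward from `start` looking for a .env, but only accept one
// whose directory also contains the monorepo sentinel
// workspace-server/go.mod, so ~/.env or a sibling repo's .env is
// never silently loaded.
function findDotEnv(start: string): string | null {
  let dir = path.resolve(start);
  for (;;) {
    const sentinel = path.join(dir, "workspace-server", "go.mod");
    const candidate = path.join(dir, ".env");
    if (fs.existsSync(sentinel) && fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // filesystem root: no .env found
    dir = parent;
  }
}
```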
socket fallback poll:
- startFallbackPoll() previously fired only on onclose, so the very
first connect attempt — when onclose hasn't fired yet because we
never had a successful onopen — left the canvas with no HTTP poll
for the duration of the failing handshake (Chrome can hold a
SYN-SENT WebSocket open ~75s before giving up). Now also called at
the top of connect(); the timer-already-running guard makes it a
no-op when one cycle later onclose calls it again.
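The call-it-from-both-places pattern only works because of the timer guard. A minimal sketch of that guard (class name and interval are illustrative, not the in-tree code):

```typescript
// Idempotent fallback poller: safe to start from both the top of
// connect() and from onclose; the timer-already-running guard makes
// the second call a no-op.
class FallbackPoller {
  private timer: ReturnType<typeof setInterval> | null = null;
  constructor(private poll: () => void, private intervalMs = 10_000) {}
  get running(): boolean {
    return this.timer !== null;
  }
  start(): void {
    if (this.timer !== null) return; // already polling: no-op
    this.timer = setInterval(this.poll, this.intervalMs);
  }
  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```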
Test coverage added: export prefix, single+double quoted values, hash
inside quotes preserved, unterminated quote falls back to bare value,
CRLF stripping locked in, BOM stripping, and a sentinel-rejection
regression test that creates a temp .env with no workspace-server
sibling and asserts findDotEnv refuses to load it.
Verified: 985 canvas tests + 30 dotenv subtests + 4 dotenv integration
tests all pass; tsc clean; rebuilt platform from monorepo root with
stripped env still loads .env (49 vars) and /workspaces returns 200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deleting a parent on a wedged WS used to leave the child cards on
the canvas as orphaned roots until the user manually refreshed.
Why: Canvas.tsx and DetailsTab.tsx both called `removeNode(parentId)`
after `DELETE /workspaces/:id?confirm=true` returned 200. `removeNode`
deliberately re-parents children rather than cascading — it relies on
the per-descendant WORKSPACE_REMOVED WS events the platform emits as
part of the cascade to drop each child individually. When the WS is
unhealthy those events never arrive, so the local store keeps the
children alive (now re-parented to root since their actual parent is
gone).
Fix: new `removeSubtree(rootId)` action on the canvas store mirrors
the server-side cascade — drops the root + every descendant + every
incident edge in one atomic set(). Both delete call sites now use it.
The WS events still arrive when WS is healthy and become idempotent
no-ops because the nodes are already gone.
Why a new action instead of changing removeNode: removeNode's
re-parenting behavior is correct for non-cascading flows (drag-out,
manual node detach in the future). Adding a sibling action keeps
both call shapes available rather than forcing every caller to opt
out of cascade.
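Stripped of store wiring, the cascade is a transitive-closure collect followed by two filters. A pure sketch with simplified node/edge shapes (the real store action runs inside one set()):

```typescript
interface Node { id: string; parentId?: string }
interface Edge { id: string; source: string; target: string }

// Mirror the server-side cascade locally: collect rootId plus every
// descendant, then drop those nodes and any incident edge.
function removeSubtree(
  nodes: Node[],
  edges: Edge[],
  rootId: string,
): { nodes: Node[]; edges: Edge[] } {
  const doomed = new Set([rootId]);
  let grew = true;
  while (grew) { // fixed point: keep sweeping until no new descendants
    grew = false;
    for (const n of nodes) {
      if (n.parentId && doomed.has(n.parentId) && !doomed.has(n.id)) {
        doomed.add(n.id);
        grew = true;
      }
    }
  }
  return {
    nodes: nodes.filter((n) => !doomed.has(n.id)),
    edges: edges.filter((e) => !doomed.has(e.source) && !doomed.has(e.target)),
  };
}
```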
6 new unit tests cover root cascade, mid-level cascade, leaf
no-op-cascade, selection clearing across the subtree, selection
preservation outside the subtree, and edge cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seventh E2E bug, surfaced after the AuthGate mock from the previous
commit finally let the harness reach the tab-iteration loop:
Error: tab-skills button missing — TABS list may have drifted
Locator: locator('#tab-skills')
The TABS bar in SidePanel is `overflow-x-auto` (intentional — there
are 13 tabs and they don't all fit on smaller viewports; the
right-edge fade gradient signals the overflow). Tabs after position
~3 are clipped, and Playwright's `toBeVisible()` returns false for
clipped elements (it checks getBoundingClientRect against viewport).
Fix: `scrollIntoViewIfNeeded()` before the visibility assertion,
mirroring what SidePanel's own keyboard handler does on arrow-key
navigation. The tab is then in view and `toBeVisible()` passes.
This was the test's 7th and (probably) final harness bug. The
chain mapping all the way from "staging E2E timed out at 1200s"
this morning:
1. instance_status field name (#2066)
2. staging.moleculesai.app DNS zone (#2066)
3. X-Molecule-Org-Id TenantGuard header (#2066)
4. Hydration selector waited pre-click (#2066)
5. networkidle never settles (this PR's parent commits)
6. AuthGate /cp/auth/me redirect
7. Tab buttons clipped by overflow-x-auto
If THIS run still fails, the failure surfaces in actual product
behavior (a tab's panel content), not test mechanics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the case where the canvas thinks workspaces are
stuck provisioning when they're actually online:
1. ProvisioningTimeout banners now gate on wsStatus === "connected".
While the WS is in connecting/disconnected state, the local
"provisioning" status reflects the last event received before the
drop — workspaces may have transitioned to online minutes ago. The
8m timeout was firing against frozen state and showing a wall of
yellow warnings on already-online workspaces.
2. Socket layer now starts a 10s rehydrate poll when the WS goes
unhealthy (onclose) and stops it on onopen/disconnect. The
reconnect attempts continue in parallel; whichever recovers first
wins. rehydrate()'s existing dedup gate prevents the open-time
rehydrate from racing with a fallback poll. Without this the
store could stay frozen for minutes while WS exponential backoff
chewed through retries.
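The banner gate in fix 1 reduces to one extra condition. A sketch with illustrative names (the real component reads these from the store):

```typescript
type WsStatus = "connecting" | "connected" | "disconnected";

// Only trust the local "provisioning" status when the socket is
// healthy; while disconnected it reflects the last event before the
// drop and the workspace may have gone online minutes ago.
function shouldShowProvisioningTimeout(
  wsStatus: WsStatus,
  provisioningSinceMs: number,
  nowMs: number,
  timeoutMs = 8 * 60_000, // the 8m timeout from the commit
): boolean {
  if (wsStatus !== "connected") return false; // frozen state: no banner
  return nowMs - provisioningSinceMs >= timeoutMs;
}
```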
Plus the previously-uncommitted TemplatePalette flushSync change so
the import modal unmounts synchronously before doImport runs (otherwise
React batches the close with the import's setState prefix and the
modal backdrop hides the spawn animation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth E2E bug, surfaced after the page.goto-domcontentloaded fix
finally let the navigation complete. The harness now reaches the
canvas-root selector wait but still times out because the canvas
never renders:
TimeoutError: page.waitForSelector: Timeout 45000ms exceeded.
waiting for [aria-label="Molecule AI workspace canvas"]
Root cause: canvas/src/components/AuthGate.tsx wraps the page,
fetches /cp/auth/me on mount, and redirects to the login page when
the response is 401. The bearer header we set via
context.setExtraHTTPHeaders works for platform API calls but does
NOT satisfy /cp/auth/me — that endpoint is cookie-based (WorkOS
session). So:
1. AuthGate mounts
2. Calls fetchSession() → /cp/auth/me → 401 (no session cookie)
3. AuthGate transitions to anonymous → redirectToLogin()
4. Browser navigates away from tenant URL
5. The React Flow canvas root with the aria-label never mounts
6. waitForSelector times out at 45s
Fix: context.route() intercepts /cp/auth/me and returns a fake
Session JSON so AuthGate resolves to "authenticated" and renders
its children. The session contents are cosmetic — Session.org_id
and Session.user_id appear in a few canvas surfaces but never fail
on dummy values.
This is the cleanest fix path. Alternatives considered + rejected:
- Add a ?e2e=1 backdoor to AuthGate: production code shouldn't
have a "skip auth" flag, even gated.
- Real WorkOS login flow in Playwright: too much overhead per run.
- Skip the canvas UI test, test only API: defeats the point of
the staging E2E (which is to catch UI regressions before
promotion).
After this lands the harness should reach the workspace-node click
step and exercise tabs — only then can a real product bug (rather
than a test-harness bug) surface. The 6-bug chain mapped to:
1. instance_status field name (#2066)
2. staging.moleculesai.app DNS zone (#2066)
3. X-Molecule-Org-Id TenantGuard header (#2066)
4. Hydration selector waited pre-click (#2066)
5. networkidle never settles (this commit's parent)
6. AuthGate /cp/auth/me redirect (this commit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-themed bundle of fixes accumulated while polishing the canvas
chat / agent-comms / plugins / position flows. Each piece is small;
the connective tissue is "things observable from the canvas right
panel and the org-deploy flow that surprised real users".
UI / composer
- Legend: add close X + persisted-localStorage state + reopener
pill; default open for first-time users.
- SidePanel: rename "Skills" tab label → "Plugins" (single-line;
internal panelTab enum value, component name, and store keys
unchanged).
- SkillsTab: registry tri-state UI (loading / error / empty) with
actionable Retry button + 10s explicit fetch timeout. Handle
AbortSignal.timeout's DOMException by name (TimeoutError /
AbortError) — Chromium's "signal timed out" message wouldn't
match the prior naive /timeout/ regex. Reset mountedRef on every
mount: pre-existing StrictMode dev-mode bug where cleanup-only
`current = false` was never re-set, permanently wedging every
`if (mountedRef.current) setX(...)` guard and producing a
"Loading…" panel that never resolved on hard refresh.
- ChatTab: paste-image-from-clipboard via onPaste handler; unique
monotonic-counter filenames so same-second pastes don't collide
on name+size dedup. mime→ext map avoids `image/svg+xml`-style
raw extensions on synthesised filenames. Bypasses the
DataTransfer constructor so Safari < 14.1 / older Edge work.
- ChatTab: drop stuck error toast when the WS path already
delivered the agent reply but the HTTP path errored late
(sendingFromAPIRef gate now covers the .catch() handler).
- ChatTab: filter heartbeat-style internal self-messages from the
My Chat tab so historical rows with source_id=NULL don't
surface as user-typed input.
- Modal portals: OrgImportPreflightModal + MissingKeysModal
(ProviderPickerModal + AllKeysModal) now createPortal to
document.body and clamp max-h to 80vh. Escapes the ancestor
containing block (TemplatePalette's fixed+filtered sidebar
re-anchored descendants' position:fixed to itself, hiding
modals behind workspace cards). MissingKeysModal bumped to
z-[60] for stack ordering when both modals are open.
- OrgImportPreflightModal saveOne: ref-based microtask-safe
in-flight gate replaces the brittle "set startValue inside a
setState updater and read on the next line" pattern (React 18
doesn't guarantee functional updaters run synchronously; that
path strands `saving:true` and never calls createSecret). Same
useRef pattern guards SkillsTab.loadRegistry against concurrent
fires and Fast-Refresh-stranded promises; force=true parameter
on retry click bypasses the gate.
Agent comms
- AgentCommsPanel: derive UI-facing `flow` field instead of using
activity_type-derived direction. Self-logged a2a_receive rows
(source_id == workspace_id, what the agent runtime writes to log
its own outbound delegation replies) now correctly render as
OUTBOUND with → arrow + right-justified bubble. Previously they
rendered "← From Self" with Restart pointing at THIS workspace.
- AgentCommsPanel: error rows replace the unactionable
"X failed [A2A_ERROR]" body with banner + underlying-error
code-block + cause-hint (matched on Claude Code SDK init wedge,
deadline-exceeded, agent-thrown exception, empty-error) +
Restart [peer] / Open [peer] action buttons.
- AgentCommsPanel: render text bodies through ReactMarkdown +
remark-gfm so multi-part replies (tables, code) render properly.
Multi-part text extractor
- extractReplyText (live A2A response in ChatTab) and
extractResponseText (chat history loader in message-parser):
now COLLECT from every source — top-level parts, parts.root.text,
and artifacts — joined with "\n". Previous "first source wins"
silently dropped multi-part replies (Hermes summary+detail,
Claude Code long-form table). Tests cover joined-from-parts,
joined-from-artifacts, joined-from-both.
Position stability
- canvas-topology.buildNodesAndEdges: auto-rescue heuristic now
accepts currentParentSizes map; uses max(initial min, currently
grown) for the bbox check. Fixes "child jumps to weird location
after 30s" — the periodic socket health-check rehydrate
(silenceSec > 30) was rebuilding nodes from scratch, and the
rescue's reliance on grid-derived initial size false-flagged
children the user dragged into the user-grown area.
- canvas.hydrate: pass live measured dimensions from the existing
store into buildNodesAndEdges.
- socket.RehydrateDedup: pure exported helper class that gates
rehydrate calls. Two states — in-flight (in-flight Promise reused
by concurrent callers) + post-completion window (1.5s, returns
Promise.resolve()). Initialised with -Infinity so first call
always passes the gate. Wired into ReconnectingSocket.rehydrate.
A2A edges
- New A2AEdge custom React Flow edge component portals its label
out of the SVG layer via EdgeLabelRenderer so labels (a) render
above workspace cards instead of being hidden behind them and
(b) accept clicks. Click selects source + switches panel to
Activity, but only on a NEW selection (preserves current tab on
re-click of an already-selected source).
- buildA2AEdges output tagged type:"a2a"; edgeTypes wired in
Canvas.tsx.
Tests
- 14 new vitest cases across 4 files (964 → 978 passing):
OrgImportPreflightModal saveOne single-fire / double-click,
any-of rendering; AgentCommsPanel toCommMessage flow derivation
in all four shapes; canvas-topology rescue respects-grown /
rescues-genuine-drift / fallback-without-live-size; socket
RehydrateDedup gate behaviour; message-parser multi-part
response extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fifth E2E bug surfaced by the previous run. After the four setup-
phase fixes (instance_status, DNS zone, X-Molecule-Org-Id, hydration
selector) plus CP#259 ending the pq cache class, the harness finally
reached the actual page navigation step — and timed out there:
TimeoutError: page.goto: Timeout 45000ms exceeded.
navigating to "https://...staging.moleculesai.app/", waiting until "networkidle"
`waitUntil: "networkidle"` waits for 500ms of network silence. The
canvas keeps a WebSocket connection open + polls /events and
/workspaces every few seconds for status updates, so the network
is never idle — page.goto sits on it until the default 45s timeout
and throws.
Fix: switch to `waitUntil: "domcontentloaded"`. Returns as soon as
the HTML is parsed. React hydration plus the existing
`waitForSelector` line below is what actually gates ready-for-
interaction; the goto's job is just to land on the page.
This is a generally-applicable lesson — networkidle is broken for
any SPA with a heartbeat. Notably, our existing canvas unit tests
that mock @xyflow/react and don't open WebSockets DON'T hit this,
which is why this only surfaces against staging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourth E2E bug in the staging→main chain. The previous three (#2066
setup-phase fixes) let the harness reach the actual Playwright spec.
This one is in staging-tabs.spec.ts itself.
The spec at L78 waits 45s for one of:
[role="tablist"], [data-testid="hydration-error"]
Both targets are wrong:
1. [role="tablist"] only appears AFTER the workspace node is
clicked (which happens 25 lines later at L100). Waiting for
it BEFORE the click can never resolve, so the wait always
times out at 45s regardless of whether the canvas actually
loaded.
2. [data-testid="hydration-error"] doesn't exist anywhere in
the canvas. The error banner at app/page.tsx:62 only had
role="alert" — which collides with toast notifications and
other alert-type elements, so a more-specific selector was
never wired.
Two-part fix:
- Test waits on `[aria-label="Molecule AI workspace canvas"]`
instead — that's the React Flow wrapper (Canvas.tsx:150),
always present once hydrated regardless of workspace count
or selection state. Hydration-error banner remains the
secondary OR target for the failure path.
- app/page.tsx hydration-error banner gets the missing
`data-testid="hydration-error"` attribute. role="alert"
stays for accessibility; the testid is for programmatic
detection without conflict.
After this lands, the staging-tabs spec should advance past the
initial wait, click the workspace node, and exercise each tab.
If a tab fails, we get a proper test failure rather than a 45s
timeout that obscures everything.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third E2E bug in the staging→main chain, found while debugging the
`Workspace create 404` failure that surfaced after the previous two
E2E fixes (instance_status, staging.moleculesai.app DNS).
Root cause: workspace-server's `middleware/TenantGuard` returns 404
(not 401/403, intentionally — see comment in `tenant_guard.go`:
"must not be inferable by probing other orgs' machines") when a
request to the tenant origin lacks one of:
- X-Molecule-Org-Id header matching MOLECULE_ORG_ID env on the tenant
- Fly-Replay-Src state from the CP router (production browser path)
- Same-origin Canvas (Referer == Host)
The E2E was a direct GitHub-Actions curl with neither — every
non-allowlisted route 404'd with the platform's ratelimit headers but
none of the security headers, which made it look like a missing
route in the platform.
The org UUID is already on the admin-orgs row alongside instance_status,
so capture it during the readiness poll and add it to the tenantAuth
header bag. Both /workspaces (POST) and /workspaces/:id (GET) now
carry it.
Allowlist still contains /health, /metrics, /registry/register,
/registry/heartbeat — so the TLS readiness step (which hits /health)
keeps working without the header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second related E2E bug, surfaced after #2066's instance_status fix
let the harness reach the TLS readiness step:
Error: tenant TLS: timed out after 180s
The CP provisioner writes staging tenant DNS as
<slug>.staging.moleculesai.app (with the staging. subdomain
prefix — visible in the EC2 provisioner DNS log line). The harness
was building https://<slug>.moleculesai.app (prod-zone shape),
so DNS literally didn't resolve, fetch threw NXDOMAIN inside the
silent catch, and waitFor saw null on every 5s poll until 180s
elapsed.
Fix: parameterize as STAGING_TENANT_DOMAIN env var, default
staging.moleculesai.app. Doc-comment example updated to match.
Override hatch is there only for ops running this harness against
a non-default zone.
Verified manually: a freshly-provisioned tenant
(e2e-canvas-20260425-sav9fe) was unreachable at the prod-shaped
URL (NXDOMAIN) but reached CF at the staging-shaped URL.
teardown.ts only hits CP, not the tenant URL — no fix needed there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Staging Canvas Playwright E2E has been timing out at 1200s on every
recent run. Found via /code-review-and-quality on the staging→main
promotion chain.
The CP /cp/admin/orgs response shape is (handlers/admin.go:118):
    type adminOrgSummary struct {
        ...
        InstanceStatus string `json:"instance_status,omitempty"`
        ...
    }
There is NO top-level `status` field. The waitFor predicate compared
`row.status === "running"` against undefined on every poll — the
predicate could never resolve truthy. The harness invariably wedged
on the 20-min timeout regardless of whether the tenant was actually
provisioned.
This bug has been double-edged:
- It MASKED the #242 pq-cache-collision class for hours: the
tenants WERE provisioning fine, but the test couldn't tell.
- It survived #255, #257 (real CP fixes) — the test still timed
out, making us suspect more CP bugs that didn't exist.
Fix: poll `row.instance_status` instead. One-line change. Identical
fix for the failed-state branch one line below.
No new tests for the harness itself; the fix's correctness is
verified by the next E2E run on the affected branch passing
end-to-end. If it doesn't pass after this, there's a separate
bug we can hunt cleanly.
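For the record, the corrected predicate shape (row type reconstructed from the struct above; only `instance_status` is confirmed):

```typescript
// /cp/admin/orgs row: there is NO top-level `status` field, so the
// old `row.status === "running"` compared against undefined forever.
interface AdminOrgSummary { instance_status?: string }

const isRunning = (row: AdminOrgSummary): boolean =>
  row.instance_status === "running";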
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the org-import env preflight so a template can declare an
alternative: satisfy ANY one member to pass. Motivated by the
Claude-family node case where either ANTHROPIC_API_KEY or
CLAUDE_CODE_OAUTH_TOKEN unlocks the agent — forcing both was wrong.
Server (workspace-server):
- New EnvRequirement union type with custom YAML + JSON
(un)marshaling. Accepts scalar (strict) or {any_of: [...]} in
both on-disk org.yaml and inline POST /org/import bodies.
- collectOrgEnv now returns []EnvRequirement. Dedups groups by
sorted-member signature. "Strict wins" pruning drops any-of
groups that mention a name already declared strictly (same
tier and cross-tier).
- Import preflight uses EnvRequirement.IsSatisfied — scalar =
exact match, group = any member present.
- Empty any_of: [] rejected at parse time (never-satisfiable).
- 14 handler tests (6 updated for the union shape, 8 new
covering any-of satisfaction, dedup, strict-dominates-group,
cross-tier pruning, invalid-member filtering, YAML round-trip,
and empty-any-of rejection).
Canvas:
- EnvRequirement = string | {any_of: string[]} with envReqMembers,
envReqSatisfied, envReqKey helpers.
- OrgImportPreflightModal renders strict rows and any-of groups
via a new AnyOfEnvGroup sub-component: "Configure any one"
banner, per-member input, ✓-satisfied indicator, and dimmed
siblings once any member is configured so the user can still
switch providers.
- TemplatePalette.OrgTemplate.required_env / recommended_env
retyped to EnvRequirement[]; passthrough to the modal
unchanged.
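The three canvas helpers are thin over the union. A sketch consistent with the names above (exact signatures unverified):

```typescript
// Canvas-side requirement union: scalar = strict, group = any-of.
type EnvRequirement = string | { any_of: string[] };

const envReqMembers = (r: EnvRequirement): string[] =>
  typeof r === "string" ? [r] : r.any_of;

// Scalar requires the exact key; a group passes when ANY member is
// configured.
const envReqSatisfied = (r: EnvRequirement, configured: Set<string>): boolean =>
  envReqMembers(r).some((k) => configured.has(k));

// Sorted-member signature, usable as the dedup key the server-side
// collectOrgEnv also groups by.
const envReqKey = (r: EnvRequirement): string =>
  [...envReqMembers(r)].sort().join("|");
```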
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 5adc8a74 (part of this PR) intentionally made
molecule:fit-deploying-org fire for root-level workspaces too — it
used to only fire for children, which meant a standalone create
didn't center the viewport until the first child arrived ~2s later.
The existing regression test still expected ONLY the
molecule:pan-to-node event for a new root, so it started failing
with "expected length 1, got 2". The product behavior is correct
(centering on the root immediately is better UX); the test was
pinning the old single-dispatch shape.
Fix: assert BOTH events fire, each with the right detail payload,
so a future regression that drops either one (or duplicates) trips
the test. Single-test update, no production code change. 953/953
canvas tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem
Two gaps in the external-workspace path:
1. `knownRuntimes` was a hardcoded Go map that drifted from
manifest.json — e.g. `gemini-cli` was in manifest but missing
from the Go allowlist, so any workspace provisioning with
runtime=gemini-cli got silently coerced to langgraph.
2. No end-to-end "bring your own compute" story. The canvas UI
had no way to pick runtime=external; the partial backend code
required the operator to already have a URL ready (chicken-and-egg
with the agent that doesn't exist yet), and no workspace_auth_token
was minted so the external agent couldn't authenticate its
register call.
## Change
### Runtime registry driven by manifest.json
- New `runtime_registry.go` reads `manifest.json` at service init.
Each `workspace_templates[].name` becomes a runtime identifier
(with the `-default` suffix stripped so `claude-code-default`
and `claude-code` resolve to the same runtime).
- `external` is always injected (no template repo exists for it).
- Falls back to a static map on manifest load failure so tests /
dev containers keep working.
- 5 new tests including a real-manifest sanity check.
### First-class external workspace flow
When `POST /workspaces` is called with `runtime: "external"` AND
no URL supplied:
1. Workspace row inserted with `status='awaiting_agent'`
(distinct from `provisioning` so canvas doesn't trip its
provisioning-timeout UX).
2. A workspace_auth_token is minted via `wsauth.IssueToken`.
3. Response body includes a `connection` object with:
- `workspace_id`, `platform_url`, `auth_token`
- `registry_endpoint`, `heartbeat_endpoint`
- `curl_register_template` — zero-dep one-shot register snippet
- `python_snippet` — full SDK setup w/ heartbeat loop,
paired with molecule-sdk-python PR #13's A2AServer
4. The platform URL is resolved from `EXTERNAL_PLATFORM_URL` env
(ops-configurable per tenant) or falls back to request headers.
The legacy `payload.External` + `payload.URL` path is preserved —
org-import and other callers that already have a URL still work.
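A hedged sketch of the response's `connection` object and the URL resolution order described above. Field names come from this commit; the exact wire shape is unverified, and `resolvePlatformUrl` is an illustrative helper, not the Go code:

```typescript
// Sketch of POST /workspaces response `connection` for runtime=external.
interface ExternalConnection {
  workspace_id: string;
  platform_url: string;
  auth_token: string;
  registry_endpoint: string;
  heartbeat_endpoint: string;
  curl_register_template: string; // zero-dep one-shot register snippet
  python_snippet: string;         // SDK setup with heartbeat loop
}

// Env override first (ops-configurable per tenant), then fall back to
// the request's own origin, per the commit's description.
function resolvePlatformUrl(
  env: Record<string, string | undefined>,
  requestOrigin: string,
): string {
  return env["EXTERNAL_PLATFORM_URL"] ?? requestOrigin;
}
```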
### Canvas UI
- New "External agent (bring your own compute)" checkbox in
CreateWorkspaceDialog.
- When checked, template/model/hermes-provider fields are hidden
and the POST body includes `runtime: "external"`.
- New `ExternalConnectModal` component: shown once after create,
renders Python / curl / raw-fields tabs with copy-to-clipboard
buttons. Stays mounted as a sibling of the create dialog so the
token survives the create dialog unmount.
- `auth_token` is interpolated into the snippet client-side so the
copied block is truly ready to run — operator only has to fill
in their agent's public URL.
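The interpolation itself is a one-liner; a sketch (the real marker
syntax in the server-stamped snippet is not shown in this commit,
so the `{{key}}` form is an assumption):

```typescript
// Substitute known values into a server-stamped snippet, leaving
// unknown markers (e.g. the agent's public URL) for the operator.
function interpolateSnippet(
  template: string,
  values: Record<string, string>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in values ? values[key] : match,
  );
}
```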
## Tests
- Go: 5 new runtime_registry tests (happy path, -default strip,
external always injected, missing file, malformed JSON, real
manifest sanity). All existing handler tests still pass.
- TypeScript: no type errors on my files; pre-existing
canvas-batch-partial-failure type drift is on main already and
tracked on the #2061 branch.
## Follow-ups (filed separately)
- Cut molecule-sdk-python v0.y to PyPI so the snippet can use
`pip install molecule-ai-sdk` instead of `git+main`.
- Add a `runtime: string` field per template in manifest.json so
one template can declare its runtime explicitly (instead of
deriving it from name conventions). Unblocks N-templates-per-
runtime (e.g. hermes-minimax, hermes-anthropic both runtime=hermes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on #2061. Four internally cohesive sub-features; easiest to
read in order.
## 1. Org-level env preflight
Server
- `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and
`recommended_env: string[]` YAML fields.
- `GET /org/templates` walks the tree and returns the tree-union
(deduped, sorted) of both. `collectOrgEnv` dedup prefers required
when the same key is declared at both tiers.
- `POST /org/import` preflights against `global_secrets` WHERE
`octet_length(encrypted_value) > 0` (empty-value rows used to be
counted as "configured" and the per-container preflight still
failed at start time). 412 Precondition Failed + `missing_env`
list when required keys are absent. `force=true` bypasses with
an audit log line. A DB lookup failure now returns 500
(previously a silent fall-through that defeated the guard).
Env-var names are validated against `^[A-Z][A-Z0-9_]{0,127}$` so
a malicious template can't ship pathological names into the UI
or DB.
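The union rule, sketched outside Go (the real `collectOrgEnv`
walks `OrgTemplate`/`OrgWorkspace` structs; this stand-in shows
only the dedup-with-required-wins behavior):

```typescript
interface EnvDecl {
  required_env?: string[];
  recommended_env?: string[];
}

// Union declarations across the tree; if a key is declared both
// required and recommended anywhere, required wins. Output is
// deduped and sorted.
function collectEnv(nodes: EnvDecl[]): {
  required: string[];
  recommended: string[];
} {
  const required = new Set<string>();
  const recommended = new Set<string>();
  for (const n of nodes) {
    for (const k of n.required_env ?? []) required.add(k);
    for (const k of n.recommended_env ?? []) recommended.add(k);
  }
  for (const k of required) recommended.delete(k); // required wins
  return {
    required: [...required].sort(),
    recommended: [...recommended].sort(),
  };
}
```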
Canvas
- New `OrgImportPreflightModal`: red "Required" section (blocking)
and yellow "Recommended" section (non-blocking, import stays
enabled, shows live missing-count next to the Import button).
- Per-key password input → `PUT /settings/secrets` → strike-through
on save. Functional `setDrafts` throughout (no stale-closure
clobbers on rapid successive saves). `useEffect` seed keyed on a
sorted-join string signature so a parent re-render with a new
array identity doesn't clobber typed inputs.
- `TemplatePalette.handleImport` branches: zero env declarations →
straight to import; any declarations → fetch configured global
secret keys, open the modal.
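The seed-signature trick in isolation (a sketch; `keySignature` is
a hypothetical name):

```typescript
// A parent re-render that passes a new array identity with the
// same contents yields the same signature, so a useEffect keyed
// on it won't re-seed drafts and clobber typed input. NUL join
// avoids collisions between ["A,B"] and ["A", "B"].
function keySignature(keys: string[]): string {
  return [...keys].sort().join("\u0000");
}
```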
Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels,
required-wins-over-recommended (including same-struct), dedup,
empty, invalid-name rejection.
## 2. EmptyState parity with TemplatePalette
The "Deploy your first agent" grid used to call `POST /workspaces`
with no preflight while the sidebar palette ran
`checkDeploySecrets` + `MissingKeysModal` first. Same template
deployed two different ways → first-run users saw containers boot
in `failed` state without guidance. Now both surfaces share one
preflight + modal handshake.
EmptyState's previous `interface Template` dropped `runtime`,
`models`, and `required_env` — silently discarding exactly the
fields the preflight needs. `Template` now lives in
`deploy-preflight.ts` and is imported from there by both surfaces.
## 3. useTemplateDeploy hook
With the preflight + modal wiring now duplicated across
EmptyState + TemplatePalette + (going forward) any third surface,
extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`:
```ts
const { deploy, deploying, error, modal } = useTemplateDeploy({
  canvasCoords: ..., // optional, default random
  onDeployed: (id) => ...,
});
```
Closes three drift surfaces that the duplication had created:
- `resolveRuntime` id→runtime fallback table (moved to
`deploy-preflight.ts`). EmptyState had a narrower fallback that
would have silently disagreed with the palette on any future id
needing a non-identity mapping.
- `checkDeploySecrets` call signature. One owner.
- `MissingKeysModal` JSX wiring. One owner.
Narrow try/catch around `checkDeploySecrets` so a preflight network
failure clears `deploying` and surfaces via `setError` instead of
stranding the button forever. `modal: ReactNode` (not a
`renderModal()` function) — the previous memoization bought
nothing since consumers called it inline every render. Named
`MissingKeysInfo` interface for the state shape.
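The preflight guard in isolation (a sketch with stand-in
callbacks, not the hook's real internals):

```typescript
// Run the preflight inside a narrow try/catch: a network failure
// clears the in-flight flag and surfaces the error rather than
// stranding the deploy button. On success `deploying` stays true
// because the caller continues into the actual deploy.
async function runPreflight(
  checkDeploySecrets: () => Promise<string[]>,
  setDeploying: (v: boolean) => void,
  setError: (msg: string | null) => void,
): Promise<string[] | null> {
  setDeploying(true);
  try {
    return await checkDeploySecrets();
  } catch (err) {
    setDeploying(false); // never strand the button
    setError(err instanceof Error ? err.message : String(err));
    return null;
  }
}
```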
## 4. Viewport auto-fit user-pan gate fix
During org deploy the canvas was meant to pan+zoom to follow each
arriving workspace (`molecule:fit-deploying-org` event → debounced
fitView). In practice the fit stayed stuck on wherever the first
fit landed.
Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event`
at the END of a programmatic `fitView` animation. The original
"respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`,
so our own fit completing looked like a user pan, and every
subsequent auto-fit short-circuited for the rest of the deploy.
Fix: stop trusting `onMoveEnd` for user-intent detection. Register
explicit `wheel` + `pointerdown` listeners on `document` with
capture phase and a `target.closest('.react-flow__pane')` filter.
The capture phase is immune to `stopPropagation`; the pane filter
rejects toolbar / modal / side-panel clicks (the old `window`
fallback caught those). `onMoveEnd` is simplified to only drive
the debounced viewport save.
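The gate, sketched (listener wiring assumes a browser environment;
`installUserPanGate` and the callback name are illustrative):

```typescript
// Pure part: accept only gestures whose target sits inside the
// React Flow pane, rejecting toolbar/modal/side-panel clicks.
function isUserGestureOnPane(target: unknown): boolean {
  const el = target as { closest?: (sel: string) => unknown } | null;
  if (!el || typeof el.closest !== "function") return false;
  return el.closest(".react-flow__pane") != null;
}

// Wiring: capture phase is immune to stopPropagation calls made
// by nodes or overlays. Returns a cleanup function.
function installUserPanGate(onUserIntent: () => void): () => void {
  const handler = (e: Event) => {
    if (isUserGestureOnPane(e.target)) onUserIntent();
  };
  document.addEventListener("wheel", handler, { capture: true, passive: true });
  document.addEventListener("pointerdown", handler, { capture: true });
  return () => {
    document.removeEventListener("wheel", handler, { capture: true });
    document.removeEventListener("pointerdown", handler, { capture: true });
  };
}
```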
Also: fit event dispatched on root arrivals (not just children),
so the canvas centers on the just-landed root immediately instead
of waiting ~2s for the first child. Animation 600ms → 400ms so
successive per-arrival fits don't pile up visually. End-state fit
stays at 1200ms — intentional asymmetry ("settling" vs
"tracking"), documented in code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
auto-merged (mostly canvas/tabs/chat and workspace-server handlers;
the earlier DIRTY marker was stale relative to current staging).
- Fix 7 test failures surfaced by the merge:
1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
object failed type check. Explicit return-type annotation.
2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
reads deletingIds.size (new multilevel-layout state). Both mock
stores lacked deletingIds; added new Set<string>() to each.
3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
format WorkspaceData (snake_case, with x/y/uptime_seconds). The
store's node.data is now WorkspaceNodeData (camelCase, no wire-
only fields). Rewrote makeWS to produce WorkspaceNodeData and
updated 5 call-site casts. No assertions changed.
4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
that the PR intentionally inverts:
a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
now contains only {"external"}, so the banner is no longer
shown for hermes. Inverted assertion: now pins ABSENCE of
the banner, with a comment noting the inversion.
b. "config.yaml runtime wins over DB" — priority reversed:
DB is now authoritative so the tier-on-node badge matches
the form. Inverted scenario: DB=hermes + yaml=crewai →
form shows hermes. Switched test's DB runtime off langgraph
because the dropdown collapses langgraph into an empty-
valued "default" option that would hide the win signal.
- No production code changed — this commit is staging merge + test
realignment only. 953/953 canvas tests pass. tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session's accumulated UX work across frontend and platform.
Reviewable section by section — the diff is large but internally
cohesive (each section fixes a gap the next one depends on).
## Chat attachments — user ↔ agent file round trip
- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
25 MB per file, UUID-prefixed storage under
/workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
per-message attachment chips with fetch+blob download (anchor
navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
now consumes attachments via platform helpers.
## Platform attachment helpers (workspace/executor_helpers.py)
Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
multi-modal list (text + image_url) for images. Respects
MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
paths, stages each into chat-uploads/ (download endpoint
whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
so non-root agents can write /workspace without each template
having to chmod in its Dockerfile.
Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.
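The outbound scan, sketched outside Python (the real
`collect_outbound_files` also stages each file into chat-uploads/;
the path regex here is an assumption):

```typescript
// Find /workspace/... paths in the agent's reply, deduped in
// first-seen order.
function scanOutboundPaths(reply: string): string[] {
  const matches = reply.match(/\/workspace\/[\w./-]+/g) ?? [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const p of matches) {
    if (!seen.has(p)) {
      seen.add(p);
      out.push(p);
    }
  }
  return out;
}
```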
## Model selection & related platform fixes
- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
silently lost the model choice. Stores into workspace_secrets
(MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
so Restart propagates the stored model to HERMES_DEFAULT_MODEL
without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
(stale) config.yaml — fixes "badge shows T3, form shows T2".
## ChatTab & WebSocket UX fixes
- Send button no longer locks after a dropped TASK_COMPLETE —
`sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
the previous default aborted fetches while the server was still
replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
the adapter's purpose IS the form, so the banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
bare <runtime> for template path (hermes lives at the bare name).
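The disposed-flag shape (a sketch; `connectWithCleanup` and the
socket-factory injection are illustrative, not socket.ts's API):

```typescript
interface WsLike {
  onmessage: ((e: { data: string }) => void) | null;
  onclose: (() => void) | null;
  close(): void;
}

// StrictMode mounts effects twice; without the disposed flag the
// first socket's onclose schedules a reconnect after cleanup,
// leaving a zombie WebSocket.
function connectWithCleanup(
  makeSocket: () => WsLike,
  onMessage: (data: string) => void,
): () => void {
  let disposed = false;
  let reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  let ws: WsLike | null = null;

  const open = () => {
    ws = makeSocket();
    ws.onmessage = (e) => onMessage(e.data);
    ws.onclose = () => {
      if (disposed) return; // don't resurrect after cleanup
      reconnectTimer = setTimeout(open, 1000);
    };
  };
  open();

  return () => {
    disposed = true;
    if (reconnectTimer !== null) clearTimeout(reconnectTimer);
    if (ws) {
      ws.onmessage = null; // detach handlers before close
      ws.onclose = null;
      ws.close();
    }
  };
}
```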
## Org deploy/delete animation (theme-ready CSS)
- styles/theme-tokens.css — design tokens (durations, easings,
colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
(deploying children AND actively-deleting ids) — RF's
authoritative drag lock; useDragHandlers retains a belt-and-
braces check.
- Org cancel button (red pulse pill on the root during deploy)
cascades via the existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
sibling arrivals coalesce into one fit (previous per-event
fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
timestamp only when event !== null (ignores programmatic
fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
useCanvasViewport (DRY'd 3 copies of split/filter/join).
## Server payload change
- org_import.go WORKSPACE_PROVISIONING broadcast now includes
parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
renders the child at the right parent-nested slot without doing
any absolute-position walk. createWorkspaceTree signature gains
relX, relY alongside absX, absY; both call sites updated.
## Tests
- workspace/tests/test_executor_helpers.py — 11 new cases
covering URI resolution (including traversal rejection),
attached-file extraction (both Part shapes), manifest-only
vs multi-modal content, large-image skip, outbound staging,
dedup, and ensure_workspace_writable (chmod 777 + non-root
tolerance).
- workspace-server chat_files_test.go — upload validation,
Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
plumbing check + round-trip across hermes/langgraph/claude-code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marketing-lead agent's rename pass updated the "renders all three plans"
test (lines 56-57) but missed lines 77, 94, 114, 132, 143, 158 which still
referenced the pre-rename "Upgrade to Starter" / "Upgrade to Pro" button
names. Canvas (Next.js) build failed with getByRole timeout because the
component now says "Upgrade to Team" / "Upgrade to Growth".
Internal PlanId tuple ("free" | "starter" | "pro") and startCheckout(planId)
call are unchanged — only the user-facing button labels shifted, so
assertions like startCheckout("pro", "acme") still match the server-side API.
Verified locally: 9/9 PricingTable tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape, so every getByLabelText("Bot Token") call failed.
Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.
Fix: move form shape to the server.
- Add `ConfigField` struct + `ConfigSchema()` method to the
`ChannelAdapter` interface. Each adapter declares its own fields with
label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
- Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
- Slack: bot_token/channel_id/webhook_url/username/icon_emoji
- Discord: webhook_url + optional public_key
- Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
inline. Sorted deterministically by display name so UI ordering is
stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.
Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
schema-driven `SchemaField` component. Renders one input per field in
the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
`ConfigField.key`. Values reset on platform-switch so stale
Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
`SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
values from previous platform selections.
Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
survives round-trip through ListAdapters.
Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping
runtime-specific UX knobs inline in ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.
Changes:
- New file `canvas/src/lib/runtimeProfiles.ts` owns:
* `RuntimeProfile` type — structural shape, every field optional so
new runtimes can partially-fill without breaking consumers.
* `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
* `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
* `WorkspaceRuntimeOverrides` — interface for server-provided
per-workspace overrides, so operators can tune via template
manifest / workspace metadata without a canvas release.
* `getRuntimeProfile()` — resolver with
overrides → profile → default priority.
* `provisionTimeoutForRuntime()` — convenience wrapper.
- `ProvisioningTimeout.tsx` now delegates to the profile module.
`DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.
- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
* overrides > profile > default priority chain
* "every entry in RUNTIME_PROFILES resolves to a number" contract
* backward-compat export
Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).
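The priority chain, condensed (values are the ones quoted above;
field and constant names follow the bullet list):

```typescript
interface RuntimeProfile {
  provisionTimeoutMs?: number;
}

const DEFAULT_RUNTIME_PROFILE: Required<RuntimeProfile> = {
  provisionTimeoutMs: 120_000, // 2-min floor (docker-fast)
};

const RUNTIME_PROFILES: Record<string, RuntimeProfile> = {
  // WHY: hermes cold-boots in 8-13 min (source build + Chromium).
  hermes: { provisionTimeoutMs: 720_000 },
};

// overrides → profile → default priority.
function provisionTimeoutForRuntime(
  runtime: string,
  overrides?: RuntimeProfile,
): number {
  return (
    overrides?.provisionTimeoutMs ??
    RUNTIME_PROFILES[runtime]?.provisionTimeoutMs ??
    DEFAULT_RUNTIME_PROFILE.provisionTimeoutMs
  );
}
```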
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.
User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.
Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
docker runtimes (claude-code, langgraph, crewai) where cold boot is
30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
Aligns with tests/e2e/test_staging_full_saas.sh's
PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
number → forces a single threshold for every workspace (tests use this
for deterministic behavior).
Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).
13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).
Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>