Commit Graph

392 Commits

Author SHA1 Message Date
Hongming Wang
3ba924d174 review: drop destructive Override + single-fetch configuredKeys
Self-review of #2460 found two issues:

1. Critical: Override button in ProviderPickerModal called
   /settings/secrets when no workspaceId, overwriting the GLOBAL
   secret used by every workspace. The only consumers of this
   modal today (TemplatePalette, EmptyState via useTemplateDeploy)
   never pass workspaceId, so Override was always destructive.
   Removed entirely — the picker still solves the user-reported
   bug (always-ask + reuse saved keys); per-workspace key override
   can be a separate PR that plumbs secrets through POST /workspaces.

2. Optional: /settings/secrets was being fetched twice — once
   inside checkDeploySecrets (silently) and again in the hook to
   populate configuredKeys. Surfaced configuredKeys on
   PreflightResult so the hook re-uses the existing fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:40:58 -07:00
Hongming Wang
0608e15ab3 feat(canvas): always prompt for provider+model on multi-provider template deploy
Clicking a hermes template tile silently deployed when global env
covered the API key, producing "No LLM provider configured" 500
because the workspace booted with no explicit model slug — the
adapter fell back to its compiled-in default which 401s on the
user's actual provider key.

Fix: in useTemplateDeploy, open the picker whenever the template
declares ≥2 provider options, even when preflight.ok=true. The
modal renders pre-saved keys as Saved (with an Override link) and
adds a model input pre-filled from the template's default. Single-
provider templates (claude-code, langgraph) still skip the picker
since there's nothing to choose.

POST /workspaces now includes the picker's model slug so hermes-
style routing reads the prefix at install time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:34:17 -07:00
Hongming Wang
e1496936e9 feat(canvas): dynamic provider dropdown in CreateWorkspaceDialog
Mirrors the data-driven pattern PR #2454 set in ConfigTab: read
runtime_config.providers from /templates and filter the modal's
provider <select> to that subset. Same source of truth, three fewer
hardcoded copies of the provider list.

Behavior:
- Template declares providers → dropdown shows only those.
- Template ships no providers field → fall back to full HERMES_PROVIDERS
  catalog (back-compat for older templates / self-hosted setups).
- Declared list has no overlap with our static metadata → fall back to
  full catalog so the form can't lock the operator out.
- hermesProvider snaps back to the first available pick when its
  current value falls out of the filtered list.

Tests: 3 new pinning the filter, no-providers-field fallback, and
the unknown-providers fallback. All 27 CreateWorkspaceDialog tests
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:45:20 -07:00
Hongming Wang
517bd0efc5 feat(canvas+workspace-server): data-driven Provider dropdown (#199)
Option B PR-5. Canvas Config tab now exposes a Provider override input
that's adapter-driven from each runtime's template — no hardcoded
provider list in the canvas. PUT /workspaces/:id/provider on Save
when dirty; auto-restart suppression to avoid double-restart with
the model handler's own restart.

The dropdown's suggestion list comes from /templates →
runtime_config.providers (the field added in
molecule-ai-workspace-template-hermes PR #31). For templates that
haven't migrated to the explicit providers list yet, suggestions
derive from model[].id slug prefixes — still adapter-driven, just
inferred. This keeps existing templates working while platform team
migrates them one at a time.

workspace-server changes:
- Add Providers []string field to templateSummary JSON
- Parse runtime_config.providers in /templates handler
- 2 new tests pin the surfacing + omitempty behavior

canvas changes:
- Remove hardcoded PROVIDER_SUGGESTIONS constant
- Add provider/originalProvider state + PUT-on-save logic
- Add deriveProvidersFromModels() fallback helper
- Wire RuntimeOption.providers from /templates response
- 8 new tests pin the behavior end-to-end

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:19:17 -07:00
Hongming Wang
7706db5a93 fix(canvas): persist model on Save+Restart for runtime-bearing workspaces
The Model dropdown's onChange writes to config.runtime_config.model
whenever a runtime is set (hermes, claude-code, etc.), and only
falls back to top-level config.model when no runtime is selected.
But handleSave used to diff the new value against top-level
nextSource.model only — so for any runtime-bearing workspace, the
PUT /workspaces/:id/model never fired and MODEL_PROVIDER never
landed in workspace_secrets.

Symptom (2026-04-30, hongmingwang Hermes Agent
32993ee7-840e-4c02-8ca8-cb9d75d112a5):
  - User picks minimax/MiniMax-M2.7-highspeed from the dropdown
  - Hits Save & Restart
  - Save reports success; restart fires
  - The new EC2 boots with HERMES_DEFAULT_MODEL empty
  - install.sh defaults to nousresearch/hermes-4-70b
  - hermes-agent errors "No LLM provider configured" on every chat
    turn because no NOUS_API_KEY / OPENROUTER_API_KEY is set
  - Reload Config tab → model field reverts to whatever
    GET /workspaces/:id/model returns (i.e. empty / template default)

handleSave now reads the effective model from runtime_config.model
first and falls back to top-level model for legacy no-runtime
workspaces. Same change for the old-value diff so a no-op Save
still skips the PUT.

Tests pin both branches: PUTs /model when the dropdown changed
runtime_config.model on a hermes workspace; does NOT PUT when
the value is unchanged from what GET /model returned.
2026-04-30 18:31:43 -07:00
Hongming Wang
b54ceb799f fix: address 5-axis review findings on PR #2413
Critical:
- ExternalConnectModal.tsx: filledUniversalMcp substitution searched
  for WORKSPACE_AUTH_TOKEN but the snippet's placeholder is now
  MOLECULE_WORKSPACE_TOKEN (changed in the previous polish commit
  876c0bfc). Operators copy-pasting the MCP tab would have gotten a
  literal "<paste from create response>" instead of the token. Fix
  the substitution to match the new placeholder name.

Important:
- mcp_cli._platform_register: 401/403 from initial register now hard-
  exits with code 3 + an actionable stderr message pointing the
  operator at the canvas Tokens tab. Pre-fix: warning log + continue,
  which made a bad-token startup silently fail (heartbeat 401's
  forever, every tool call also 401's, no clear surfacing in the
  operator's MCP client). 500/503 still log + continue (transient
  platform blips shouldn't abort the MCP loop).
- a2a_mcp_server.cli_main docstring: removed stale claim that this is
  the wheel's console-script entry-point target. The actual target is
  mcp_cli.main since 2026-04-30. Wheel-smoke pins both names so the
  functionality was correct, but the doc was lying.

Test coverage: 3 new mcp_cli tests:
  - register 401 exits code=3 + stderr mentions canvas Tokens tab
  - register 403 (C18 hijack rejection) takes same path
  - register 500/503 does NOT exit — only auth errors hard-fail

Findings deferred to follow-up (acceptable per review rubric):
  - Code dedup across mcp_cli / heartbeat.py / molecule_agent SDK
  - Pooled httpx.Client for connection reuse
  - Heartbeat exponential backoff
  - Token-resolution ordering parity (env-first vs file-first)
    between mcp_cli.main and platform_auth.get_token

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:06:59 -07:00
Hongming Wang
876c0bfcd4 docs(canvas): update Universal MCP snippet — molecule-mcp now standalone
The canvas tab snippet for the Universal MCP path was written before
this PR added the built-in register + heartbeat thread. Earlier wording
described it as "outbound-only — pair with the Claude Code or Python SDK
tab for heartbeat + inbound messages" — that's stale. molecule-mcp now
handles register + heartbeat itself; the only thing it doesn't yet do is
inbound A2A delivery.

Updated:
- externalUniversalMcpTemplate header comment + body — describes
  standalone behavior, points operators at SDK/channel only when they
  need INBOUND (not heartbeat).
- Drops the now-redundant curl-register step from the snippet — the
  binary registers itself on startup.
- Canvas modal label likewise updated.

No runtime / behavior change; pure docs polish so a copy-pasting
operator's mental model matches what the binary actually does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:52:15 -07:00
Hongming Wang
716589742c feat(canvas): add Universal MCP tab to external-agent connect modal
The "Connect your external agent" dialog already covered Claude Code,
Python SDK, curl, and raw fields. This adds a Universal MCP tab that
documents the new \`molecule-mcp\` console script — the runtime-
agnostic baseline shipped by PR #2413's workspace-runtime changes.

Surface area:
- New \`externalUniversalMcpTemplate\` constant in workspace-server.
  Three-step snippet: pip install runtime → one-shot register via curl
  → wire molecule-mcp into agent's MCP config (Claude Code example,
  notes that hermes/codex/etc. take the same env-var contract).
- Workspace create response now includes \`universal_mcp_snippet\`
  alongside the existing curl/python/channel snippets.
- Canvas modal renders the tab when \`universal_mcp_snippet\` is
  present; backward-compatible with older platform builds (tab hides
  when empty).

Origin/WAF coverage (the user explicitly asked for this):
- The runtime wheel handles Origin automatically (this PR's earlier
  commit on platform_auth.auth_headers).
- The curl tab now sets \`Origin: {{PLATFORM_URL}}\` preemptively
  with an explanatory comment; \`/registry/register\` is currently
  WAF-allowed without it but adding now keeps the snippet working
  if WAF rules expand. The comment also explains why
  \`/workspaces/*\` paths return empty 404 without Origin — the
  exact failure mode I hit while smoke-testing this PR live.
- The MCP snippet's footer notes that the wheel auto-handles
  Origin so operators don't think about it.

End-to-end verification (against live tenant
hongmingwang.moleculesai.app, freshly registered workspace):
- get_workspace_info → full JSON
- list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)"
- recall_memory → "No memories found."
all returned by the molecule-mcp binary speaking MCP stdio to
this Claude Code session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:34:27 -07:00
Hongming Wang
fc3b5fd385 feat(canvas): add /api/buildinfo for version-display parity with tenant
Workspace-server has GET /buildinfo (PR #2398) — `curl https://<slug>.
moleculesai.app/buildinfo` returns the live git SHA. Canvas had no
parallel: debugging "is this the deployed code?" required reading
Vercel's UI or response headers (deployment ID, not git SHA).

Add canvas /api/buildinfo returning {git_sha, git_ref, vercel_env}
sourced from VERCEL_GIT_COMMIT_SHA / _REF / VERCEL_ENV — Vercel injects
these at build time from the deploying commit. Outside Vercel (local
`next dev`, harness) all three are unset and the endpoint returns
`git_sha: "dev"`, the same sentinel workspace-server uses pre-ldflags-
injection.

Now both surfaces speak the same vocabulary:

  curl https://<slug>.moleculesai.app/buildinfo
  curl https://canvas.moleculesai.app/api/buildinfo

3 tests cover dev-fallback, Vercel-injected SHA pass-through, and JSON
content type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:00:35 -07:00
Hongming Wang
15b98c4916 fix(e2e-canvas): kill teardown race that poisons concurrent runs
Setup wrote .playwright-staging-state.json at the END (step 7), only
after org create + provision-wait + TLS + workspace create + workspace-
online all succeeded. If setup crashed at steps 1-6, the org existed in
CP but the state file did not, so Playwright's globalTeardown bailed
out ("nothing to tear down") and the workflow safety-net pattern-swept
every e2e-canvas-<today>-* org to compensate. That sweep deleted
concurrent runs' live tenants — including their CF DNS records —
causing victims' next fetch to die with `getaddrinfo ENOTFOUND`.

Race observed 2026-04-30 on PR #2264 staging→main: three real-test
runs killed each other mid-test, blocking 68 commits of staging→main
promotion.

Fix: write the state file as setup's first action, right after slug
generation, before any CP call. Now:

  - Crash before slug gen        → no state file, no orphan to clean
  - Crash during steps 1-6       → state file has slug; teardown deletes
                                   it (DELETE 404s if org never created)
  - Setup completes              → state file has full state; teardown
                                   deletes the slug

The workflow safety-net no longer pattern-sweeps; it reads the state
file and deletes only the recorded slug. Concurrent canvas-E2E runs no
longer poison each other.

Verified by:
  - tsc --noEmit on staging-setup.ts + staging-teardown.ts
  - YAML lint on e2e-staging-canvas.yml
  - Code review: state file write moved to line 113 (post-makeSlug,
    pre-CP) with the original line-249 write retained as a "promote
    to full state" overwrite at the end
2026-04-29 19:23:56 -07:00
Hongming Wang
240d513ab8 canvas(ExternalConnectModal): add Claude Code tab + auto-fill auth_token
When the platform's create-external-workspace response includes
`claude_code_channel_snippet` (added in this same PR's first commit),
the modal surfaces it as the **first** tab — defaulting to it for new
external workspaces because polling-based + no-tunnel is the lowest-
friction path. Falls back to Python tab when the field is absent
(older platform builds).

Type addition is optional (`claude_code_channel_snippet?: string`)
so the canvas keeps building against pre-#2304 platform responses
during the soak window.

Auth-token stamping mirrors existing python/curl behavior — the
.env's `MOLECULE_WORKSPACE_TOKENS=<paste auth_token from create
response>` placeholder gets filled in client-side so the copy-paste
block is truly ready to run.

Also adds the missing 'use client' directive — the file uses useState
+ useCallback but didn't have the Next.js client-component marker.
Pre-commit caught it; existing absence was a latent bug that would
surface as an SSR hook error if any path rendered this component
during server rendering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 11:39:48 -07:00
Hongming Wang
fc59f939ac chore(deps): batch dep bumps — 6 safe upgrades (4 actions majors + 2 npm dev deps)
Consolidates the remaining safe-to-merge dependabot PRs from the
2026-04-28 wave into one consumable PR. Replaces three earlier
single-bump PRs (#2245, #2230, #2231) which were closed in favor of
this single batch — same pattern as #2235.

GitHub Actions majors (SHA-pinned per org convention):
  github/codeql-action       v3 → v4.35.2  (#2228)
  actions/setup-node         v4 → v6.4.0   (#2218)
  actions/upload-artifact    v4 → v7.0.1   (#2216)
  actions/setup-python       v5 → v6.2.0   (#2214)

npm dev deps (canvas/, lockfile regenerated in node:22-bookworm
container so @emnapi/* and other Linux-only optional deps are
properly resolved — Mac-native `npm install` strips them, which
caused the earlier #2235 batch to drop these two):
  @types/node                ^22 → ^25.6   (#2231)
  jsdom                      ^25 → ^29.1   (#2230)

Why each is safe

  setup-node v4 → v6 / setup-python v5 → v6:
    Every consumer call pins node-version / python-version
    explicitly. v5 / v6 changed defaults but pinned consumers
    are unaffected. Confirmed via grep across .github/workflows/
    — all setup-node call sites pin '20' or '22', all
    setup-python call sites pin '3.11'.

  codeql-action v3 → v4.35.2:
    Used as init/autobuild/analyze sub-actions in codeql.yml.
    v4 bundles a newer CodeQL CLI; ubuntu-latest auto-updates
    so functional behavior is unchanged. The deprecated
    CODEQL_ACTION_CLEANUP_TRAP_CACHES env var (per v4.35.2
    release notes) is undocumented and we don't set it.

  upload-artifact v4 → v7.0.1:
    v6 introduced Node.js 24 runtime requiring Actions Runner
    >= 2.327.1. All upload-artifact users (codeql.yml,
    e2e-staging-canvas.yml) run on `ubuntu-latest` (GitHub-
    hosted), which auto-updates the runner agent. Self-hosted
    runners are NOT used for these jobs.

  @types/node 22 → 25 / jsdom 25 → 29:
    Both are dev-only — @types/node is type definitions,
    jsdom backs vitest's DOM environment. Tests pass:
    79 files / 1154 tests in node:22-bookworm container.

Verified locally (Linux container so the lockfile reflects what
CI's `npm ci` will install):
  - cd canvas && npm install --include=optional → 169 packages
  - npm test → 1154/1154 pass
  - npm ci → clean install succeeds
  - npm run build → Next.js prerendering succeeds

Closes when this lands (the 3 individual auto-merge PRs from earlier
were closed):
  #2228 #2218 #2216 #2214 #2231 #2230

NOT included (CI failing on dependabot's own run — major framework
bumps that need code-side migration tasks, not safe auto-bumps):
  #2233 next 15 → 16
  #2232 tailwindcss 3 → 4
  #2226 typescript 5 → 6
2026-04-28 17:44:55 -07:00
Hongming Wang
93e8e5329b
Merge pull request #2173 from Molecule-AI/deps/postcss-8.5.10-ghsa-qx2v-qp2m-jg93
deps(canvas): bump postcss 8.5.9 → 8.5.12 (GHSA-qx2v-qp2m-jg93, medium)
2026-04-27 20:20:13 +00:00
Hongming Wang
4028b81e04 refactor(canvas): route panel WS subscriptions through global socket
Both AgentCommsPanel and ChatTab's activity-feed opened raw
`new WebSocket(WS_URL)` instances per mount, with no onclose handler
and no reconnect logic. When the underlying connection dropped — idle
timeout, browser background-tab throttle, network jitter — the per-
panel sockets stayed dead until the panel re-mounted (refresh or
sub-tab unmount/remount). Live agent-comms bubbles and live activity
feed lines silently went missing in the gap, manifesting as "the
delegation didn't show up until I refreshed."

The global ReconnectingSocket in store/socket.ts already owns
reconnect, exponential backoff, health-check, and HTTP fallback poll.
Routing component subscribers through it gives every consumer those
guarantees for free, with one TCP connection per tab instead of N.

Three new pieces:

  - store/socket-events.ts: tiny pub/sub bus. emitSocketEvent fan-outs
    every decoded WSMessage to the listener Set; subscribeSocketEvents
    returns an unsubscribe. A throwing listener is logged and isolated
    so it can't break siblings.

  - store/socket.ts: ws.onmessage now calls emitSocketEvent(msg) right
    after applyEvent(msg), so the store's derived state and component
    subscribers stay in lockstep on every event arrival.

  - hooks/useSocketEvent.ts: React hook that registers exactly once
    per mount, capturing the latest handler in a ref so the closure
    sees current state/props without re-subscribing on every render.

Refactored sites:

  - AgentCommsPanel: replaced its WebSocket-in-useEffect block with
    useSocketEvent. Same parsing logic; the panel no longer opens its
    own connection.

  - ChatTab activity feed: split the previous useEffect in two — one
    seeds the activity log when `sending` flips, the other subscribes
    unconditionally and gates work on `sending` inside the handler.
    Hooks can't be conditional, so the gate has to live in the body
    rather than around the effect.

The ws-close graceful-close helper is no longer needed in either
site; the global socket owns its own teardown.

Tests: 6 new tests for the bus contract (single delivery, fan-out
order, unsubscribe, throwing-listener isolation, no-subscriber emit,
duplicate-subscribe Set semantics). All 27 existing socket tests
still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:12:47 -07:00
Hongming Wang
f7ad5a82f7 fix(canvas): release sendInFlightRef in the activity-log WS path too
Third-pass review caught a fourth WS path I missed. The original fix +
the stale-callback follow-up patched 3 sites that release the in-flight
guards (pendingAgentMsgs effect, HTTP .then() success, HTTP .catch()
success), but the ACTIVITY_LOGGED handler at lines 410-419 also clears
`sending` + `sendingFromAPIRef` when the platform logs the workspace's
a2a_receive ok/error. It only cleared 2 of the 3 refs — same exact
bug class as the original. If THIS path wins the race (a2a_receive
activity logged before pendingAgentMsgs delivers the reply text),
sendInFlightRef stays stuck true and the next sendMessage() silently
no-ops at line 464.

Fix: route both branches (ok and error) through releaseSendGuards()
so all four sites are now uniform.

Updated the helper's docstring to explicitly list all four sites and
warn that any future "I saw the reply" path that only clears the
natural pair (sending + sendingFromAPIRef) will silently re-introduce
the freeze. The disabled-button logic can't see sendInFlightRef so
the visible state diverges from the synchronous re-entry guard
otherwise.

This is exactly the drift `releaseSendGuards()` was supposed to
prevent — the helper landed in the prior commit but the activity-log
site wasn't migrated to use it. Fixing now closes the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:27:29 -07:00
Hongming Wang
cacf499354 fix(canvas): close stale-callback race + extract releaseSendGuards helper
Self-review on PR #2185 surfaced a latent race the original fix exposed:
the WS-clears-guards path now releases sendInFlightRef immediately, which
means a user can fire msg #2 between WS-arrival and HTTP-arrival for
msg #1. Without coordination, msg #1's late .then() sees
sendingFromAPIRef=true (set by msg #2's send), enters the main body,
and runs setSending(false) + appendMessageDeduped against msg #1's
response body — clobbering msg #2's in-flight UI state.

This race is realistic for claude-code SDK: the comment at line 294-298
already calls WS the "authoritative reply arrived" signal, and the user
typically reads-then-types before the trailing HTTP completes. Without
the original Send-button freeze "protecting" the race, it surfaces.

Two changes:

1. Token-keyed callbacks. sendTokenRef bumps on every sendMessage
   entry; .then()/.catch() capture the token in closure and bail
   without touching any flags if a newer send has superseded them.
   The newer send owns the in-flight guards.

2. releaseSendGuards() helper. The three-clear-guards trio
   (setSending, sendingFromAPIRef, sendInFlightRef) now lives in one
   useCallback so the WS handler, .then() success, and .catch()
   success can't drift apart. A future contributor dropping one of
   the three would silently re-introduce either the post-WS Send
   freeze or the stale-callback clobber.

Skipped a unit test for this regression — ChatTab has no __tests__
file and a mount test would need WS + zustand + api mocks. The fix
is 4 logical lines (token capture + 2 guard checks) and the manual
test covers it. Follow-up to add a focused mount test when ChatTab
gets its first __tests__ file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:47:12 -07:00
Hongming Wang
5faaf58466 fix(canvas): clear sendInFlightRef on WS-push reply path
Send button + Enter both silently no-op'd after the first agent reply
on runtimes that deliver via WebSocket (claude-code SDK does this per
the comment at ChatTab.tsx:294-298). The visible disabled-state checks
(sending, uploading, agentReachable) were all clean — the freeze came
from a third synchronous reentry guard the button can't see:

  if (sendInFlightRef.current) return;     // ChatTab.tsx:438

The ref was set true at the start of sendMessage() and only cleared in
.then() / .catch() of the HTTP fall-through and the upload-failure
branch. The WS-push handler in the pendingAgentMsgs effect cleared
`sending` and `sendingFromAPIRef` but left `sendInFlightRef` stuck
true. The HTTP .then() then early-returned at the dedup check (line
513) without touching the ref — only the .catch() early-return path
did. Net result: refresh fixed it because the ref reset on remount.

Two-line fix:
  - WS handler: also clear sendInFlightRef when the push delivers the
    reply (primary fix; no race window where the ref is stuck while
    the user can already type)
  - .then() early-return: mirror .catch()'s cleanup as defense in
    depth, so neither delivery order leaks the ref

While here: A2AEdge.test.tsx fixture was typed `as never` to dodge
EdgeProps' discriminated-union complaint, which broke spreading at
the call sites with TS2698 ("Spread types may only be created from
object types"). Replaced with `as unknown as ComponentProps<typeof
A2AEdge>` — preserves the original "skip restating every optional
field" intent and keeps a spreadable type.

All 10 A2AEdge tests pass; tsc --noEmit is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:11:58 -07:00
Hongming Wang
fc60b4bc5e fix(canvas): regenerate lockfile under Node 20 for npm ci compatibility
The first commit on this branch left the lockfile inconsistent for
Node 20's npm 10:

  npm error \`npm ci\` can only install packages when your package.json
  and package-lock.json are in sync. Please update your lock file...
  npm error Missing: @emnapi/runtime@1.10.0 from lock file
  npm error Missing: @emnapi/core@1.10.0 from lock file

Root cause: my local install ran on Node 24 / npm 11, which doesn't
write peer-optional transitive entries (@img/sharp-* declares
@emnapi/runtime as peerOptional). The Canvas tabs E2E job uses Node 20
/ npm 10, which DOES expect those entries and rejected the lockfile
with EUSAGE.

Regenerated the lockfile under Node 20.19.4 (matches the lowest CI
node version, lockfile is forward-compatible with 22 and 24). 6 new
@emnapi/* entries added; postcss stays at 8.5.12 (the original goal
of this branch).

Verification:
  - \`nvm use 20 && npm ci\` clean
  - 1148/1148 vitest pass under Node 20

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:24:48 -07:00
Hongming Wang
6365e94213 deps(canvas): bump postcss 8.5.9 → 8.5.12 (GHSA-qx2v-qp2m-jg93)
Closes the medium-severity dependabot alert on canvas/package-lock.json.
Upstream advisory GHSA-qx2v-qp2m-jg93: "PostCSS has XSS via Unescaped
</style> in its CSS Stringify Output" — fixed in 8.5.10. We pull
8.5.12 since it's already published in the ^8.5.10 line.

package.json's caret range bumps from ^8.4.0 to ^8.5.12 — wider floor
prevents a future install from re-pinning below the safe version. The
8.x major-line constraint is preserved, so no breaking-change risk.

Verification: full canvas vitest suite passes (1148/1148 across
78 files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:59:02 -07:00
Hongming Wang
d46d558ca9
Merge pull request #2148 from Molecule-AI/test/canvas-lib-utils-runtime-names-1815
test(canvas): cover utils.cn + runtime-names.runtimeDisplayName (0% → 100%) (#1815)
2026-04-27 06:57:57 +00:00
Hongming Wang
a682dcb502
Merge pull request #2149 from Molecule-AI/test/canvas-actions-1815
test(canvas): cover canvas-actions restart-pending helpers (25% → 100%) (#1815)
2026-04-27 06:55:36 +00:00
Hongming Wang
ae029f8c3f
Merge pull request #2151 from Molecule-AI/test/canvas-class-names-1815
test(canvas): cover store/classNames helpers (17% → 100%) (#1815)
2026-04-27 06:54:37 +00:00
Hongming Wang
516b58dcd7
Merge pull request #2147 from Molecule-AI/feat/canvas-coverage-instrumentation-1815
feat(canvas): vitest coverage instrumentation (#1815, no CI gate yet)
2026-04-27 06:54:22 +00:00
Hongming Wang
679e30538a test(canvas): cover store/classNames helpers (17% → 100%) (#1815)
[Molecule-Platform-Evolvement-Manager]

Continues the #1815 coverage rollup. classNames.ts was at 17%
in the baseline; this PR brings it to full coverage.

16 cases across 3 helpers:

**appendClass (6):**
- undefined / empty existing → just `cls`
- single-class → "a b" join
- DEDUP: existing already contains `cls` → existing unchanged.
  This is the load-bearing reason classNames.ts exists. Pre-helper
  the call sites inlined `${existing} ${cls}` with no dedup, so a
  tick that fired the same class twice produced "a a" and React
  Flow's className-equality diff saw it as a change every render.
- whitespace normalization (multi-space, leading/trailing)

**removeClass (7):**
- undefined / empty existing → ""
- removes named class
- exact match only ("spawn" must NOT match "spawn-fast")
- removing the only class → ""
- no-op when class absent
- whitespace normalization

**scheduleNodeClassRemoval (3):**
- after delayMs: calls set() with className-removed on target node;
  OTHER nodes untouched (the per-id pruning is the contract — pin
  it so a future refactor that maps over all nodes doesn't silently
  strip classes from siblings)
- does NOT fire before the delay elapses (vi.useFakeTimers + advance)
- SSR safety: when window is undefined, function is a no-op
  (neither get nor set fires)

## Note on test environment

Added `// @vitest-environment jsdom` directive — the file's
default `node` environment leaves `window` undefined, which would
make the SSR-guard happy-path test pass for the wrong reason
(every test would short-circuit). With jsdom, the SSR test
explicitly stubs `window` to undefined to exercise the guard.

## Test plan

- [x] All 16 cases pass locally (~1.1s with jsdom env spin-up)
- [x] No SUT changes
- [ ] CI green

## #1815 progress

- [x] Step 1+2: instrumentation (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (#2149)
- [x] store/classNames.ts (this PR)
- [ ] store/canvas.ts (73% — biggest absolute gap; bigger surface,
      separate cycle)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:50:00 -07:00
Hongming Wang
e5e4eb4d2a test(canvas): cover canvas-actions restart-pending helpers (25% → 100%) (#1815)
[Molecule-Platform-Evolvement-Manager]

Continues the #1815 coverage rollup. canvas-actions.ts was at 25%
in the baseline run from #2147; this PR brings the file's two
helpers to full coverage.

5 cases:

**markAllWorkspacesNeedRestart (3):**
- calls updateNodeData on every node with `{needsRestart: true}`
- no-op when the canvas has zero workspaces
- preserves call ordering — matters because the toolbar's
  Restart Pending pill observes per-node data changes
  incrementally; a refactor that shuffled iteration order would
  silently change which workspaces flash first

**markWorkspaceNeedsRestart (2):**
- targeted call: updateNodeData fires exactly once on the named id
- defensive: regardless of how many other workspaces exist in the
  store, only the target workspace gets updated. Pre-this-test, a
  refactor that accidentally wired this function through the
  per-node iteration path of markAll would silently mark every
  workspace — pinning the cardinality here catches that.

## Mock strategy

Standard pattern for canvas store: mock useCanvasStore as both the
selector function AND a getState()-bearing object. updateNodeData
is a vi.fn() spy so the test asserts on calls + args directly.

## Test plan

- [x] All 5 cases pass locally (~132ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green

## #1815 progress

- [x] Step 1+2: instrumentation + script (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (this PR)
- [ ] Remaining low-coverage targets: store/classNames.ts (17%),
      store/canvas.ts (73% — largest absolute gap by lines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:47:49 -07:00
Hongming Wang
4fc37a76d9
Merge pull request #2143 from Molecule-AI/test/canvas-a2a-edge-2071
test(canvas): unit tests for A2AEdge — selection + Activity-tab routing (#2071)
2026-04-27 06:45:58 +00:00
Hongming Wang
bfbbe57610 test(canvas): cover utils.cn + runtime-names.runtimeDisplayName (0% → 100%) (#1815)
[Molecule-Platform-Evolvement-Manager]

Closes two of the 0%-coverage files surfaced by the baseline run in
PR #2147 (vitest coverage instrumentation). Both files are tiny
utility helpers with high-touch read paths.

## utils.cn (8 cases)

Wraps `twMerge(clsx(inputs))` — every conditionally-styled component
flows through here. The load-bearing case is the **last-wins
Tailwind dedup**: `cn("p-2", "p-4")` → "p-4". A regression that lost
twMerge would silently double-apply utilities (cosmetically broken,
breaks `:where()` rules + theme overrides).

Cases:
  - single class unchanged
  - multiple positional classes joined
  - array input flattening (clsx)
  - object syntax with truthy/falsy keys
  - last-wins dedup on conflicting Tailwind utilities (the
    regression-locked guarantee)
  - non-conflicting utilities both survive (p-2 + m-4)
  - mixed input shapes (string + array + object + string)
  - nullish / empty inputs don't throw

## runtime-names.runtimeDisplayName (4 it.each cases + 3 it())

Friendly-name lookup that surfaces the workspace runtime in the chat
indicator, details tab, and a few component labels.

Cases:
  - known runtimes map to display strings
    (claude-code → Claude Code, langgraph → LangGraph, etc.)
  - unknown runtime falls back to input string verbatim
    (a NEW runtime not yet in the lookup still renders something
    operator-debuggable rather than a generic placeholder)
  - empty string falls back to "agent" (final default)
  - case-sensitivity pinned: "Claude-Code" / "LANGGRAPH" miss the
    lookup. The upstream slug is already normalized lowercase, so a
    future refactor that lowercases input "for safety" would
    silently change behavior — pinning the contract here.

## Test plan

- [x] All 17 cases pass locally (~129ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green

## #1815 progress

- [x] Step 1+2: coverage instrumentation + script (#2147)
- [x] 0%-file gaps utils.ts + runtime-names.ts (this PR)
- [ ] More 0%/low-coverage files: lib/canvas-actions.ts (25%),
      store/classNames.ts (17%) — separate PRs
- [ ] Step 3b: thresholds + CI gate once baseline catches up

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:45:51 -07:00
Hongming Wang
d64ee7b4e4
Merge pull request #2145 from Molecule-AI/test/canvas-org-cancel-button-2071
test(canvas): unit tests for OrgCancelButton — cascade-delete + optimistic store (#2071)
2026-04-27 06:45:47 +00:00
Hongming Wang
57457899a1 feat(canvas): vitest coverage instrumentation (#1815, no CI gate yet)
[Molecule-Platform-Evolvement-Manager]

Closes step 1+2 of #1815. Step 3 (CI gate + threshold) is split into
a follow-up because today's baseline is ~46% lines / ~45% statements,
not the 70% the issue's draft thresholds assumed.

## What this lands

- `canvas/vitest.config.ts` — `coverage` block with v8 provider,
  reporters: text (terminal) / html (./coverage/index.html) /
  json-summary (machine-readable for tooling). NO threshold —
  pure observability.
- `canvas/package.json` — adds `test:coverage` script
  (`vitest run --coverage`); existing `test` script is unchanged so
  the default workflow is identical.
- `canvas/package-lock.json` — adds @vitest/coverage-v8@^4.1.5 (the
  v8 provider Vitest uses for native coverage).

## Why no threshold yet

Issue draft threshold was 70%/70%/65%/70% (lines/funcs/branches/stmts).
Local baseline today:

```
Statements   : 45.19% (3248/7186)
Branches     : 39.87% (2034/5101)
Functions    : 40.99% (724/1766)
Lines        : 46.36% (2905/6265)
```

Turning on a 70% gate today would either fail CI immediately or get
papered over with an ad-hoc exclude list. Better path: land
observability now, run coverage in PR review for any new code
(via the new script), gate later when the baseline catches up.

## Heatmap (from local run, top gaps)

- `src/lib/runtime-names.ts` — 0% (untouched by tests)
- `src/lib/utils.ts` — 0%
- `src/lib/canvas-actions.ts` — 25%
- `src/store/classNames.ts` — 17%
- `src/store/canvas.ts` — 73% (already-tested but the largest absolute
  gap by lines)

Each is a concrete follow-up issue / PR target.

## Test plan

- [x] `npx vitest run --coverage` runs cleanly locally (~10s) and
      produces `./coverage/index.html` + a `coverage-summary.json`
- [x] Existing `npm run test` workflow unchanged — instrumentation
      only activates with `--coverage` flag
- [x] No production-code changes — pure tooling addition

## Follow-ups (each tracked separately; this PR keeps minimal scope)

- Step 3a — write tests for the 0% files above (~tiny each)
- Step 3b — once baseline ≥ thresholds, add `thresholds` block to
  vitest.config.ts + a `npm run test:coverage` step in
  `.github/workflows/ci.yml`'s Canvas job

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:44:07 -07:00
Hongming Wang
e3d3b48e8c test(canvas): unit tests for dragUtils — nest hysteresis + clamp geometry (#2071)
[Molecule-Platform-Evolvement-Manager]

Closes the fourth and final item from #2071 — but at a slightly
different layer than the issue listed: tests `dragUtils.ts` (the
74-LOC pure-ish geometry helpers) instead of the full 296-LOC
`useDragHandlers` hook. Rationale below.

15 cases across 2 buckets:

**shouldDetach (8):**
- child fully inside parent → false
- child drifted slightly past edge but under DETACH_FRACTION → false
- child past 20% threshold on X → true (un-nest)
- child past 20% threshold on Y → true (un-nest)
- missing child node → true (conservative fallback per source comment)
- missing parent node → true (same)
- measured size absent → falls back to React Flow's 220x120 defaults
  (mirrors initial-mount race where measurement hasn't run yet)
- DETACH_FRACTION constant pinned at 0.2 (Miro/tldraw convention)

**clampChildIntoParent (7):**
- child already inside bounds → no-op (no setState — proven by
  reference equality on mockState.nodes)
- drifted past top-left → clamps to (0, 0)
- drifted past bottom-right → clamps to (parentW - childW, parentH - childH)
- per-axis independence: X past edge + Y inside → only X clamps
- child not in store → early return, no setState
- child internalNode missing → early return, no setState
- multi-node store: clamping one node MUST NOT touch siblings

## Why dragUtils, not the full useDragHandlers hook

The hook (296 LOC) orchestrates React Flow drag events + Zustand
mutations. Testing it would need heavyweight `useReactFlow` +
internal-node + `setDragOverNode` / `nestNode` / `batchNest` /
`isDescendant` mocks just to drive event handlers — and the
*decisions* the hook makes all delegate to these two helpers:
- `shouldDetach` decides "is this a real un-nest?"
- `clampChildIntoParent` snaps the child back when the user drifted
  slightly past the edge without holding Alt/Cmd

Pinning these locks the hot path the user feels. The hook's
remaining surface (modifier-key snapshotting, drop-target
broadcasting, commit-on-release grow pass) is plumbing — worth
testing as a follow-up if it ever regresses, but lower
correctness leverage per LOC of test setup.

## #2071 status after this PR

- [x] useTemplateDeploy (#2121)
- [x] A2AEdge (#2143)
- [x] OrgCancelButton (#2145)
- [x] dragUtils geometry helpers (this PR)
- [ ] Full useDragHandlers hook orchestration — explicit deferral
      with rationale above

## Test plan

- [x] All 15 cases pass locally (`vitest run dragUtils.test.ts` — 131ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:41:37 -07:00
Hongming Wang
39eb3eb2e4 test(canvas): unit tests for OrgCancelButton — cascade-delete + optimistic store (#2071)
[Molecule-Platform-Evolvement-Manager]

Closes the third item from #2071 (Canvas test gaps follow-up). Builds
on the A2AEdge tests in PR #2143.

10 cases across 4 buckets:

**Render (2):**
- Default pill with `Cancel (N)` text + correct ARIA label
- Confirm dialog NOT visible until pill click

**Pill click (3):**
- Click flips to confirming view + stops propagation (so React Flow
  doesn't interpret the click as a node selection)
- Confirm copy pluralizes correctly: count=1 → "Delete 1 workspace?",
  count>1 → "Delete N workspaces?". Negative assertion guards against
  the wrong-form regressing in either direction.

**No / cancel-confirm (1):**
- Click No → returns to pill, no API call, no store mutation

**Yes / cascade-delete (4):**
- Happy path: beginDelete locks the WHOLE subtree (root + children,
  NOT unrelated workspace) → api.del("/workspaces/<id>?confirm=true")
  → optimistic store filter strips subtree, keeps unrelated → success
  toast → endDelete in finally
- WS-event race: WS_REMOVED handler clears the root mid-flight. The
  bail-out branch (`!postDeleteState.nodes.some(n => n.id === rootId)`)
  must NOT then run a second optimistic filter. Pre-fix the post-await
  subtree walk would miss any orphaned descendants whose parentId got
  reparented upward by handleCanvasEvent — pinned now.
- Error path: api.del rejects → endDelete UNDOes the lock + error
  toast surfaces the message → subtree STAYS in the store so the user
  can retry / interact with the still-deploying nodes
- Non-Error rejection (e.g. string thrown directly): toast surfaces
  the canned "Cancel failed" fallback instead of attempting `.message`

## Mocking

- `@/lib/api`, `@/components/Toaster`: simple spy mocks
- `@/store/canvas`: object that satisfies BOTH the selector pattern
  (`useCanvasStore(s => s.x)`) AND `getState()` / `setState()` since
  the cascade-delete handler walks the subtree via `getState()` and
  mutates via `setState()` for the optimistic removal. `vi.hoisted`
  preserves referential identity so the mock fns wired into the
  state object are observed by every consumer.

## Test plan

- [x] All 10 cases pass locally (`vitest run OrgCancelButton.test.tsx` — ~990ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green

## #2071 progress after this PR

- [x] useTemplateDeploy (PR #2121)
- [x] A2AEdge (PR #2143)
- [x] OrgCancelButton (this PR)
- [ ] useDragHandlers — separate PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:38:59 -07:00
Hongming Wang
c7185ece80 test(canvas): unit tests for A2AEdge — selection + Activity-tab routing (#2071)
[Molecule-Platform-Evolvement-Manager]

Closes the second item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the custom React Flow edge that renders
delegation counts between workspaces and routes a click into the
source workspace's Activity feed.

10 cases across 2 buckets:

**Render (6):**
- Empty label → BaseEdge only, NO portaled HTML pill (the most
  common state for cold edges; pill must not render-through-empty)
- Non-empty label → pill renders with the exact label text
- isHot=true → violet accent classes; blue accent NOT present
- isHot=false → blue accent classes
- ARIA pluralization: count=1 → "1 delegation from …" (singular)
- ARIA pluralization: count=7 → "7 delegations from …" (plural)

**Click behaviour (4):**
- Click → selectNode(source)
- FRESH selection (selectedNodeId != source) → also setPanelTab("activity")
- RE-click of already-selected source → setPanelTab MUST NOT fire
  (this is the regression-locked guarantee — preserves the user's
  current tab when they intentionally moved to Chat / Memory while
  inspecting the same peer)
- stopPropagation: parent onClick must NOT see the event (otherwise
  the canvas Pane's clear-selection handler would fire and undo the
  edge's own selectNode call)

## Mocking strategy

- `@xyflow/react`: BaseEdge → <g data-testid>, EdgeLabelRenderer →
  inline pass-through (no portal), getBezierPath → fixed [path, x, y].
  Lets the test render the component without a ReactFlow provider.
- `@/store/canvas`: vi.hoisted-shared mock state with selectNode +
  setPanelTab spies and a mutable selectedNodeId. The store's
  getState() returns the same object so the click handler's
  `useCanvasStore.getState().selectedNodeId` lookup works.

Pattern matches the existing `A2ATopologyOverlay.test.tsx` setup
in the same module.

## Test plan

- [x] All 10 cases pass locally (`vitest run A2AEdge.test.tsx` — ~1.3s)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green

## Remaining #2071 items

- OrgCancelButton tests
- useDragHandlers tests

Each is a separate PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:33:28 -07:00
Hongming Wang
0032f9c906 fix(chat): drop unused extractResponseText import after helper extraction
Reviewer bot flagged: ChatTab.tsx imported extractResponseText but
no longer used it after the loop body moved to historyHydration.ts
(the helper imports it directly). Drop from the named import to
unblock merge. extractFilesFromTask remains used at line 515 for the
WS A2A_RESPONSE handler's reply-files extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:52:53 -07:00
Hongming Wang
6430b3b699 fix(chat): hydrate user-side file attachments on chat reload
Reviewer follow-up to PR #2134 (Optional finding). The history loader
walked text on the user branch but never extracted file parts — so a
chat reload after a session where the user dragged in a file rendered
the text bubble but lost the download chip. Symmetric to the agent
branch which already handles this via extractFilesFromTask.

Wire shape from ChatTab's outbound POST:
  request_body = {params: {message: {parts: [
    {kind: "text", text: "..."},
    {kind: "file", file: {uri, name, mimeType?, size?}}
  ]}}}

extractFilesFromTask walks `task.parts`, so we feed it `params.message`
(the inner object that has the parts array). Three new tests:
  - hydrates file attachments from request_body
  - emits an attachments-only bubble when text is empty (drag-drop
    without caption — pre-fix the empty userText short-circuited and
    the row was dropped entirely)
  - internal-self predicate suppresses the row even with attachments
    (defence-in-depth for future internal triggers)

Stacked on #2134; this branch's parent commit is its tip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:41:28 -07:00
Hongming Wang
fe204f04da test(chat): extract historyHydration helper + 12 unit tests
User pushed back: the timestamp bug should have been caught by E2E.
Right — my earlier coverage tested the server contract (notify endpoint,
WS broadcast filter) but never the chat-history HYDRATION path. Without
a unit test that froze the wall clock and asserted timestamps came from
created_at, a future refactor could re-introduce the same bug.

This commit:

1. Extracts the per-row → ChatMessage[] mapping out of the closure
   inside loadMessagesFromDB into chat/historyHydration.ts. Pure
   function, no React dependency, easy to test.

2. Adds 12 vitest cases in __tests__/historyHydration.test.ts covering:
   - Timestamp regression (3 tests, with system time frozen to 2030 so
     a regression starts producing "2030-…" timestamps and the assertion
     fails unmistakably). The third test mirrors the user's screenshot:
     two rows with distinct created_at must produce distinct timestamps.
   - User-message extraction (text, internal-self filter, null body)
   - Agent-message extraction (text, error→system role, file attachments,
     null body, body with neither text nor files)
   - End-to-end: a single row with both request and response emits
     two messages with the same timestamp (the canonical canvas-source
     row pattern)

3. The new file-attachment test caught a SECOND latent bug — the helper
   was passing `response_body.result ?? response_body` to extractFiles
   FromTask, which passes the STRING "<text>" for the notify-with-
   attachments shape `{result: "<text>", parts: [...]}` and silently
   returns []. So a chat reload after an agent attached a file would
   lose the chips. Fixed by only unwrapping `result` when it's an
   object (the task-shape) and falling through to response_body
   otherwise (the notify shape).

ChatTab now imports the helper and the loop body becomes one line:
`messages.push(...activityRowToMessages(a, isInternalSelfMessage))`.

Verification:
  - 12/12 historyHydration tests pass
  - 1072/1072 full canvas vitest pass (was 1060 before, +12)
  - tsc --noEmit clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:18:22 -07:00
Hongming Wang
8415870520 fix(chat): pin historical user-message timestamps to activity created_at
User flagged that all historical user bubbles render with the same
"now" clock after a chat reload — both messages in the screenshot
showed 9:01:58 PM despite being sent hours apart.

ChatTab.tsx:142 minted user messages with createMessage(...) which
calls new Date().toISOString() — fine for a freshly-typed message,
wrong for hydrated history. Every reload re-stamped all user bubbles
to the render moment, collapsing the visible chronology. The agent
path on line 157 already overrides with a.created_at; mirror that.

One-line fix (spread + override timestamp) plus a comment explaining
why the override is load-bearing so the next refactor doesn't drop it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:06:19 -07:00
f547c4e259
Merge pull request #2132 from Molecule-AI/test/comprehensive-comms-e2e
test(comms): E2E + canvas coverage for agent → user attachments
2026-04-27 03:49:49 +00:00
Hongming Wang
fb080227a3
Merge pull request #2131 from Molecule-AI/feat/agent-comms-grouped-by-peer
feat(canvas): Agent Comms grouped by peer with sub-tabs
2026-04-27 03:43:45 +00:00
Hongming Wang
62cfc21033 test(comms): comprehensive E2E coverage for agent → user attachments
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:

1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
   no workspace container needed. 14 assertions covering: notify text-only
   round-trip, notify-with-attachments persists parts[].kind=file in the
   shape extractFilesFromTask reads, per-element validation rejects empty
   uri/name (regression for the missing gin `dive` bug), and a real
   /chat/uploads → /notify URI round-trip when a container is up.

2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
   WebSocket-side filtering that drops malformed attachments, allows
   attachments-only bubbles, ignores non-array payloads, and no-ops on
   pure-empty events.

3. Persisted response_body shape test (message-parser.test.ts +1) — pins
   the {result, parts} contract the chat history loader hydrates on
   reload, so refreshing after an agent attachment restores both caption
   and download chips.

Also wires the new shell E2E into e2e-api.yml so the contract regresses
in CI rather than only in manual runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:41:56 -07:00
Hongming Wang
26fb4b309e fix(canvas): delegation rows show real text + bidirectional bubbles
User flagged two paper cuts in Agent Comms after the grouping PR:
"Delegating to f6f3a023-ab3c-4a69-b101-976028a4a7ec" reads as gibberish
because it's a UUID, and the chat is "one way" with only outbound bubbles
even though peers are clearly responding.

Both fixes are in toCommMessage's delegation branch:

1. Pull text from the actual payload, not the platform's audit-log summary.
   - delegate row → request_body.task (the task text the agent sent).
     Fallback when missing: "Delegating to <resolved-peer-name>" — never
     the raw UUID.
   - delegate_result row → response_body.response_preview / .text (the
     peer's actual reply). Fallback paths render human-readable status
     for queued / failed cases ("Queued — Peer Agent is busy on a prior
     task...") instead of platform jargon.

2. delegate_result rows render flow="in" — even though source_id=us
   (the platform writes the row on our side), the conversational
   direction is peer → us. The chat now shows alternating bubbles
   (out: "Build me 10 landing pages" → in: "Done — ZIP at /tmp/...")
   instead of one-sided "→ To X" wall.

The WS push handler in this same file now populates request_body /
response_body from the DELEGATION_SENT / DELEGATION_COMPLETE event
payloads (task_preview, response_preview), so live-pushed bubbles use
the same text-extraction path as the GET-on-mount.

Tests:
  - 4 new in toCommMessage's delegation branch:
    - delegate row prefers request_body.task over summary
    - delegate row falls back to name-resolved label when task missing
    - delegate_result row is INBOUND (flow="in")
    - delegate_result queued shows human-readable wait message including
      the resolved peer name
  - Replaces the previous "delegate row maps text from summary" tests
    which encoded the (now-undesirable) platform-summary-as-text behavior.
  - All 15 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:24:58 -07:00
Hongming Wang
5f08455340 feat(canvas): Agent Comms grouped by peer with sub-tabs
The chronological-only view was a noodle once Director + N peers
exchange more than a few rounds. New layout: a sub-tab bar at the
top of the panel, with "All" pinned leftmost and one tab per peer
(name + count). Selecting a peer filters the thread to that one
DD↔X conversation; "All" preserves the previous chronological view
as the default.

Tab ordering follows Slack/Linear DM-list convention: most-recent
activity descending, so active conversations rise to the top
without the user scrolling. Counts in parens match Slack's unread
hint pattern (no separate read/unread state — the count is total
in this conversation, computed from the same in-memory message
list the panel already maintains).

Pure-helper extraction: peer-summary derivation lives in
`buildPeerSummary(messages)` so the sort + count logic is unit-
testable without rendering the panel. 5 new tests cover: count
aggregation, most-recent-first ordering, lastTs as max-not-last,
empty input, name-stability when the same peerId carries different
names across messages.

Keyboard: ArrowLeft/Right cycle peer tabs (matches the existing
My Chat / Agent Comms tab pattern in ChatTab). Auto-prune: if the
selected peer has zero messages after a setMessages update (rare,
e.g. dedupe drops the last bubble), fall back to "All" so the
viewer doesn't see an empty thread.

Frontend-only — no platform / runtime / DB changes. The existing
`peerId` / `peerName` fields on CommMessage already carry every
piece of data the new UI needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:16:11 -07:00
Hongming Wang
6eaacf175b fix(notify): review-flagged Critical + Required findings on PR #2130
Two Critical bugs caught in code review of the agent→user attachments PR:

1. **Empty-URI attachments slipped past validation.** Gin's
   go-playground/validator does NOT iterate slice elements without
   `dive` — verified zero `dive` usage anywhere in workspace-server —
   so the inner `binding:"required"` tags on NotifyAttachment.URI/Name
   were never enforced. `attachments: [{"uri":"","name":""}]` would
   pass validation, broadcast empty-URI chips that render blank in
   canvas, AND persist them in activity_logs for every page reload to
   re-render. Added explicit per-element validation in Notify (returns
   400 with `attachment[i]: uri and name are required`) plus
   defence-in-depth in the canvas filter (rejects empty strings, not
   just non-strings).
   3-case regression test pins the rejection.

2. **Hardcoded application/octet-stream stripped real mime types.**
   `_upload_chat_files` always passed octet-stream as the multipart
   Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")`
   FIRST and only falls back to extension-sniffing when the header is
   empty, so every agent-attached file lost its real type forever —
   broke the canvas's MIME-based icon/preview logic. Now sniff via
   `mimetypes.guess_type(path)` and only fall back to octet-stream
   when sniffing returns None.

Plus three Required nits:

- `sqlmockArgMatcher` was misleading — the closure always returned
  true after capture, identical to `sqlmock.AnyArg()` semantics, but
  named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)`
  so the intent (capture for post-call inspection, not validate via
  driver-callback) is unambiguous.
- Test asserted notify call by `await_args_list[1]` index — fragile
  to any future _upload_chat_files refactor that adds a pre-flight
  POST. Now filter call list by URL suffix `/notify` and assert
  exactly one match.
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases)
  covering empty-uri, empty-name, both-empty so the Critical fix
  stays defended.

Deferred to follow-up:

- ORDER BY tiebreaker for same-millisecond notifies — pre-existing
  risk, not regression.
- Streaming multipart upload — bounded by the platform's 50MB total
  cap so RAM ceiling is fixed; switch to streaming if cap rises.
- Symlink rejection — agent UID can already read whatever its
  filesystem perms allow via the shell tool; rejecting symlinks
  doesn't materially shrink the attack surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:47:31 -07:00
Hongming Wang
d028fe19ff feat(notify): agent → user file attachments via send_message_to_user
Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip"
in plain text instead of attaching a download chip — the runtime literally
had no API for outbound file attachments. The canvas + platform's
chat-uploads infrastructure already supported the inbound (user → agent)
direction (commit 94d9331c); this PR wires the outbound side.

End-to-end shape:

  agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"])
   ↓ runtime
  POST /workspaces/<self>/chat/uploads (multipart)
   ↓ platform
  /workspace/.molecule/chat-uploads/<uuid>-build.zip
   → returns {uri: workspace:/...build.zip, name, mimeType, size}
   ↓ runtime
  POST /workspaces/<self>/notify
   {message: "Done!", attachments: [{uri, name, mimeType, size}]}
   ↓ platform
  Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs
  with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]}
   ↓ canvas
  WS push: canvas-events.ts adds attachments to agentMessages queue
  Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[]
  Either path → ChatTab renders download chip via existing path

Files changed:

  workspace-server/internal/handlers/activity.go
    - NotifyAttachment struct {URI, Name, MimeType, Size}
    - Notify body accepts attachments[], broadcasts in payload,
      persists as response_body.parts[].kind="file"

  canvas/src/store/canvas-events.ts
    - AGENT_MESSAGE handler reads payload.attachments, type-validates
      each entry, attaches to agentMessages queue
    - Skips empty events (was: skipped only when content empty)

  workspace/a2a_tools.py
    - tool_send_message_to_user(message, attachments=[paths])
    - New _upload_chat_files helper: opens each path, multipart POSTs
      to /chat/uploads, returns the platform's metadata
    - Fail-fast on missing file / upload error — never sends a notify
      with a half-rendered attachment chip

  workspace/a2a_mcp_server.py
    - inputSchema declares attachments param so claude-code SDK
      surfaces it to the model
    - Defensive filter on the dispatch path (drops non-string entries
      if the model sends a malformed payload)

  Tests:
    - 4 new Python: success path, missing file, upload 5xx, no-attach
      backwards compat
    - 1 new Go: Notify-with-attachments persists parts[] in
      response_body so chat reload reconstructs the chip

Why /tmp paths work even though they're outside the canvas's allowed
roots: the runtime tool reads the bytes locally and re-uploads through
/chat/uploads, which lands the file under /workspace (an allowed root).
The agent can specify any readable path.

Does NOT include: agent → agent file transfer. Different design problem
(cross-workspace download auth: peer would need a credential to call
sender's /chat/download). Tracked as a follow-up under task #114.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:35:58 -07:00
Hongming Wang
808cc5437f fix(canvas): ExternalConnectModal redundant null check on Dialog.Root open prop
[Molecule-Platform-Evolvement-Manager]

Addresses github-code-quality finding on PR #2064:

> Comparison between inconvertible types
> Variable 'info' cannot be of type null, but it is compared to
> an expression of type null.

By line 75, `info` has been narrowed to non-null via the
`if (!info) return null;` guard at line 56 — so `open={info !== null}`
always evaluates to `true`. Switch to JSX shorthand `open` for
clarity and to silence the static check.

Behaviorally identical; the modal still opens whenever the parent
renders this component (which only happens with non-null info).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:36:03 -07:00
a5e099d644
Merge branch 'staging' into feat/external-runtime-first-class 2026-04-26 16:34:17 -07:00
fdf8b65c59
Merge pull request #2126 from Molecule-AI/fix/director-bypass-and-agent-comms
fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows
2026-04-26 23:08:53 +00:00
Hongming Wang
5071454074 fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events
Critical follow-up to PR #2126's review. Two real bugs:

1. **Runtime QUEUED never resolved.** Platform's drain stitch updates
   the platform's delegate_result row when a queued delegation finally
   completes, but never pushes back to the runtime. The LLM polling
   check_delegation_status saw status="queued" forever — combined with
   the new docstring guidance ("queued → wait, peer will reply"), the
   model would wait indefinitely on a state that never resolves.
   Strictly worse than pre-PR behavior where it would have at least
   bypassed.

2. **Live updates dead code.** delegation.go writes activity rows by
   direct INSERT INTO activity_logs, bypassing the LogActivity helper
   that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's
   ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial
   GET worked, live updates did not.

Fix:

(1) Runtime side, workspace/builtin_tools/delegation.py:
  - New `_refresh_queued_from_platform(task_id)` async helper that
    pulls /workspaces/<self>/delegations and finds the platform-side
    delegate_result row for our task_id.
  - check_delegation_status calls _refresh when local status is
    QUEUED, so the LLM's poll itself drives state convergence.
  - Best-effort: GET failure leaves local state untouched, next
    poll retries.
  - Docstring updated to reflect the actual behavior ("polls
    transparently — keep polling and you'll see the flip").
  - 4 new tests cover: QUEUED → completed via refresh; QUEUED →
    failed via refresh; refresh keeps QUEUED when platform hasn't
    resolved; refresh swallows network errors safely.

(2) Canvas side, AgentCommsPanel.tsx WS push handler:
  - Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE
    / DELEGATION_FAILED in addition to ACTIVITY_LOGGED.
  - Each event's payload synthesized into an ActivityEntry shape
    so toCommMessage's existing delegation branch maps it. Status
    derived: STATUS uses payload.status, COMPLETE → "completed",
    FAILED → "failed", SENT → "pending".
  - The ACTIVITY_LOGGED branch keeps the "delegation" type accepted
    as a no-op-today / future-proof path: if delegation handlers
    are ever refactored to call LogActivity, this lights up
    automatically without another canvas change.

Doesn't change: the docstring guidance ("queued → wait, don't bypass")
is now actually load-bearing because the refresh path will deliver
the eventual outcome. Without the refresh, the guidance was a trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:05:04 -07:00
Hongming Wang
ccb961a17b
Merge pull request #2096 from Molecule-AI/refactor/remove-canvas-hermes-runtime-profile-2054
refactor(canvas): remove RUNTIME_PROFILES.hermes — value flows server-side (#2054 phase 3)
2026-04-26 22:05:42 +00:00
Hongming Wang
057876cb0c fix(delegation): runtime handles 202+queued; canvas surfaces delegation rows
Two bugs that compounded into the "Director does the work itself" UX:

1. workspace/builtin_tools/delegation.py: _execute_delegation only
   handled HTTP 200 in the response branch. When the peer's a2a-proxy
   returned HTTP 202 + {queued: true} (single-SDK-session bottleneck
   on the peer), the loop fell through. Two iterations later the
   `if "error" in result` check tried to access an unbound `result`,
   the goroutine ended quietly, and the delegation stayed at FAILED
   with error="None". The LLM checking status saw "failed" + the
   platform's "Delegation queued — target at capacity" log line in
   chat context, concluded the peer was permanently unavailable, and
   bypassed delegation to do the work itself.

   Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED,
   marks the local delegation as QUEUED, mirrors to the platform,
   and returns cleanly without retrying. The retry loop is for
   transient transport errors — queueing is a real ack, not a failure
   to retry against (retrying would just re-queue the same task).

   check_delegation_status docstring extended with explicit per-status
   guidance: pending/in_progress → wait, queued → wait (peer busy on
   prior task, reply WILL arrive), completed → use result, failed →
   real error in error field; only fall back on failed, never queued.

2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped
   every delegation row because it whitelisted only a2a_send /
   a2a_receive. activity_type='delegation' rows (written by the
   platform's /delegate handler with method='delegate' or
   'delegate_result') never reached toCommMessage. User saw "No
   agent-to-agent communications yet" while 6+ delegations existed
   in the DB.

   Fix: include "delegation" in the both the initial filter and the
   WS push filter, plus a delegation branch in toCommMessage that
   maps the row as outbound (always — platform proxies on our behalf)
   and uses summary as the primary text source.

Tests:
  - 3 new Python tests cover the 202+queued path: status becomes
    QUEUED not FAILED; no retry on queued (counted by URL match
    against the A2A target since the mock is shared across all
    AsyncClient calls); bare 202 without {queued:true} still
    falls through to the existing retry-then-FAILED path.
  - 3 new TS tests cover the delegation mapper: 'delegate' row
    maps as outbound to target with summary text; queued
    'delegate_result' preserves status='queued' (load-bearing for
    the LLM's wait-vs-bypass decision); missing target_id returns
    null instead of rendering a ghost.

Does NOT solve: the underlying single-SDK-session bottleneck that
causes peers to queue in the first place. Tracked as task #102
(parallel SDK sessions per workspace) — real architectural work.
This PR makes the runtime handle the queueing correctly so the LLM
doesn't bail out, and makes the delegations visible in Agent Comms
so operators can see what's happening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:01:50 -07:00
Hongming Wang
3248941ed5
Merge branch 'staging' into feat/canvas-test-coverage-2071 2026-04-26 14:22:26 -07:00