fix(canvas): external/MCP workspace progress UX — surface poll-mode queued state (task #227) #1618
Summary
User-reported (2 days old, sorry): canvas had NO progress indicator when the user sent a message to an external/MCP workspace (poll-mode delivery: operator's laptop running `molecule-mcp-claude-channel`, hermes/codex MCP bridge, Cursor MCP client). Native push-path runtimes (claude-code/codex/hermes/openclaw) showed thinking dots + activity stream the whole time; external workspaces had nothing.
Empirical divergence point
ws-server's `proxyA2ARequest` poll-mode short-circuit (`workspace-server/internal/handlers/a2a_proxy.go:402-432`) returns the synthetic `{status:"queued", delivery_mode:"poll", method:"message/send"}` HTTP 200 immediately. The path emits an `ACTIVITY_LOGGED` `a2a_receive` event via `logA2AReceiveQueued`, but with `status="ok"` and NO `duration_ms` (no reply yet).
Canvas-side filters silently dropped this:
- `useChatSocket.ts` guard `if (status === "ok" && durationMs)` skipped the row → no activity line.
- `useChatSend.ts` `.then(resp)` ran `extractReplyText({status:"queued"})` → "" → no agent bubble → `releaseSendGuards()` collapsed the spinner.
Result: user typed → user bubble → spinner for ~50ms → dead silence until the eventual `AGENT_MESSAGE` reply (could be 1s or 60s away).
What's NOT this (scope correction)
This is not RFC #497 (`molecule-ai/internal#497`). That RFC unifies inbound envelope handling: identity attribution for the receiving agent (poll-path peers losing `peer_name`/`peer_role`/`agent_card_url`). Task #227 is purely the canvas-side outbound progress UX. RFC#497 is still useful but orthogonal.
Fix (frontend-only; backend already emits the right signals)
1. Plumb `delivery_mode` through `WorkspaceData → WorkspaceNodeData.deliveryMode` (already in the GET /workspaces JSON; canvas just wasn't reading it).
2. `useChatSocket`: detect `a2a_receive status=ok` with NO `duration_ms` → emit the `⧗ <peer> queued — agent will pick up on next poll` activity line. Do not call `onSendComplete` (spinner must persist).
3. `useChatSend`: on a `{status:"queued"}` response, return early before `releaseSendGuards()`. The eventual `AGENT_MESSAGE` releases the guards via the existing `useChatSocket` path. No synthetic empty agent bubble.
4. `AgentCommsPanel.WaitingBubbles`: also light on delegation `status="dispatched"` (poll-mode marker) with distinct copy.
Net diff: 363 +/- across 8 files. No backend changes.
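The socket-side half of the fix (the `useChatSocket` step) can be sketched as a pure classifier. This is a minimal illustration, not the code in `useChatSocket.ts`: the `ActivityRow` and `ActivityOutcome` types, the `classifyA2AReceive` name, the `peer_name` fallback, and the wording of the completed line are assumptions. Only the `status` / `duration_ms` conditions and the queued copy come from the description above.

```typescript
// Hypothetical row shape; only `status` and `duration_ms` are named in the PR text.
type ActivityRow = {
  activity_type: string;
  status: string;
  duration_ms?: number | null;
  peer_name?: string;
};

// What the UI should do with a row: which line to show, and whether the
// in-flight spinner may be released.
type ActivityOutcome =
  | { kind: "completed"; line: string; releaseSpinner: true }
  | { kind: "queued"; line: string; releaseSpinner: false }
  | { kind: "ignored" };

function classifyA2AReceive(row: ActivityRow): ActivityOutcome {
  if (row.activity_type !== "a2a_receive" || row.status !== "ok") {
    // Error rows and other activity types are out of scope for this sketch.
    return { kind: "ignored" };
  }
  const peer = row.peer_name ?? "agent";
  if (row.duration_ms) {
    // Push-path success: a reply exists, so the spinner can go away.
    // The line text here is illustrative only.
    return {
      kind: "completed",
      line: `${peer} replied (${row.duration_ms}ms)`,
      releaseSpinner: true,
    };
  }
  // status=ok with no duration_ms: queued for a poll-mode workspace.
  // The spinner must persist, so releaseSpinner stays false.
  return {
    kind: "queued",
    line: `⧗ ${peer} queued — agent will pick up on next poll`,
    releaseSpinner: false,
  };
}

const queued = classifyA2AReceive({ activity_type: "a2a_receive", status: "ok", peer_name: "laptop" });
const done = classifyA2AReceive({ activity_type: "a2a_receive", status: "ok", duration_ms: 840 });
console.log(queued.kind, done.kind); // prints "queued completed"
```

Keeping the two `status === "ok"` branches in one function makes their mutual exclusivity visible: a row either has a truthy `duration_ms` or it does not.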
Test plan
- `useChatSocket.test.tsx` (2 new tests): queued-line emission + `onSendComplete` NOT called on queued path. 6/6 pass.
- `useChatSend.pollMode.test.tsx` (new file, 4 tests): no empty bubble on queued-200, `sending` stays true post-queued, microtask-flush late-release guard, push-mode non-regression. 4/4 pass.
- `AgentCommsPanel.render.test.tsx` + `AgentCommsPanel.test.ts`: 55 existing tests pass (waiting-bubble extension is additive).
- `canvas-topology.test.ts`: 3 existing tests pass (deliveryMode plumbing is additive).
- `tsc --noEmit` clean on all touched files (pre-existing TS errors in unrelated ContextMenu/EmptyState/OrgCancelButton/SidePanel/WorkspaceNode tests are NOT in this PR's scope).
Closes task #227 (product/UX tracker, user-reported 2026-05-18).
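The contract that the `useChatSend.pollMode.test.tsx` cases pin can be restated against a small pure reducer. This is a sketch under assumptions: `ChatState`, `SendResponse`, `applySendResponse`, and this `extractReplyText` body are hypothetical stand-ins for the hook's internals, and the `result.parts[].text` payload shape is assumed rather than taken from the PR.

```typescript
// Hypothetical response shape: queued responses carry a top-level `status`,
// push responses carry reply parts (the exact field layout is an assumption).
type SendResponse = {
  status?: string;
  result?: { parts?: { text?: string }[] };
};

type ChatState = { sending: boolean; agentBubbles: string[] };

// Stand-in for the real helper: joins whatever text parts are present.
function extractReplyText(resp: SendResponse): string {
  return (resp.result?.parts ?? []).map((p) => p.text ?? "").join("");
}

function applySendResponse(state: ChatState, resp: SendResponse): ChatState {
  if (resp.status === "queued") {
    // Poll-mode: return early. `sending` stays true and no bubble is added;
    // the later AGENT_MESSAGE is what releases the spinner.
    return state;
  }
  const text = extractReplyText(resp);
  return {
    sending: false,
    agentBubbles: text ? [...state.agentBubbles, text] : state.agentBubbles,
  };
}

const inFlight: ChatState = { sending: true, agentBubbles: [] };
const afterQueued = applySendResponse(inFlight, { status: "queued" });
const afterPush = applySendResponse(inFlight, { result: { parts: [{ text: "hi" }] } });
console.log(afterQueued.sending, afterQueued.agentBubbles.length); // prints "true 0"
console.log(afterPush.sending, afterPush.agentBubbles[0]); // prints "false hi"
```

Cases (a) and (b) correspond to `afterQueued`, and case (d) to `afterPush`; case (c), the microtask-flush guard, only makes sense against the real asynchronous hook and is not modelled here.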
🤖 Generated with Claude Code
Canvas had no progress indicator when the user sent a message to a poll-mode (external/MCP) workspace, e.g. an operator's laptop running `molecule-mcp-claude-channel`, a hermes/codex MCP bridge, or a Cursor MCP client. The user saw their bubble, a ~50ms spinner, then dead silence until the eventual reply landed via the AGENT_MESSAGE WS event (which could be seconds or minutes later, depending on poll cadence). The native push-path runtimes (claude-code/codex/hermes/openclaw) had a "thinking…" indicator + per-tool activity stream the whole time; external workspaces had nothing.
Empirical divergence point: ws-server's `proxyA2ARequest` poll-mode short-circuit (a2a_proxy.go:402-432) returns `{status:"queued", delivery_mode:"poll", method:"message/send"}` synchronously when the target has no URL. The `logA2AReceiveQueued` path DOES fire an `ACTIVITY_LOGGED` `a2a_receive` event, but with `status="ok"` and NO `duration_ms`. Canvas's `useChatSocket.ts` guard `if (status === "ok" && durationMs)` silently dropped the row. Meanwhile `useChatSend.ts` `.then(resp)` ran `extractReplyText({status:"queued"})` -> "" -> no agent bubble created -> `releaseSendGuards()` collapsed the spinner.
This is NOT the RFC#497 envelope-handling problem (that's inbound-side identity-attribution for the receiving agent). #227 is purely the canvas-side outbound progress UX. RFC#497 work remains useful but separate.
Fix (frontend-only; the backend already emits the right signals):
1. Plumb `delivery_mode` from the GET /workspaces response into `WorkspaceNodeData.deliveryMode` so the UI can introspect.
2. `useChatSocket`: detect `a2a_receive status=ok` with NO `duration_ms` as the "queued for poll" signal and emit a "⧗ <peer> queued — agent will pick up on next poll" activity line, so the in-flight spinner shows useful sub-text instead of just "Processing…".
3. `useChatSend`: on a `{status:"queued"}` response, return early BEFORE `releaseSendGuards()`. The spinner persists until the real `AGENT_MESSAGE` lands (the existing useChatSocket onAgentMessage/onSendComplete path handles the eventual release). No synthetic empty agent bubble.
4. `AgentCommsPanel.WaitingBubbles`: extend the per-peer "typing" indicator to also light on delegation `status="dispatched"`, the platform's marker for a poll-mode delegation written to the peer's inbox but not yet picked up. Distinct copy ("Queued — <peer> will pick up on next poll") so the user can tell push-pending from poll-queued.
Tests:
- useChatSocket: 2 new tests pin the queued-line emission AND that `onSendComplete` is NOT called on the queued path (spinner persists).
- useChatSend: new file (4 tests) covering: (a) no empty agent bubble on queued-200, (b) `sending` stays true post-queued, (c) defense against accidental late release via microtask flush, (d) push-mode non-regression: real reply parts still flip sending=false + create the agent bubble.
Verified: all 68 tests pass in the touched test files (useChatSocket.test.tsx 6, useChatSend.pollMode.test.tsx 4, AgentCommsPanel.render.test.tsx + AgentCommsPanel.test.ts 55 from a prior count, canvas-topology.test.ts 3). Typecheck clean on all touched files (`tsc --noEmit`).
Closes task #227 (UX tracker, user-reported 2 days ago).
Related: internal#497 (RFC, envelope handling; separate concern, inbound-side identity attribution for the receiving agent).
core-qa five-axis: PR#1618 @ cb6588628 (SHA-pinned)
1. Correctness (queued-event handler: no race/flicker)
Verified: the `useChatSocket.ts:65-83` queued branch fires ONLY when `status==="ok" && !durationMs`, which is disjoint from the (`status==="ok" && durationMs`) success branch (no double-emit). The handler emits the activity line via `onActivityLog` and deliberately omits `onSendComplete`, so the chat spinner state on `useChatSend.sending` survives until either a follow-up AGENT_MESSAGE (via the global store `agentMessages` consumer at lines 24-36 firing `onSendComplete`) or a status="error" branch flips it. The two states are mutually exclusive and there is no observable order in which the queued row can clear the spinner. No race.
2. Test coverage (4 pollMode + 2 useChatSocket)
`onAgentMessage` not called: line 91 ✓
Plus two useChatSocket tests (queued activity-line emission + no premature onSendComplete) at lines 128-188. All 6 assertions match the bug surface.
3. Edge cases: no AGENT_MESSAGE for 5min
FINDING: DEFER. There is no explicit poll-mode timeout. If the external workspace never polls (operator laptop offline) the spinner stays up indefinitely. The existing `timeoutMs: 120_000` on `api.post` (useChatSend.ts:212) is a 120s safety net, but on the queued-200 path the 200 has already returned, so the timeout never fires. This is functionally identical to push-mode behaviour when the remote agent hangs, and the explicit 120s mark plus the error branch surfacing actionable detail (internal#212) means the user is not silently stranded: they'll see ⚠ status from the activity row when the upstream eventually errors. Acceptable for v1 of #227. File follow-up if customer-reported.
4. Backwards compat (push-mode byte-identical)
Push-path branch at useChatSocket.ts:60-64 (`status==="ok" && durationMs`) is structurally untouched; the new `else if` is appended after. The useChatSend.ts:241-243 queued early-return runs BEFORE `extractReplyText`, so push payloads (which have no `status` field at top level) flow through unchanged. `useChatSend.pollMode.test.tsx` case (d) asserts this.
5. Test isolation
`apiPostMock.mockReset()` in `beforeEach` (line 54): clean per-test. `vi.mock("@/lib/api")` keeps the production module out of the loop. `renderHook` + `act` + microtask flush: standard pattern.
Verdict: APPROVED. Fix matches RC, tests pin the new contract, push-mode untouched. Edge case 3 is logged as deferred follow-up, not blocking.
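One possible shape for the deferred follow-up in edge case 3 is a client-side staleness check on the queued wait. Nothing like this exists in the PR: `QueuedWait`, `spinnerStateFor`, the `"stale"` state, and the default threshold (borrowed from the existing `timeoutMs: 120_000`) are all assumptions for illustration.

```typescript
// Hypothetical bookkeeping for one queued send: when the queued-200 arrived
// and, once known, when the AGENT_MESSAGE reply landed.
type QueuedWait = { queuedAtMs: number; repliedAtMs?: number };

type SpinnerState = "waiting" | "stale" | "done";

// Decide what the spinner should show at `nowMs`. The 120s default mirrors
// the api.post timeout mentioned in the review; it is not a value from the PR.
function spinnerStateFor(
  wait: QueuedWait,
  nowMs: number,
  staleAfterMs: number = 120_000,
): SpinnerState {
  if (wait.repliedAtMs !== undefined) return "done";
  return nowMs - wait.queuedAtMs >= staleAfterMs ? "stale" : "waiting";
}

const wait: QueuedWait = { queuedAtMs: 0 };
console.log(spinnerStateFor(wait, 30_000)); // prints "waiting"
console.log(spinnerStateFor(wait, 300_000)); // prints "stale"
console.log(spinnerStateFor({ queuedAtMs: 0, repliedAtMs: 45_000 }, 300_000)); // prints "done"
```

A caller could re-evaluate this on a timer and swap the spinner sub-text when the state turns stale, instead of releasing the send guards outright.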
core-devops five-axis: PR#1618 @ cb6588628 (SHA-pinned)
1. Tiny-PR rule: surgical or refactor?
8 files +363/-2. Pure additive surgical: useChatSend.ts (+46), useChatSocket.ts (+19), AgentCommsPanel.tsx (+14/-2), three store files just widen interface types with JSDoc to surface a new optional field (canvas-topology +3, canvas.ts +22, socket.ts +10). Two new test files (+178+71). Zero refactors, zero deletions outside the WaitingBubbles filter widening which is itself a 2-line tuple addition. Within tiny-PR norm. The 363/2 line count is dominated by JSDoc + new tests, not logic. No finding.
2. delivery_mode-coupling follow-up: Required or defer?
FINDING: DEFER (not Required). The impl detects poll-mode from TWO independent signals: `resp.status==="queued"` on the HTTP body (useChatSend.ts:241) and `!duration_ms` on the broadcast ACTIVITY_LOGGED row (useChatSocket.ts:65). The body field is the canonical, ws-server-authored signal; that's NOT shape-coupling, it's an explicit contract documented in the JSDoc with the exact a2a_proxy.go:416-431 line range. The broadcast `!duration_ms` IS shape-coupling and is the fragile half; a future ws-server that emits duration_ms=0 for queued rows would silently regress to the empty-bubble bug. RFC#497 (unify push/poll inbound envelope, task #236) is the proper home for adding an explicit `delivery_mode` field on ACTIVITY_LOGGED. File as #227-followup, not blocking: the JSDoc already documents the coupling and the test pins the current behaviour.
3. Observability: new log spam?
The new useChatSocket.ts branch calls `onActivityLog` once per queued-200, identical cardinality to the existing pre-#227 success/error branches. Per-message, not per-tick. Activity stream is canvas-local, no backend logs. No spam. No finding.
4. RFC#497 alignment: complementary or conflicting?
Complementary. RFC#497 (task #236) wants a shared envelope assembler for push/poll on the BACKEND side (the 4+ divergent re-impls noted in `reference_push_poll_envelope_no_shared_assembler_rfc497`). This PR is FRONTEND-ONLY and consumes the existing two-signal contract. When #497 lands and adds `delivery_mode` to ACTIVITY_LOGGED, the useChatSocket.ts `!durationMs` heuristic becomes redundant but remains correct as a defensive fallback. No future-rework lock-in. No finding.
5. E2E gap (no playwright for chat-poll-mode): Required or defer?
FINDING: DEFER. 6 unit/hook tests pin the contract at the boundary (apiPostMock + capturedHandler). The user-visible UX (WaitingBubbles `status==="dispatched"` arm in AgentCommsPanel.tsx:660) is not exercised end-to-end. Per `feedback_obs_first_debugging_all_agents`, the right verify path post-merge is a canvas DOM check on Hongming's tenant against a live external/MCP target (operator laptop running molecule-mcp-claude-channel). File as #227-followup E2E task; not blocking, given the user is the original reporter and will validate. The 4 pollMode tests catch the dominant regression class.
Verdict: APPROVED. Tiny and additive; two-signal detection (body + broadcast) is acceptable for v1. The delivery_mode-coupling and E2E concerns are tracked but explicitly NOT blocking: the JSDoc anchors the contract and RFC#497 is the proper home for backend-side hardening.
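The "redundant but correct as a defensive fallback" relationship between an explicit `delivery_mode` field and the current `!durationMs` heuristic could look like the sketch below. The `delivery_mode` field on the activity row does not exist today (the review proposes it for RFC#497), and `ActivityRow` and `isQueuedForPoll` are hypothetical names. The sketch also assumes poll-mode replies arrive via AGENT_MESSAGE rather than as a second `a2a_receive` row.

```typescript
type DeliveryMode = "push" | "poll";

// Hypothetical future row shape: `delivery_mode` is the proposed explicit
// field; `status` and `duration_ms` are the fields the heuristic uses today.
type ActivityRow = {
  status: string;
  duration_ms?: number | null;
  delivery_mode?: DeliveryMode;
};

function isQueuedForPoll(row: ActivityRow): boolean {
  if (row.status !== "ok") return false;
  if (row.delivery_mode !== undefined) {
    // Explicit signal wins, so the answer no longer depends on duration_ms.
    return row.delivery_mode === "poll";
  }
  // Fallback for servers that do not send the field: today's shape heuristic.
  return !row.duration_ms;
}

console.log(isQueuedForPoll({ status: "ok" })); // prints "true" (heuristic)
console.log(isQueuedForPoll({ status: "ok", duration_ms: 840 })); // prints "false"
console.log(isQueuedForPoll({ status: "ok", duration_ms: 12, delivery_mode: "poll" })); // prints "true"
```

Checking the explicit field first means a server-side change to how `duration_ms` is populated would not alter the result once the field is present.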
/sop-tier-recheck
(Auto-relay: the security-review approved status is in a failure state despite 2 distinct-persona APPROVED reviews, 5137 + 5138.)
/sop-tier-recheck
/qa-recheck
/security-recheck