fix(canvas-chat): clear stale 'Failed to send' banner when the agent replies (core#2697) #2736

Merged
devops-engineer merged 1 commit from fix/chat-clear-stale-error-on-reply into main 2026-06-13 09:07:42 +00:00
Member

Bug (CTO screenshot, JRS)

The red "Failed to send — agent may be unreachable" banner stayed up while the agent was visibly working (thinking indicator at ●●● 37s, live tool feed streaming). Contradictory UI.

Root cause

The agent's turn hit a real context-overflow 400 (token limit 262144, requested 262158) which set the banner; the runtime auto-healed (reset session + retried). The retry's reply then arrived — but onAgentMessage / onSendComplete released the send guards and never cleared the error state, so the stale "unreachable" banner persisted over a reachable, replying agent.
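
For reference, the arithmetic behind the quoted 400 can be checked directly. This is only a worked example using the two figures from the error message; the variable names are illustrative and do not come from the runtime.

```typescript
// Figures quoted in the context-overflow error; names are illustrative only.
const tokenLimit = 256 * 1024; // 262144, i.e. the "256K" context window
const requestedTokens = 262158; // size of the rejected request

// How far over the limit the request was.
const overflowTokens = requestedTokens - tokenLimit;
const exceedsLimit = requestedTokens > tokenLimit;

console.log(`over limit: ${exceedsLimit}, by ${overflowTokens} tokens`);
```

The request was over the limit by only a handful of tokens, which is consistent with a long-running session creeping past the window rather than a single oversized message.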

Fix

A reply landing (push AGENT_MESSAGE) or a poll-mode send-complete proves reachability → both callbacks now clear both error sources (setError(null) + clearSendError()).
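
A minimal sketch of the behaviour described above, not the actual ChatTab code. `setError`, `clearSendError`, `onAgentMessage`, and `onSendComplete` are the names used in this PR; `createChatErrorState`, `createSocketCallbacks`, `failSend`, `bannerVisible`, and the state shape are hypothetical and exist only for illustration.

```typescript
// Hypothetical model of the two error sources; not the real component.
type ChatErrorState = {
  setError: (message: string | null) => void; // local ChatTab error
  failSend: (message: string) => void; // stand-in for a useChatSend failure
  clearSendError: () => void; // clears the send-hook error
  bannerVisible: () => boolean;
};

function createChatErrorState(): ChatErrorState {
  let localError: string | null = null;
  let sendError: string | null = null;
  return {
    setError: (message) => { localError = message; },
    failSend: (message) => { sendError = message; },
    clearSendError: () => { sendError = null; },
    // The banner shows while either source holds an error.
    bannerVisible: () => localError !== null || sendError !== null,
  };
}

// Assumed callback wiring: a reply or a poll-mode completion proves the agent
// is reachable, so both callbacks clear both error sources.
function createSocketCallbacks(state: ChatErrorState) {
  const clearStaleError = (): void => {
    state.setError(null);
    state.clearSendError();
  };
  return {
    onAgentMessage: (_text: string): void => clearStaleError(),
    onSendComplete: (): void => clearStaleError(),
  };
}

// Replay of the reported sequence: send fails, runtime retries, reply lands.
const state = createChatErrorState();
const callbacks = createSocketCallbacks(state);
state.failSend("Failed to send");
const visibleBeforeReply = state.bannerVisible();
callbacks.onAgentMessage("retry reply");
const visibleAfterReply = state.bannerVisible();
```

Note that in this sketch nothing other than the two reachability callbacks clears the error, so a failed send with no subsequent reply keeps the banner up.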

Tests

New ChatTab.errorClearOnReply.test.tsx captures ChatTab's socket callbacks and asserts onAgentMessage + onSendComplete clear the error. Full ChatTab suite green (358 pass).
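
The capture-and-invoke pattern the test relies on can be sketched without any test framework. This is an illustration under stated assumptions, not the contents of `ChatTab.errorClearOnReply.test.tsx`: `useChatSocketStub`, `mountChatTabStub`, `requireCaptured`, and the `calls` recorder are hypothetical stand-ins for the mocked hooks and the rendered component.

```typescript
// Hypothetical stand-ins for the mocked hooks; not the real test file.
type SocketCallbacks = {
  onAgentMessage: (text: string) => void;
  onSendComplete: () => void;
};

// Every set of callbacks the component registers is recorded here.
const capturedCallbacks: SocketCallbacks[] = [];

// Stub socket hook: captures the callbacks instead of opening a socket.
function useChatSocketStub(callbacks: SocketCallbacks): void {
  capturedCallbacks.push(callbacks);
}

// Spy-style recorder standing in for the mocked error setters.
const calls = { setError: [] as Array<string | null>, clearSendError: 0 };

// Component stand-in that wires the callbacks the way the PR describes.
function mountChatTabStub(): void {
  const clearErrors = (): void => {
    calls.setError.push(null);
    calls.clearSendError += 1;
  };
  useChatSocketStub({
    onAgentMessage: () => clearErrors(),
    onSendComplete: () => clearErrors(),
  });
}

// Returns the most recently captured callbacks, failing loudly if none exist.
function requireCaptured(): SocketCallbacks {
  const latest = capturedCallbacks[capturedCallbacks.length - 1];
  if (!latest) throw new Error("no socket callbacks were captured");
  return latest;
}

mountChatTabStub();
const socket = requireCaptured();
socket.onAgentMessage("reply from agent");
socket.onSendComplete();
```

Capturing the callbacks lets each one be exercised in isolation, so the assertion covers the push path and the poll path separately instead of relying on a real socket event.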

Note (separate, not a regression)

The underlying trigger — the SEO agent's session exceeding the model's 256K context — is handled by the runtime's existing auto-heal (session reset + retry), and the user-facing remedy is the New session button (#2700, shipped). This PR only fixes the misleading banner.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-13 09:01:34 +00:00
fix(canvas-chat): clear stale "Failed to send" banner when the agent replies (core#2697)
CI / Python Lint & Test (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 12s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 12s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
sop-checklist / review-refire (pull_request_target) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 7s
CI / Detect changes (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
CI / Platform (Go) (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Failing after 17s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s
E2E Chat / detect-changes (pull_request) Successful in 22s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 39s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 26s
CI / Canvas (Next.js) (pull_request) Successful in 3m44s
CI / Canvas Deploy Status (pull_request) Successful in 1s
CI / all-required (pull_request) Successful in 3s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 9s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 10s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
audit-force-merge / audit (pull_request_target) Successful in 9s
59792d9b93
Reported on JRS: a turn that hit a context-overflow 400 set the
"Failed to send — agent may be unreachable" banner; the runtime auto-healed
(reset session + retried); the retry's reply landed and tools streamed at
"●●● Ns" — yet the red banner stayed up, contradicting the visibly-working
agent. onAgentMessage / onSendComplete released the send guards but never
cleared the error state.

A reply landing (push AGENT_MESSAGE) or a poll-mode send-complete PROVES the
agent is reachable, so both callbacks now clear both error sources
(setError(null) + clearSendError()). Adds an integration test capturing
ChatTab's socket callbacks and asserting the clear.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-reviewer-cr2 approved these changes 2026-06-13 09:07:13 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED: reviewed #2736 at head 59792d9b.

The fix is correctly tied to proof of agent reachability: both onAgentMessage and poll-mode onSendComplete clear the local error plus the useChatSend error, so a stale 'Failed to send / agent may be unreachable' banner is removed once the agent actually replies or completes. This does not change send failure detection itself, and it does not mask a still-failing send path before a reachability callback arrives.

The new ChatTab.errorClearOnReply test captures the real socket callbacks and covers both AGENT_MESSAGE and onSendComplete. Required CI is green, including Canvas (Next.js) and CI / all-required. /sop-ack

Member

/sop-ack

devops-engineer merged commit 552c72d19d into main 2026-06-13 09:07:42 +00:00
Member

APPROVED (post-merge verification; PR was already merged when I fetched it). Head 59792d9b93a419dbca8698a0f16563c1dcf6827b.

5-axis review: the fix clears the stale banner only from the real reachability callbacks, onAgentMessage and onSendComplete. Those callbacks are reached when a reply lands over socket or poll-mode completes, so a pre-reply send failure remains visible until the agent actually proves reachability. Failure detection is not weakened: the send error still comes from useChatSend, the UI still records setError(...) on actual send failures, and this patch does not clear errors on input changes, retry start, mount, or arbitrary state transitions.

Coverage is targeted: ChatTab.errorClearOnReply.test.tsx mocks useChatSend, captures the real useChatSocket callbacks, and asserts clearError is called for both onAgentMessage and onSendComplete. Required CI history for this head shows Canvas Next.js and CI / all-required green; the PR is already merged.

/sop-ack


Reference: molecule-ai/molecule-core#2736