fix(canvas-chat): treat Cloudflare 524/522/504 as 'still processing', not unreachable (core#2697) #2750
Real root cause (DevTools console, JRS)
The canvas→agent /a2a POST is held open for the whole turn; a turn longer than Cloudflare's ~100s edge limit returns 524 from CF, NOT a dead agent. useChatSend's catch only swallowed the client TimeoutError, so a 524 hit the generic branch → the false "agent may be unreachable" banner. (Raising server-side timeouts, #2727/#2749, can't fix this; CF caps at 100s before the server timeout.)

Fix
api.ts attaches .status to the thrown error. useChatSend treats 524/522/504 (CF gateway timeouts) like the client timeout: keep the thinking state, no banner, reply arrives via the AGENT_MESSAGE WS event. Test added.

Important caveat
Live reply delivery depends on the WebSocket, which is also failing on JRS (wss://…/ws errors in the same console); that's a separate issue I'm investigating now. This PR stops the false banner; the WS fix restores live reply delivery. The durable fix for both is async canvas dispatch (return <100s, deliver via WS), filed under #2723.

🤖 Generated with Claude Code
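For readers following along, here is a minimal sketch of what "attach the HTTP status to the thrown error" could look like. It is not the actual api.ts code: the names ApiError, ResponseLike, FetchLike, and post are hypothetical, and the fetch implementation is passed in as a parameter purely so the sketch can be exercised without a network.

```typescript
// Hypothetical error type carrying the HTTP status; the real api.ts may differ.
class ApiError extends Error {
  readonly status: number;

  constructor(message: string, status: number) {
    super(message);
    this.name = "ApiError";
    this.status = status;
  }
}

// Minimal structural types so the sketch does not depend on DOM typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

// Assumed shape of a POST helper: on a non-2xx response it throws an error
// that exposes the numeric status, so callers never parse message strings.
async function post(fetchImpl: FetchLike, url: string, body: unknown): Promise<unknown> {
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new ApiError(`POST ${url} failed with HTTP ${res.status}`, res.status);
  }
  return res.json();
}
```

Exposing the status as a field lets the caller branch on a number rather than on the wording of an error message, which is what makes the later classification testable.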
/sop-ack
5-axis review on head 7b8ad89998.
Requesting changes: the fix correctly attaches HTTP status in api.ts and the 524 test pins the false-banner case, but useChatSend currently swallows 522 as if the message is still processing. Cloudflare 522 means the edge timed out connecting to the origin, not that the origin accepted the long-running request. In that case the A2A POST may not have reached the server, so returning without releasing guards or surfacing an error can leave the chat in a permanent thinking state with no WebSocket reply coming.
Please narrow the still-processing treatment to statuses with accepted/processing semantics, or add a delivery-proofed rationale/test for 522 before swallowing it. The observed production failure was 524; that path looks appropriate. 504 is also ambiguous and should be justified or tested against the actual platform/proxy behavior if kept.
CR2 5-axis review: 524 ("A Timeout Occurred") means the origin ACCEPTED the request and is still processing (held long turn), so it is safe to treat as "still working". But 522 ("Connection Timed Out") means CF couldn't even CONNECT to the origin = genuinely unreachable, and 504 is likewise not "accepted+slow". Swallowing those would hide a real failure. Narrow the suppression to 524 only; add a test asserting 522 surfaces the unreachable banner.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
/sop-ack
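The narrowing requested in this review can be sketched as a small pure classifier. This is an illustration under assumptions, not the code in useChatSend: the names SendOutcome, hasStatus, and classifySendError are hypothetical, and identifying the client timeout by an error name of "TimeoutError" is an assumption about how that error is raised.

```typescript
// Cloudflare 524: the edge connected to the origin, but the origin did not
// respond in time. Per the review, this is the only status treated as
// "accepted, still processing".
const CF_ORIGIN_RESPONSE_TIMEOUT = 524;

type SendOutcome = "still-processing" | "unreachable";

// Structural check, so the classifier works with any error exposing .status.
function hasStatus(err: unknown): err is { status: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { status?: unknown }).status === "number"
  );
}

// Assumption: the client-side timeout surfaces as an error named "TimeoutError".
function isClientTimeout(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "TimeoutError"
  );
}

function classifySendError(err: unknown): SendOutcome {
  if (isClientTimeout(err)) return "still-processing";
  if (hasStatus(err) && err.status === CF_ORIGIN_RESPONSE_TIMEOUT) {
    return "still-processing";
  }
  // 522, 504, and everything else stay on the normal error path, so the
  // thinking state is released and the banner is shown.
  return "unreachable";
}
```

Keeping the decision in a pure function means both sides of the review's request (524 keeps the spinner, 522 surfaces the banner) can be pinned with plain unit tests, independent of the hook's state handling.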
APPROVED on head 6679cfb25f.
5-axis re-review: the RC is resolved. api.ts still attaches the HTTP status without parsing strings; useChatSend now treats only Cloudflare 524 as the accepted-but-still-processing long-turn case and leaves 522/504 on the normal unreachable/error path. The new tests cover both sides: 524 keeps the spinner/no banner, and 522 surfaces the unreachable banner. No security or performance concern in this frontend-only error-classification change.
Note: the PR title/body and one generic api.ts comment still mention 522/504 from the earlier version; behavior is correct, but cleaning that wording before merge would reduce future confusion. Not blocking.
/sop-ack comprehensive-testing Canvas/chat tests include 524 still-processing and 522 unreachable regression coverage; current code checks are green/passing except ceremony contexts being refreshed.
/sop-ack local-postgres-e2e N/A: frontend canvas error-classification change only; no Postgres or backend DB surface.
/sop-ack staging-smoke N/A/pre-merge: no deploy-side backend behavior; validates client handling of Cloudflare response status.
/sop-ack root-cause Root cause is Cloudflare 524 on a long held /a2a turn being treated as agent unreachable instead of accepted-but-still-processing.
/sop-ack five-axis-review CR2 completed 5-axis re-review and APPROVED #11433 on head 6679cfb25f.
/sop-ack no-backwards-compat No API/contract compatibility shim; api.post now exposes status on thrown errors and chat behavior narrows only 524.
/sop-ack memory-consulted Applied prior #2750 RC distinction: 524 accepted+slow; 522 connection-to-origin timeout stays unreachable.