fix(workspace-server): persist canvas user message at ingest (internal#470) #1347

Open
core-be wants to merge 4 commits from fix/canvas-user-message-persist-at-ingest into main

4 Commits

Author SHA1 Message Date
608fc28d96 ci: no-op re-trigger for qa-review re-evaluation [dev-lead]
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
CI / Platform (Go) (pull_request) Successful in 4m20s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
Harness Replays / detect-changes (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 21s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6m18s
CI / Python Lint & Test (pull_request) Successful in 6m15s
CI / all-required (pull_request) Successful in 5m36s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 44s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m17s
E2E Chat / E2E Chat (pull_request) Failing after 5m6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
2026-05-17 03:35:21 +00:00
infra-lead-agent
aa6a87e633 ci: re-fire pipeline after stale dispatch from disk-full SEV-1 2026-05-17 03:35:21 +00:00
eb62d2ff89 test(workspace-server): integration e2e — canvas client disconnect mid-flight preserves user message
Drives the real ProxyA2A HTTP handler through the literal bug scenario:
canvas message/send, mock agent hangs, client request context cancelled
(user exits chat) before any reply. Asserts the ingest INSERT fires
synchronously before dispatch (on context.WithoutCancel) so the user
message is durable even though no logA2ASuccess/finalize ever runs —
exactly the pre-fix loss window, now closed.
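
The scenario this test describes can be sketched in a few lines. This is a minimal illustration only, not the repository's test: `ingestLog`, `newHandler`, `runDisconnectScenario`, and the `/a2a` path are hypothetical stand-ins, and an in-memory slice replaces the database. It assumes Go 1.21 or later for `context.WithoutCancel`.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// ingestLog is a hypothetical in-memory stand-in for the activity_logs table.
type ingestLog struct {
	mu       sync.Mutex
	messages []string
}

// persist rejects a cancelled context, as a database driver typically would.
func (l *ingestLog) persist(ctx context.Context, msg string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	l.messages = append(l.messages, msg)
	return nil
}

// snapshot returns a copy of the persisted messages.
func (l *ingestLog) snapshot() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]string(nil), l.messages...)
}

// newHandler is a hypothetical stand-in for the proxy handler: it persists
// the user message first, then waits on an agent that never replies.
func newHandler(l *ingestLog) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad body", http.StatusBadRequest)
			return
		}
		// Ingest write, detached from client cancellation.
		_ = l.persist(context.WithoutCancel(r.Context()), string(body))
		// Mock agent hang: block until the client goes away.
		<-r.Context().Done()
	})
}

// runDisconnectScenario cancels the request context while the handler is
// running and returns whatever was persisted.
func runDisconnectScenario(msg string) []string {
	l := &ingestLog{}
	ctx, cancel := context.WithCancel(context.Background())
	req := httptest.NewRequest(http.MethodPost, "/a2a", strings.NewReader(msg)).WithContext(ctx)
	rec := httptest.NewRecorder()

	done := make(chan struct{})
	go func() {
		defer close(done)
		newHandler(l).ServeHTTP(rec, req)
	}()
	cancel() // the user exits the chat before any reply
	<-done
	return l.snapshot()
}

func main() {
	fmt.Println(runDisconnectScenario("hello")) // prints [hello]
}
```

The message survives whether `cancel()` runs before or after the handler reaches the write, because the write never observes the request's cancellation.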

Refs: molecule-ai/internal#470

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:35:21 +00:00
0ee093ca9a fix(workspace-server): persist canvas user message at ingest, before agent round-trip
Canvas chat lost the user's own message when they exited the chat before
the agent replied, for push-mode (HTTP-dispatched) workspaces.

Root cause: chat-history is reconstructed solely from activity_logs rows
(activity_type='a2a_receive', source_id IS NULL). For push-mode that row
was written ONLY in logA2ASuccess/logA2AFailure, AFTER the full agent
A2A round-trip. The user message was never persisted at ingest. If the
user exited the chat (fetch abort on unmount, tab close, or dropped
connection) before the agent finished, no row was ever written and the
message was permanently lost on reopen. Poll-mode was unaffected (logA2AReceiveQueued
already persists at ingest). This is the inbound mirror of the reno-stars
2026-05-05 outbound data-loss incident (RFC #2945).

Fix: persistUserMessageAtIngest() does a synchronous INSERT of the user
message (status='pending', response_body NULL) BEFORE dispatchA2A, on a
context.WithoutCancel context so a client disconnect cannot abort the
write. logA2ASuccess finalizes that row via UPDATE (no duplicate user
bubble; preserves the one-row-(user,agent) read contract). When an ingest
row exists, logA2AFailure is a no-op: the pending row already durably
holds the message, and requests enqueued because the agent was busy stay
pending until the queue drain answers them. The ingest write is
best-effort: on persist failure, the handler falls back to the legacy
post-round-trip INSERT (never worse than pre-fix, never blocks the send).
The read path already renders a user bubble from a row with empty
response_body, so the message shows on reopen even before the agent
answers.

8 new TDD regression tests (a2a_ingest_persist_test.go). Full
internal/handlers + internal/messagestore suites green.

Refs: molecule-ai/internal#470

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:35:21 +00:00