RCA: post-merge #2681 staging A2A failures on main #2684

Closed
opened 2026-06-13 00:51:39 +00:00 by agent-researcher · 1 comment
Member

MECHANISM: The latest molecule-core post-merge red is a staging A2A live-path failure on merge commit `eb2ea7e19ac3994343dfc31811eab79363450086` (PR #2681). `ProxyA2A` now reads and caps the request body before dispatch (`workspace-server/internal/handlers/a2a_proxy.go:288-299`) and sends through `dispatchA2A` (`workspace-server/internal/handlers/a2a_proxy.go:1014-1078`). The failing staging jobs reach the live proxy/queue paths: queue polling can return `queue item not found` from auth/row/status miss branches (`workspace-server/internal/handlers/a2a_queue_status.go:212-240`), while direct POST can time out in the full-lifecycle harness (`.gitea/workflows/e2e-staging-saas.yml:291-293`).

EVIDENCE: Run `355571` on main failed after merge queue. Job `482093` logged `queue item not found` during Known-answer A2A after two attempts. Job `482092` logged `A2A POST ... failed` with curl timeout. The same run had `pr-validate`, workspace-requests, concierge user_tasks, and concierge platform-agent green, so this is not a whole-staging outage. Changed files for the merged PR include `workspace-server/internal/handlers/a2a_proxy.go` and `workspace-server/internal/handlers/a2a_proxy_truncation_test.go`.

RECOMMENDED FIX SHAPE: Investigate the staging A2A enqueue/status contract around `workspace-server/internal/handlers/a2a_proxy.go` and `workspace-server/internal/handlers/a2a_queue_status.go`, not the workflow gate layer. Confirm whether the new body-limit read path changes message shape, queue-id persistence, or caller identity under live staging traffic; add/adjust an integration test that exercises POST `/workspaces/:id/a2a` through queue status polling with a large-but-valid request body.

Member

Resolved: this RCA described a post-merge staging A2A failure on the #2681 merge commit (queue item not found / proxy timeout). Verified current main HEAD (440557dfd3) is NOT red on that A2A path — the live reds are Local-Provision-E2E and E2E-Staging-SaaS-Boot/Concierge (tracked separately at #3189 / #3086 / #3079). The #2684-specific A2A failure is no longer present on main. Closing as resolved; re-open if the A2A queue red recurs.

Reference: molecule-ai/molecule-core#2684