fix(a2a_queue_status): distinguish 401/403/404-retryable for queue polls (core#2437 C) #2706
Branch: fix/2437-queue-status-404-distinction
Summary
fix(a2a_queue_status): distinguish 401/403/404-retryable for queue polls (core#2437 part C)
After a `202` response carrying a `queue_id`, polling clients could not tell whether a 404 meant missing identity, an auth mismatch, or a not-yet-persisted row. This changes `GetA2AQueueStatus` to return:

- `401` when the caller identity is missing;
- `403` when the authenticated caller does not match the queue row;
- `404` with `retryable=true` when the caller is authenticated but the row is absent (expected during the enqueue→persist window).

This is part C of the #2437 post-restart staging A2A readiness fix.
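The status mapping above can be sketched as a pure function. This is a minimal illustration, not the handler's code: `queueRow`, `queueStatusResult`, and `classifyQueuePoll` are hypothetical names, an empty `callerID` is assumed to mean "no identity presented", and a `nil` row is assumed to mean "lookup found nothing". The caller-or-target authorization rule follows the review note further down.

```go
package main

import (
	"fmt"
	"net/http"
)

// queueRow is a hypothetical stand-in for the persisted A2A queue record.
type queueRow struct {
	CallerID    string // workspace that enqueued the request (caller_id)
	WorkspaceID string // target workspace (workspace_id)
}

// queueStatusResult is a hypothetical result: an HTTP status plus the retryable flag.
type queueStatusResult struct {
	Status    int
	Retryable bool
}

// classifyQueuePoll maps one poll to a status code.
// Assumptions: callerID == "" means no identity; row == nil means no row was found.
func classifyQueuePoll(callerID string, row *queueRow) queueStatusResult {
	switch {
	case callerID == "":
		return queueStatusResult{Status: http.StatusUnauthorized}
	case row == nil:
		// Authenticated, but the row is not persisted yet: tell the client to retry.
		return queueStatusResult{Status: http.StatusNotFound, Retryable: true}
	case callerID != row.CallerID && callerID != row.WorkspaceID:
		// Neither the enqueuing caller nor the target workspace.
		return queueStatusResult{Status: http.StatusForbidden}
	default:
		return queueStatusResult{Status: http.StatusOK}
	}
}

func main() {
	row := &queueRow{CallerID: "ws-caller", WorkspaceID: "ws-target"}
	fmt.Println(classifyQueuePoll("", row))          // {401 false}
	fmt.Println(classifyQueuePoll("ws-other", row))  // {403 false}
	fmt.Println(classifyQueuePoll("ws-caller", nil)) // {404 true}
	fmt.Println(classifyQueuePoll("ws-target", row)) // {200 false}
}
```

The missing-row check has to come before the ownership check: an absent row has no `caller_id` or `workspace_id` to compare against, so an authenticated caller can only be told "not yet, retry".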
1. Comprehensive testing performed
Added unit tests for the 401, 403, 404-retryable, and authorized 200 paths. Updated the Postgres integration tests to expect 401/403/404-retryable. Ran `go test ./internal/handlers -count=1`; the unit suite is green.
2. Local-postgres E2E run
N/A for this server-only handler change; the integration test file was updated and will be exercised by CI's Handlers Postgres Integration gate.
3. Staging-smoke verified or pending
Pending post-merge; the client-side polling loop (part B of #2437) will consume the new retryable 404.
4. Root-cause not symptom
Root cause: the queue-status endpoint collapsed every failure mode into a single 404, so polling clients could not distinguish a transient not-yet-persisted row from an auth/identity failure and either gave up too early or retried forever on the wrong identity.
5. Five-Axis review walked
6. No backwards-compat shim / dead code added
No shim. Existing callers that treated every 404 as a hard failure will now receive `retryable: true` on transient rows and should respect it. This is an intentional behavior change per #2437.
7. Memory consulted
No applicable memory records for this change.
APPROVED on head c31f7eb58fb8050aa8b43a7de0b074b76c5bc44b. Reviewed for core#2437/#99338 masked-404 behavior.
`GetA2AQueueStatus` now separates missing identity (401), auth mismatch (403), and an authenticated caller with a missing queue row (404 with `retryable: true`). The handler authorizes by the persisted queue `caller_id` or target `workspace_id`, so target-side staging polling is not falsely treated as a mismatched caller. The #2671 `response_body` NULL scan guard is unchanged (scan into `sql.NullString` before the `RawMessage` assignment). Unit and integration tests cover all three failure classes plus target-readable and NULL `response_body` behavior; the required Platform Go CI is green.
/sop-ack