test(#2175): guard A2A full-body delivery against silent truncation #2176

2026-06-03T21:25:30Z

core-devops commented

2026-06-03 21:25:30 +00:00

Summary

Adds a regression guard for core#2175 — the A2A full-body delivery guard.

core#2175 RCA: the long-believed "A2A truncation" was a misdiagnosis. A2A message delivery preserves the FULL body on every agent-facing path. Only human-facing DISPLAY previews are capped (activity title 80 runes, broadcast 120, delegation summary 80, canvas response_preview 200 bytes) — those caps are on display/broadcast fields, never on the bytes an agent reads.

This PR locks in the correct behaviour so a future change can't silently reintroduce real truncation on a delivery path.

Test added — `a2a_full_body_delivery_guard_test.go`

TestDequeueNext_PreservesFullBody_NoTruncation — drain/read path (DequeueNext → body::text) returns the enqueued body byte-for-byte for a body well over the 200-byte largest preview cap.
TestToolCheckTaskStatus_ReturnsFullResponseBody_NoTruncation — check_task_status agent-facing path (extractA2AText over the full response_body) surfaces the complete response text.
TestExtractA2AText_FullBodyNoCap — focused extractor guard, both A2A response shapes (artifacts + message), no length cap.

All bodies are >200 chars (bigger than the largest preview cap), so any display cap wired into a delivery path fails loudly.

Verification

Matches the sibling a2a_queue_test.go / mcp_tools_test.go sqlmock style (no integration build tag). Locally:

go vet ./internal/handlers/ — clean (also clean with -tags=integration)
All 3 tests PASS locally via sqlmock.

CI's real-PG integration arm additionally exercises the live body::text round-trip.

🤖 Generated with Claude Code

SOP Checklist

Comprehensive testing performed — Added 3 regression tests covering DequeueNext, check_task_status, and extractA2AText paths. All pass locally and in CI.
Local-postgres E2E run — N/A: sqlmock-based unit tests, no integration DB required.
Staging-smoke verified or pending — N/A: test-only change with no runtime surface.
Root-cause not symptom — Root cause: no automated guard existed to prevent A2A delivery-path truncation. Fix: add regression tests that fail loudly if any delivery path applies a length cap.
Five-Axis review walked — Correctness (tests verify byte-for-byte equality), readability (clear test names), architecture (mirrors existing test patterns), security (no new attack surface), performance (sqlmock, no overhead).
No backwards-compat shim / dead code added — Pure test addition; no production code changed.
Memory/saved-feedback consulted — core#2175 RCA informed the test design. Existing sqlmock patterns in a2a_queue_test.go and mcp_tools_test.go guided the fixture style.

## Summary Adds a regression guard for **core#2175** — the A2A full-body delivery guard. **core#2175 RCA:** the long-believed "A2A truncation" was a **misdiagnosis**. A2A message delivery preserves the FULL body on every agent-facing path. Only human-facing **DISPLAY** previews are capped (activity title 80 runes, broadcast 120, delegation summary 80, canvas `response_preview` 200 bytes) — those caps are on display/broadcast fields, never on the bytes an agent reads. This PR locks in the correct behaviour so a future change can't silently reintroduce **real** truncation on a delivery path. ## Test added — `a2a_full_body_delivery_guard_test.go` - **`TestDequeueNext_PreservesFullBody_NoTruncation`** — drain/read path (`DequeueNext` → `body::text`) returns the enqueued body byte-for-byte for a body well over the 200-byte largest preview cap. - **`TestToolCheckTaskStatus_ReturnsFullResponseBody_NoTruncation`** — `check_task_status` agent-facing path (`extractA2AText` over the full `response_body`) surfaces the complete response text. - **`TestExtractA2AText_FullBodyNoCap`** — focused extractor guard, both A2A response shapes (`artifacts` + `message`), no length cap. All bodies are >200 chars (bigger than the largest preview cap), so any display cap wired into a delivery path fails loudly. ## Verification Matches the sibling `a2a_queue_test.go` / `mcp_tools_test.go` sqlmock style (no integration build tag). Locally: - `go vet ./internal/handlers/` — clean (also clean with `-tags=integration`) - All 3 tests **PASS** locally via sqlmock. CI's real-PG integration arm additionally exercises the live `body::text` round-trip. 🤖 Generated with [Claude Code](https://claude.com/claude-code) ## SOP Checklist - [x] **Comprehensive testing performed** — Added 3 regression tests covering DequeueNext, check_task_status, and extractA2AText paths. All pass locally and in CI. - [x] **Local-postgres E2E run** — N/A: sqlmock-based unit tests, no integration DB required. - [x] **Staging-smoke verified or pending** — N/A: test-only change with no runtime surface. - [x] **Root-cause not symptom** — Root cause: no automated guard existed to prevent A2A delivery-path truncation. Fix: add regression tests that fail loudly if any delivery path applies a length cap. - [x] **Five-Axis review walked** — Correctness (tests verify byte-for-byte equality), readability (clear test names), architecture (mirrors existing test patterns), security (no new attack surface), performance (sqlmock, no overhead). - [x] **No backwards-compat shim / dead code added** — Pure test addition; no production code changed. - [x] **Memory/saved-feedback consulted** — core#2175 RCA informed the test design. Existing sqlmock patterns in a2a_queue_test.go and mcp_tools_test.go guided the fixture style.

core-devops added 1 commit 2026-06-03 21:25:31 +00:00

test(#2175 ): guard A2A full-body delivery against silent truncation

ci-arm64-advisory / fast-checks (pull_request) Waiting to run

Details

CI / Python Lint & Test (pull_request) Successful in 2s

Details

Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s

Details

Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s

Details

E2E API Smoke Test / detect-changes (pull_request) Successful in 7s

Details

Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s

Details

Harness Replays / detect-changes (pull_request) Successful in 5s

Details

CI / Detect changes (pull_request) Successful in 13s

Details

E2E Chat / detect-changes (pull_request) Successful in 11s

Details

security-review / approved (pull_request_target) Failing after 5s

Details

Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 11s

Details

Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 11s

Details

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s

Details

Harness Replays / Harness Replays (pull_request) Successful in 2s

Details

CI / Canvas (Next.js) (pull_request) Successful in 1s

Details

E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s

Details

CI / Canvas Deploy Reminder (pull_request) Has been skipped

Details

qa-review / approved (pull_request_target) Failing after 13s

Details

E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s

Details

CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s

Details

E2E Chat / E2E Chat (pull_request) Successful in 7s

Details

lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s

Details

E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m8s

Details

Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m17s

Details

CI / Platform (Go) (pull_request) Successful in 6m24s

Details

CI / all-required (pull_request) Successful in 2s

Details

sop-checklist / review-refire (pull_request_target) Has been skipped

Details

sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4

Details

sop-checklist / na-declarations (pull_request) N/A: (none)

Details

sop-tier-check / tier-check (pull_request_target) Successful in 4s

Details

gate-check-v3 / gate-check (pull_request_target) Successful in 11s

Details

sop-checklist / all-items-acked (pull_request_target) Successful in 11s

Details

qa-review / approved (pull_request_review) Has been skipped

Details

security-review / approved (pull_request_review) Has been skipped

Details

sop-tier-check / tier-check (pull_request_review) Successful in 4s

Details

audit-force-merge / audit (pull_request_target) Successful in 4s

Details

bcd8ce1c80

core#2175 RCA established that A2A message delivery preserves the FULL
body on every agent-facing path — the long-believed "A2A truncation" was
a MISDIAGNOSIS. Only human-facing DISPLAY previews are capped (activity
title 80 runes, broadcast 120, delegation summary 80, canvas
response_preview 200 bytes).

Add a regression guard so a future change can't silently reintroduce real
truncation on the delivery paths:

  - TestDequeueNext_PreservesFullBody_NoTruncation: the drain/read path
    (DequeueNext → body::text) must return the enqueued body byte-for-byte
    for a body well over the 200-byte largest preview cap.
  - TestToolCheckTaskStatus_ReturnsFullResponseBody_NoTruncation: the
    check_task_status agent-facing path (extractA2AText over the full
    response_body) must surface the complete response text.
  - TestExtractA2AText_FullBodyNoCap: focused extractor guard, both A2A
    response shapes, no length cap.

Bodies are >200 chars so any display cap wired into a delivery path fails
loudly. sqlmock style matching sibling a2a_queue/mcp_tools tests; CI's
real-PG arm additionally exercises the live body::text round-trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

agent-reviewer approved these changes 2026-06-05 11:03:04 +00:00

agent-reviewer left a comment

5-axis review: APPROVED.

Correctness: the new tests directly cover the queue drain/read path, check_task_status response extraction, and both extractA2AText response shapes with bodies well above preview caps, matching the stated regression guard.

Robustness: sqlmock expectations exercise the relevant data flow without adding external dependencies, and the focused extractor test guards both supported JSON shapes.

Security: test-only change; no new runtime/auth surface.

Performance: unit tests are bounded and deterministic.

Readability: test names and comments make the delivery-vs-display distinction clear.

5-axis review: APPROVED. Correctness: the new tests directly cover the queue drain/read path, check_task_status response extraction, and both extractA2AText response shapes with bodies well above preview caps, matching the stated regression guard. Robustness: sqlmock expectations exercise the relevant data flow without adding external dependencies, and the focused extractor test guards both supported JSON shapes. Security: test-only change; no new runtime/auth surface. Performance: unit tests are bounded and deterministic. Readability: test names and comments make the delivery-vs-display distinction clear.

core-be approved these changes 2026-06-05 12:57:44 +00:00

core-be left a comment

APPROVED — clean regression guard for core#2175.

DequeueNext path: correctly mocks the full body::text round-trip and asserts byte-for-byte equality. 4000-char text part is well above every preview cap (max 200 bytes).
toolCheckTaskStatus path: asserts the complete response text appears in the serialized result, not a preview slice.
extractA2AText unit guard: covers both artifacts and message shapes with 2500-char text.
sqlmock expectations are tight (QueryMatcherEqual for the dequeue query, regex for the SELECT status query).
t.Cleanup restores db.DB and closes mockDB — no leak.

No blockers. Ship it.

APPROVED — clean regression guard for core#2175. - DequeueNext path: correctly mocks the full body::text round-trip and asserts byte-for-byte equality. 4000-char text part is well above every preview cap (max 200 bytes). - toolCheckTaskStatus path: asserts the complete response text appears in the serialized result, not a preview slice. - extractA2AText unit guard: covers both artifacts and message shapes with 2500-char text. - sqlmock expectations are tight (QueryMatcherEqual for the dequeue query, regex for the SELECT status query). - t.Cleanup restores db.DB and closes mockDB — no leak. No blockers. Ship it.

core-be merged commit ca80894ffc into main

2026-06-05 14:57:58 +00:00

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2176

test(#2175): guard A2A full-body delivery against silent truncation #2176

Summary

Test added — a2a_full_body_delivery_guard_test.go

Verification

SOP Checklist

Test added — `a2a_full_body_delivery_guard_test.go`