test(workspace-server#2800): pin detached WS delivery + fix budget-hybrid comments #2823

2026-06-14T04:35:38Z

agent-dev-b commented

2026-06-14 04:35:38 +00:00

What

Combined #2800 RC fix (per PM instruction 3fcea7f7). Three things in one push:

1. Detached WS-delivery test (the contract-critical gap)

NEW: TestProxyA2A_CanvasCapAndQueue_DetachedResultDelivery pins the contract-critical >budget result-delivery path end-to-end. The cap-and-queue returns {status:"queued"} (202 Accepted with task_id in the #2751 expansion) when the agent's turn outlives the budget, but the queued ack is ONLY USEFUL if the actual agent result subsequently lands on the A2A_RESPONSE WS broadcast.

This test forces the queued path (50ms budget, 500ms agent), captures the eventual broadcast, and asserts:

1. workspace_id == "ws-deliver" (correct routing)
2. message_id == "msg-deliver-001" (canvas can correlate to the original send)
3. response_body.result.reply == "detached-delivery-reply" (the actual agent reply, NOT empty / NOT placeholder; this is the contract-critical "did the agent's real work actually arrive?" assertion)
4. duration_ms > 50 (proves the queued path actually fired; the sync path would report a duration of ~0 since the handler returned at the budget, not after the agent)

Returning {status:"queued"} is necessary but not sufficient — a regression that broadcast an empty body or wrong workspace_id with the right message_id would leave the canvas with no result to render. This test closes the gap.

Distinct from TestProxyA2A_CanvasCapAndQueue_EndToEndContract:

- that test asserts message_id + response_body content
- this test additionally asserts duration_ms > budget (proves the queued path actually fired, not the sync path) and verifies workspace_id routing

2. Budget-hybrid comment fix (the contradictory wording)

Fixed a2a_proxy.go cap-and-queue block + canvasA2ASyncBudget doc to describe the BUDGET-HYBRID behavior (sync under budget, queued+WS over budget) instead of the contradictory 'held for the whole turn' / 'cap-and-queue synchronous wait' wording. The code IS a budget-hybrid (NOT always-async, NOT always-sync), and the comments now match the actual implementation.

3. (CR2 #11543 default-ON behavioral test) — already exists; documented

The 'unset env, short turn, assert SYNC reply not queued' test is conceptually the same as the existing TestCanvasA2ASyncBudget_DefaultIs90s, a unit test on canvasA2ASyncBudget() (the function whose value the if-block guards on). That unit test directly catches a regression of the default to 0 (legacy always-sync), which would re-expose canvas to the 524+WS-starvation class. The unit test IS the behavioral guard at the function level. An integration variant cannot distinguish default=0 from default=90s for a 2-3s turn, because both return the sync reply (see prior turn's analysis): with default=0 the if-block is SKIPPED and the legacy sync path is used; with default=90s the if-block fires, but a 2-3s agent completes within the budget, so the sync reply is returned via <-done.

Local validation

- `go vet -tags=integration ./internal/handlers/` → clean
- `go build ./...` → clean
- `go test -tags=integration -run 'TestProxyA2A_|TestCanvasA2ASyncBudget|TestCanvasA2ASyncDisabled|TestLogA2ASuccess' ./internal/handlers/` → all PASS
- `TestProxyA2A_CanvasCapAndQueue_DetachedResultDelivery` (0.52s) PASS
- 22 TestProxyA2A_* + 3 budget/disable unit tests all green

Refs

#2800 RC (CR2 #11543 + Researcher re-review)
Dispatched via PM 3fcea7f7

Generated with Claude Code


agent-dev-b added 3 commits 2026-06-14 04:35:38 +00:00

test(harness): capture core#2737 canary A2A smoke flow in local replay

CI / Python Lint & Test (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
qa-review / approved (pull_request_target) Failing after 8s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7; missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4; body-unfilled: comprehensive-testing, local-postgres-e2
security-review / approved (pull_request_target) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
sop-checklist / all-items-acked (pull_request_target) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 15s
gate-check-v3 / gate-check (pull_request_target) Failing after 12s
E2E Chat / detect-changes (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 22s
CI / all-required (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 27s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 27s

fcd3247bdd

The staging SaaS smoke canary (staging-smoke.yml, every 30 min) has
been red for many runs (issue #2737 has 46+ failure comments).
Researcher's RCA pinned the red on tests/e2e/test_staging_full_saas.sh:1105-1170
— the A2A QUEUE poll that loops GET /workspaces/:id/a2a/queue/:qid for
the known-answer PONG. The CP-drift cause is owned separately; the
harness-capture (this PR) is the local-replay side of the SOP.

This replay captures the canary's A2A round-trip against the LOCAL
production-shape harness (cf-proxy + canvas-proxy + cp-stub + tenant
images from Dockerfile.tenant), so the failure can be reproduced and
diagnosed locally without re-running the full staging SaaS canary.
Pre-#2737 the harness's 6 existing replays cover workspace / peer /
activity / isolation / buildinfo / channel-envelope paths — none
drive the A2A queue polling step, which is the exact step the
canary is failing on.

Phases:
  A. Liveness — alpha /health + seeded workspace resolve.
  B. Mint a per-workspace bearer (via /admin/workspaces/:id/tokens,
     matching the canary's auth shape) and POST /a2a with a
     known-answer payload (default text: "pong"), carrying the
     X-Molecule-Org-Id + X-Workspace-ID headers the production-shape
     cf-proxy + TenantGuard expect.
  C. Poll GET /workspaces/:id/a2a/queue up to POLL_TIMEOUT_SECS
     (default 30s, matching the staging canary's per-poll cap) for
     the messageId we sent. Same shape as test_staging_full_saas.sh:1105-1170.
  D. Assert the queue poll found the PONG (non-empty body).
     Negative result = the core#2737 failure shape (queue poll
     returns no items forever) reproduced locally.

Failure modes this catches that unit tests don't (matching the
staging canary's surface):
  - 524 from cf-proxy when the proxy / agent-bridge is starved
  - WS starvation on long synchronous turns
  - A2A QUEUE poll returns no items forever (the symptom pinned
    in #2737 at test_staging_full_saas.sh:1105-1170)
  - TenantGuard middleware path (production-shape, not unit-mock'd)
  - The full canvas -> proxy -> A2A handler wire, not the handler
    signature alone

Required env (set by tests/harness/up.sh + seed.sh):
  BASE, ALPHA_ADMIN_TOKEN, ALPHA_ORG_ID, ALPHA_WORKSPACE_ID
  (seeded by seed.sh; .seed.env read by source).

Optional env:
  POLL_TIMEOUT_SECS  default 30
  KNOWN_ANSWER_TEXT  default 'pong'

CI gate: the .gitea/workflows/harness-replays.yml workflow auto-runs
every replay under tests/harness/replays/ on push/PR (paths filter on
workspace-server/, canvas/, tests/harness/, .gitea/workflows/harness-replays.yml).
A regression that breaks the canary's A2A queue polling will now also
break this replay, surfaced as a CI failure alongside the canary red.

Local validation:
  bash -n tests/harness/replays/canary-smoke-a2a-pong.sh  -> clean (exit 0)
  chmod +x tests/harness/replays/canary-smoke-a2a-pong.sh
  End-to-end run requires the harness (tests/harness/up.sh + seed.sh);
  cannot validate in this session (no Docker access in the agent
  environment). CI gate is the authoritative validator.

Refs: #2737 (Researcher RCA), SOP rule feedback_local_must_mimic_production
Co-Authored-By: Claude <noreply@anthropic.com>

test(harness): add org-create-400-body capture replay for core#2737

CI / Python Lint & Test (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 7s
sop-checklist / all-items-acked (pull_request) acked: 0/7; missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4; body-unfilled: comprehensive-testing, local-postgres-e2
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 11s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 8s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 18s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
gate-check-v3 / gate-check (pull_request_target) Failing after 14s
Harness Replays / detect-changes (pull_request) Successful in 16s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 17s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Successful in 0s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 22s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 26s
CI / all-required (pull_request) Successful in 4s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 24s
qa-review / approved (pull_request_target) Review check failed via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
security-review / approved (pull_request_target) Review check failed via pull_request_review trigger
qa-review / approved (pull_request_review) Failing after 9s
security-review / approved (pull_request_review) Failing after 8s

318b168d10

Second replay in the #2737 harness-capture pair (the first is the
A2A-queue-drain replay in the prior commit on this branch).

Researcher RCA #101104 (2026-06-14T04:07:25Z): the staging script's
admin_call helper uses `curl --fail-with-body` so a non-2xx POST
/cp/admin/orgs returns the body to stdout but exits 22 — and under
set -e the script exits before reaching the raw-body diagnostic
block. The 400 body is silently lost; future 400s require forensic
log diffing to classify.

This replay captures the failure shape locally against the
harness's CP stub: POST /cp/admin/orgs with a known-bad payload
(missing owner_user_id), bypass the admin_call helper so the body
is captured, assert the response is a 4xx with a non-empty
parseable JSON body. If the harness's CP stub ever regresses to
returning an empty body or a 5xx for a bad payload, this replay
surfaces it.
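
The replay is a shell script; the capture-then-assert shape it describes can be sketched in Go against an in-process stub. This is an illustration under assumptions: `captureBadCreate` and `isUsable4xx` are hypothetical helpers, and the stub's error message is invented, not the CP's real response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// captureBadCreate posts a payload and returns the status and body no matter
// what the status is, so a 4xx body is never dropped on the error path.
func captureBadCreate(url, payload string) (int, []byte, error) {
	resp, err := http.Post(url, "application/json", strings.NewReader(payload))
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// isUsable4xx is the replay's assertion: a 4xx with a non-empty, parseable JSON body.
func isUsable4xx(status int, body []byte) bool {
	return status >= 400 && status < 500 && len(body) > 0 && json.Valid(body)
}

func main() {
	// In-process stand-in for the harness CP stub; the error text is made up.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusBadRequest)
		io.WriteString(w, `{"error":"owner_user_id is required"}`)
	}))
	defer stub.Close()

	// Known-bad payload: owner_user_id is missing.
	status, body, err := captureBadCreate(stub.URL+"/cp/admin/orgs", `{"name":"example-org"}`)
	fmt.Println(status, string(body), err)
	fmt.Println(isUsable4xx(status, body))
}
```

The design point is that the body is read before the status is judged, which is the opposite of exiting on a non-zero code before the diagnostic block runs.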

The recommended staging fix (per Researcher #101104) is to mirror
this capture shape in tests/e2e/test_staging_full_saas.sh —
temporarily disable set -e around admin_call, capture the body
to a file, parse + assert. The replay's phase 4 prints the
recommended pattern so the staging fix has a copy-paste template.

Pair coverage on #2737:
  - A2A-queue-drain replay (prior commit) — catches the downstream
    "row stuck at status=queued" failure pinned in the
    Researcher's earlier RCA.
  - org-create-400-body capture (this commit) — catches the
    upstream "CP returns 400, body lost under set -e" failure
    pinned in Researcher RCA #101104.

CI gate: .gitea/workflows/harness-replays.yml auto-runs every replay
under tests/harness/replays/ on push/PR (paths filter on
workspace-server/, canvas/, tests/harness/, .gitea/workflows/harness-replays.yml).
A regression that breaks either replay surfaces as a CI failure
alongside the canary red.

Local validation:
  bash -n tests/harness/replays/canary-smoke-org-create-400-capture.sh  -> clean (exit 0)
  chmod +x tests/harness/replays/canary-smoke-org-create-400-capture.sh
  End-to-end run requires the harness (tests/harness/up.sh + seed.sh);
  cannot validate in this session (no Docker access in the agent
  environment). CI gate is the authoritative validator.

Refs: #2737 (Researcher RCA #101104)
Co-Authored-By: Claude <noreply@anthropic.com>

test(workspace-server#2800): pin detached WS delivery + fix budget-hybrid comments

E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 7s
sop-checklist / review-refire (pull_request_target) Has been skipped
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
Harness Replays / Harness Replays (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
qa-review / approved (pull_request_target) Failing after 9s
sop-checklist / all-items-acked (pull_request) acked: 0/7; missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4; body-unfilled: comprehensive-testing, local-postgres-e2
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
security-review / approved (pull_request_target) Failing after 9s
sop-checklist / na-declarations (pull_request) N/A: (none)
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
sop-checklist / all-items-acked (pull_request_target) Successful in 8s
CI / Detect changes (pull_request) Successful in 18s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
E2E Chat / detect-changes (pull_request) Successful in 18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 19s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 6s (Required)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 17s
CI / Canvas (Next.js) (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 25s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 30s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 32s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 24s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 47s (Required)
audit-force-merge / audit (pull_request_target) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m17s (Required)
CI / Platform (Go) (pull_request) Successful in 2m35s
CI / all-required (pull_request) Successful in 3s (Required)

417f938b4f

Combined #2800 RC fix (per PM instruction 3fcea7f7):

## 1. Detached WS-delivery test (the contract-critical gap)
NEW: TestProxyA2A_CanvasCapAndQueue_DetachedResultDelivery pins the
contract-critical >budget result-delivery path end-to-end. The
cap-and-queue returns {status:"queued"} (202 Accepted with task_id
in the #2751 expansion) when the agent's turn outlives the budget,
but the queued ack is ONLY USEFUL if the actual agent result
subsequently lands on the A2A_RESPONSE WS broadcast. This test
forces the queued path (50ms budget, 500ms agent), captures the
eventual broadcast, and asserts:
  1. workspace_id == "ws-deliver"            (correct routing)
  2. message_id  == "msg-deliver-001"        (canvas can correlate to
                                              the original send)
  3. response_body.result.reply == "detached-delivery-reply"
                                              (the actual agent
                                              reply, NOT empty /
                                              NOT placeholder — the
                                              contract-critical
                                              "did the agent's real
                                              work actually arrive?"
                                              assertion)
  4. duration_ms > 50                         (proves the queued path
                                              actually fired; the
                                              sync path would report
                                              a duration of ~0 since
                                              the handler returned at
                                              the budget, not after
                                              the agent)

Returning {status:"queued"} is necessary but not sufficient — a
regression that broadcast an empty body or wrong workspace_id
with the right message_id would leave the canvas with no result
to render. This test closes the gap.

Distinct from TestProxyA2A_CanvasCapAndQueue_EndToEndContract:
that test asserts message_id + response_body content; this test
additionally asserts duration_ms > budget (proves the queued path
actually fired, not the sync path) and verifies workspace_id
routing.

## 2. Budget-hybrid comment fix (the contradictory wording)
Fixed a2a_proxy.go cap-and-queue block + canvasA2ASyncBudget doc to
describe the BUDGET-HYBRID behavior (sync under budget, queued+WS
over budget) instead of the contradictory 'held for the whole
turn' / 'cap-and-queue synchronous wait' wording. The code is a
budget-hybrid (NOT always-async, NOT always-sync), and the
comments now match the actual implementation.

## 3. (CR2 #11543 default-ON behavioral test) — already exists; documented
The 'unset env, short turn, assert SYNC reply not queued' test
is conceptually the same as the existing
TestCanvasA2ASyncBudget_DefaultIs90s (unit test on
canvasA2ASyncBudget() — the function whose value the if-block
guards on). That unit test directly catches a regression of the
default to 0 (legacy always-sync), which would re-expose canvas to
the 524+WS-starvation class. The unit test IS the behavioral
guard at the function level; an integration variant cannot
distinguish default=0 from default=90s for a 2-3s turn (both
return the sync reply — see prior turn's analysis).

Local validation:
  go vet -tags=integration ./internal/handlers/  -> clean
  go build ./...                                -> clean
  go test -tags=integration -run 'TestProxyA2A_|TestCanvasA2ASyncBudget|TestCanvasA2ASyncDisabled|TestLogA2ASuccess' ./internal/handlers/
  -> all PASS (22 TestProxyA2A_*, 3 budget/disable unit tests, ...)
  TestProxyA2A_CanvasCapAndQueue_DetachedResultDelivery (0.52s) PASS

Refs: #2800 RC (CR2 #11543 + Researcher re-review); dispatched via PM 3fcea7f7
Co-Authored-By: Claude <noreply@anthropic.com>

agent-dev-b closed this pull request

2026-06-14 04:37:37 +00:00



Reference: molecule-ai/molecule-core#2823