test(e2e): name the A2A empty-completion failure class in staging SaaS canary #2203

core-devops commented

2026-06-04 04:17:37 +00:00

What

Step 8 of tests/e2e/test_staging_full_saas.sh (the full-lifecycle SaaS canary) sends an A2A round-trip to the parent and asserts a PONG. When the configured completion backend returns a 2xx with no text part (empty content, or tool_calls/reasoning-only), the agent runtime returns the literal reply "Error: message contained no text content." Until now, that reply fell through the generic error|exception catch-all and was reported as a vague "A2A returned an error-shaped response", which misdirects triage to workspace-server.

This adds a specific error-class check (mirroring the existing hermes-401 / quota-exhausted patterns), placed immediately before the generic catch-all, that names the failure as a model/provider backend regression and states the operator action. No behaviour change for healthy runs; a genuine empty completion still hard-fails, but is now diagnosed correctly.
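The ordering described above can be sketched as follows. This is a minimal illustration only: the function name `classify_a2a_reply`, the class labels it prints, and the exact patterns are hypothetical and are not copied from tests/e2e/test_staging_full_saas.sh. The point it shows is that the specific check has to run first, because the runtime's literal reply itself contains the word "Error" and would otherwise be swallowed by the generic catch-all.

```shell
# Hypothetical sketch: names, labels, and patterns are illustrative
# assumptions, not the contents of the real canary script.

# Prints a failure-class label for an A2A reply passed as $1.
# Returns 0 only for a healthy reply containing PONG.
classify_a2a_reply() {
  # Specific class first: the runtime's literal empty-completion reply.
  # -F treats the pattern as a fixed string, so the '.' is literal.
  if printf '%s\n' "$1" | grep -qF 'Error: message contained no text content.'; then
    echo "backend-empty-completion"
    return 1
  fi

  # Generic catch-all second: anything else that looks error-shaped.
  if printf '%s\n' "$1" | grep -qiE 'error|exception'; then
    echo "error-shaped-response"
    return 1
  fi

  # Healthy path: the canary expects a PONG in the reply.
  if printf '%s\n' "$1" | grep -q 'PONG'; then
    echo "ok"
    return 0
  fi

  echo "unexpected-reply"
  return 1
}
```

Called with the runtime's literal reply, this sketch prints backend-empty-completion and returns non-zero, so the run still fails; only the label changes. If the two checks were swapped, the same reply would be labelled error-shaped-response, which is the misdiagnosis this PR removes.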

Why now

Observed 2026-06-03/04: 100% of staging canaries on MODEL_SLUG=MiniMax-M2 (the canary default since #2710) hit this on the parent's first cold turn. It is identical on main's scheduled synthetic E2E and on open PRs (incl. #2197), i.e. an environmental backend regression, NOT PR-introduced and NOT a workspace-server/boot fault. This change is purely diagnostic precision on the unmodified main-line step-8 block; it does not mask the regression.

Verification

bash -n clean (on main base)
shellcheck -S error clean
tests/e2e/test_model_slug.sh: 16/16 pass (untouched)
New grep verified to match the exact runtime string "Error: message contained no text content."
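A rough way to reproduce the last check locally, assuming the new grep is a fixed-string match on the runtime reply (the variable `EMPTY_COMPLETION_PATTERN` and the helper `matches_empty_completion` are hypothetical names for illustration; the real script's pattern may differ):

```shell
# Assumed pattern: the exact runtime string quoted in this PR.
EMPTY_COMPLETION_PATTERN='Error: message contained no text content.'

# Succeeds when the reply in $1 contains the empty-completion string.
matches_empty_completion() {
  printf '%s\n' "$1" | grep -qF -- "$EMPTY_COMPLETION_PATTERN"
}

# Static checks named in the list above; they need a repo checkout,
# so they are left commented out here:
#   bash -n tests/e2e/test_staging_full_saas.sh
#   shellcheck -S error tests/e2e/test_staging_full_saas.sh

# Positive case: the literal runtime reply must match.
if matches_empty_completion 'Error: message contained no text content.'; then
  echo "runtime string: matched"
fi

# Negative case: a healthy PONG reply must not match.
if ! matches_empty_completion 'PONG'; then
  echo "healthy reply: not matched"
fi
```

Checking the negative case alongside the positive one guards against a pattern that is loose enough to reclassify healthy replies.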

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-04 04:17:39 +00:00

test(e2e): name the A2A empty-completion failure class in staging SaaS canary

ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 14s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 22s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
CI / Detect changes (pull_request) Successful in 24s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 19s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 22s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 8s
security-review / approved (pull_request_target) Failing after 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 14s
CI / Platform (Go) (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2

sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 12s
CI / Canvas (Next.js) (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request_target) Successful in 11s
E2E Chat / E2E Chat (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
CI / all-required (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m25s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Waiting to run
audit-force-merge / audit (pull_request_target) Successful in 36s

4fe9e809e9

Step 8 of the full-lifecycle SaaS canary sends an A2A round-trip to the
parent and asserts a PONG. When the configured completion backend returns
a 2xx with no text part (empty content / tool_calls-or-reasoning-only),
the agent runtime surfaces the literal reply "Error: message contained no
text content." Today that fell through the generic "error|exception"
catch-all and was reported as a vague "A2A returned an error-shaped
response", which misdirects triage to workspace-server.

Add a specific error-class check (mirroring the existing hermes-401 /
quota-exhausted patterns) that names this as a model/provider BACKEND
regression with the operator action, before the generic catch-all. No
behaviour change for healthy runs; the failure still hard-fails — it is
just diagnosed correctly.

Observed 2026-06-03/04: 100% of staging canaries on MODEL_SLUG=MiniMax-M2
(canary default since #2710) hit this on the parent's first cold turn,
identical on main's scheduled synthetic E2E and on open PRs — i.e. an
environmental backend regression, not PR-introduced. This is purely a
diagnostic-precision improvement to the unmodified main-line step-8 block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

core-devops referenced this pull request

2026-06-04 04:22:12 +00:00

Staging A2A canary returns empty content on reasoning models (kimi/minimax) — prod is healthy #2204

claude-ceo-assistant merged commit 0ad52852fd into main

2026-06-04 04:22:39 +00:00

core-devops commented

2026-06-04 04:22:48 +00:00

Owner force-merged (honest bypass). Diagnostic precision for the staging A2A empty-completion class (names it a backend/reasoning-model issue with operator action, before the generic catch-all) — does NOT mask the red (tolerating empty completions would hide a real signal). Required CI green; bash-lint clean; model-slug tests 16/16. Token revoked.

molecule-code-reviewer referenced this pull request

2026-06-04 04:39:31 +00:00

Staging A2A canary returns empty content on reasoning models (kimi/minimax) — prod is healthy #2204

Reference: molecule-ai/molecule-core#2203