Canary failing: staging SaaS smoke #2437

Closed
opened 2026-06-08 17:06:26 +00:00 by gitea-actions · 216 comments

Smoke run failed at 2026-06-08T17:06:26Z.

Run: https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/284724

This issue auto-closes on the next green smoke run. Consecutive failures add a comment here rather than a new issue.

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/285188
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/285884
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/286533
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/287276
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/287965
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/288862
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/289598
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/290385
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/290986
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/291995
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/292542
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/293036
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/293722
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/294610
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/295365
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/296193
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/297050
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/297762
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/298490
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/299098
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/299907
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/301019
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/301943
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/302868
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/303797
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/304599
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/305365
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/306180
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/307013
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/307904
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/308945
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/309575
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/310553
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/311285
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/311984
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/312902
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/313537
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/314382
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/315126
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/316148
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/317194
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/317951
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/318695
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/319573
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/320314
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/321090
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/321595
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/322042
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/322303
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/322649
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/323082
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/323483
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/323882
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/324070
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/324285
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/324482
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/324731
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/324956
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/325214
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/325480
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/325727
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/325972
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/326191
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/326443
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/326730
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/326931
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/327170
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/327414
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/327850
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/328046
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/328300
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/328875
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/329096
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/330211
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/330725
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/331434
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/331895
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/332263
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/332792
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/333599
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/334233
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/334741
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/335321
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/335787
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/336468
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/337192
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/338055
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/338646
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/339175
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/340088
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/340499
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/341154
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/341621
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/341915
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/342506
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/342883
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/343169
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/344003
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/344639
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/344841
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/344914
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345063
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345147
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345323
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345509
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345639
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345799
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/345974
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/346473
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/346645
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/346842
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/346842
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/346908
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347185
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347420
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347604
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347668
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347744
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347771
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347827
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347877
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/347952
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348028
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348155
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348212
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348323
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348404
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348436
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348464
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348624
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348772
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/348872
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349026
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349052
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349113
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349154
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349179
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349285
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349349
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349397
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349424
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349476
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349633
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349745
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349824
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/349915
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350030
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350124
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350234
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350257
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350553
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350638
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350674
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350748
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350800
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/350897
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351125
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351207
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351266
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351346
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351429
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351497
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351616
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351690
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351770
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351793
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351873
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/351928
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352007
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352112
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352195
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352246
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352359
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352429
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352487
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352507
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352565
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352583
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352608
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352651
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352743
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352828
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/352968
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353175
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353390
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353489
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353700
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353795
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/353983
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354155
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354236
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354400
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354522
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354699
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354764
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/354897
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/355060
Member

MECHANISM: Latest staging-smoke run 355060 reaches tenant/routing/file checks, then fails at the A2A text extraction boundary. The smoke sends message/send in tests/e2e/test_staging_full_saas.sh:1092-1167 and only accepts immediate result.parts[0].text at tests/e2e/test_staging_full_saas.sh:1168-1175. But the platform intentionally returns HTTP 202 {queued:true, queue_id,...} when the target workspace is busy; that success path is generated in workspace-server/internal/handlers/a2a_proxy_helpers.go:68-127 and later drained by workspace-server/internal/handlers/a2a_queue.go:336-395. So the smoke is red because it treats a valid queued envelope as "no text" instead of following the queue status/completion path.

EVIDENCE: Run 355060 at commit 90cf829566 shows tenant success through routability, then step 8 logs "A2A returned no text" with queued:true and queue_id 56866bed-a4ee-4533-a362-e21615582d14. Relevant log excerpt: "request queued, will dispatch". The queue observability endpoint is in workspace-server/internal/handlers/a2a_queue_status.go:79-141; it projects queued/completed status and response_body for completed rows, matching the intended follow-up surface rather than the immediate text-only parser.

RECOMMENDED FIX SHAPE: In molecule-core, update tests/e2e/test_staging_full_saas.sh to classify a 202/2xx queued envelope as a queued A2A outcome, then poll the public queue status endpoint until the item is completed or a timeout expires, and extract the completed response_body; keep immediate result.parts[].text as the fast path. If the queue status endpoint returns a 500 from a NULL scan on queued rows, that fix belongs in workspace-server/internal/handlers/a2a_queue_status.go, by keeping nullable DB scans or COALESCE on the joined activity fields. Do not weaken the smoke to ignore A2A; make it queue-aware.
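
A minimal sketch of the queue-aware classification described above, written in Python for readability rather than as a patch to the bash smoke script. The field names `queued`, `queue_id`, and `result.parts[].text` come from this thread; the function name, the `A2AOutcome` type, and the "invalid" bucket are hypothetical, and the real envelope may carry more fields than shown.

```python
import json
from typing import NamedTuple, Optional


class A2AOutcome(NamedTuple):
    kind: str             # "final", "queued", or "invalid"
    value: Optional[str]  # reply text for "final", queue_id for "queued"


def classify_a2a_response(http_status: int, body: str) -> A2AOutcome:
    """Separate a queued envelope from an immediate JSON-RPC reply."""
    if not 200 <= http_status < 300:
        return A2AOutcome("invalid", None)
    try:
        doc = json.loads(body)
    except ValueError:
        return A2AOutcome("invalid", None)
    if not isinstance(doc, dict):
        return A2AOutcome("invalid", None)

    # Queued envelope: the platform accepted the message for later drain.
    if doc.get("queued") is True and doc.get("queue_id"):
        return A2AOutcome("queued", str(doc["queue_id"]))

    # Fast path: an immediate result carrying at least one non-empty text part.
    result = doc.get("result")
    parts = result.get("parts") if isinstance(result, dict) else None
    if isinstance(parts, list):
        for part in parts:
            if isinstance(part, dict) and part.get("text"):
                return A2AOutcome("final", part["text"])

    return A2AOutcome("invalid", None)
```

With this split, only the "invalid" case would map to the current "A2A returned no text" failure; a "queued" outcome would hand its queue_id to the status-polling path instead.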

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/355208
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/355511
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/355665
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/355886
Member

MECHANISM: The current core red cluster at 4ec95c3202 is not a tenant-provisioning, file API, or terminal failure. In staging-smoke, tests/e2e/test_staging_full_saas.sh:1046-1089 writes config.yaml and waits for routing recovery, then the first A2A loop at tests/e2e/test_staging_full_saas.sh:1131-1167 hits the proxy's dead/restarting path. workspace-server/internal/handlers/a2a_proxy.go:732-738 turns upstream dead status into structured 503/restarting, while workspace-server/internal/handlers/workspace_restart.go:930-934 clears url and sets provisioning during restart; the next retry can therefore read workspace-server/internal/handlers/a2a_proxy.go:777-797 and fail as no URL/provisioning.

EVIDENCE: Run 355886 job 482700 reached config.yaml PUT OK, then logged workspace agent unreachable — container restart triggered, followed by workspace has no URL. Run 355890 job 482704 reached the same post-config recovery boundary but failed A2A with curl_rc=28, http=000. The shared timing is after config-save restart and before the first agent response; earlier image upload/download, files API, and terminal checks were already green.

RECOMMENDED FIX SHAPE: Keep the fix in molecule-core's post-config-save restart/A2A recovery path: workspace-server/internal/handlers/a2a_proxy.go, workspace-server/internal/handlers/a2a_proxy_helpers.go, workspace-server/internal/handlers/workspace_restart.go, and the bounded retry expectations in tests/e2e/test_staging_full_saas.sh. The minimal direction is to make restart completion publish a fresh routable URL/register boundary before A2A resumes, and have the smoke retry structured restarting/provisioning/no-URL states within the same bounded readiness window instead of treating the immediate post-restart URL-clear as a terminal failure.
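
One way the smoke side of this could look, sketched in Python under stated assumptions. The response schema for the restarting, provisioning, and no-URL states is not shown in this thread, so the marker substrings below are assumptions; two of them are derived from the logged messages in the evidence above. The function names and the window parameter are hypothetical.

```python
# Substrings assumed to identify a restart-in-progress reply. "restart triggered"
# and "has no url" mirror the logged messages; the others are guesses at how the
# structured restarting/provisioning states are spelled in the body.
RESTART_MARKERS = ("restarting", "restart triggered", "provisioning", "has no url")


def is_restart_transient(http_status: int, body: str) -> bool:
    """True when a non-2xx reply looks like the post-config-save restart window."""
    if 200 <= http_status < 300:
        return False
    lowered = body.lower()
    return any(marker in lowered for marker in RESTART_MARKERS)


def should_retry(http_status: int, body: str, elapsed_s: float, window_s: float) -> bool:
    """Retry restart-shaped replies only while the readiness window is still open."""
    return elapsed_s < window_s and is_restart_transient(http_status, body)
```

The point of bounding on elapsed time rather than attempt count is that the URL-clear and the re-register happen inside one restart, so both states share a single readiness budget.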

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356002
Member

MECHANISM: The latest staging red pair on head 9a40df22ba narrows #2437 from no-URL/restarting into an A2A-readiness gap after config-save restart. tests/e2e/test_staging_full_saas.sh:1085-1089 waits only for status=online plus non-empty URL using wait_workspaces_online_routable (:488-525). Both jobs satisfy that boundary after config.yaml PUT, then step 8 posts the first A2A (:1131-1141) and curl waits the full 90s. Because the retry branch only handles HTTP 502/503/504 with a matching body (:1149-1161), curl_rc=28/http=000 exits after one attempt instead of using the intended 12-attempt cold-start loop.

EVIDENCE: Staging smoke run 356002/job 482908: workspace 9c108ad2-ff7a-46fb-a64d-f1627048f8ac was online with URL after config PUT at 02:06:54, then failed at 02:08:24 with curl_rc=28, http=000. Synthetic run 356006/job 482912 repeats the same shape: parent and child recovered online after config PUT; parent A2A failed at 02:10:13 with curl_rc=28, http=000. Earlier image upload/download, terminal diagnose, Files API PUT, and teardown all passed.

RECOMMENDED FIX SHAPE: Keep this in molecule-core staging E2E/runtime readiness, not tenant provisioning. Responsible surfaces are tests/e2e/test_staging_full_saas.sh and the workspace-server/agent readiness boundary behind A2A. The minimal direction is to treat curl timeout/http000 on the first post-restart A2A as a bounded transient in the same cold-start retry loop, and/or expose a stronger agent-ready signal than online+URL before step 8 proceeds. Do not just increase the single 90s timeout; the bug is the false readiness boundary plus non-retried timeout class.
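
A sketch of the retry classification this points at, in Python rather than bash. It assumes each attempt is summarized as a (curl_rc, http_status, body) triple, with http_status 0 standing in for curl's `000`. Exit code 28 is curl's operation-timeout code; the 12-attempt count and the 502/503/504 set come from the description above, while the function name, the delay value, and the omission of the body-match condition are simplifications made here.

```python
import time
from typing import Callable, Tuple

CURL_TIMEOUT_RC = 28            # curl exit code for "operation timed out"
RETRYABLE_HTTP = {502, 503, 504}

Attempt = Tuple[int, int, str]  # (curl_rc, http_status, body)


def send_with_cold_start_retry(
    send: Callable[[], Attempt],
    max_attempts: int = 12,
    delay_s: float = 5.0,          # hypothetical pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Attempt:
    """Retry transport timeouts in the same loop as 502/503/504 replies."""
    last: Attempt = (0, 0, "")
    for attempt in range(1, max_attempts + 1):
        last = send()
        curl_rc, http_status, _body = last
        timed_out = curl_rc == CURL_TIMEOUT_RC or http_status == 0
        if not timed_out and http_status not in RETRYABLE_HTTP:
            return last            # a definitive answer, good or bad
        if attempt < max_attempts:
            sleep(delay_s)
    return last                    # still transient after the last attempt
```

This keeps the loop bounded, so a workspace that never becomes ready still fails the smoke; it only stops a single timed-out first attempt from ending the run.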

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356090
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356215
Member

MECHANISM: The latest staging-smoke failure changed shape from timeout/no-URL into a queued-response handling bug after config-save restart. tests/e2e/test_staging_full_saas.sh:1131-1147 treats any 2xx A2A status as success and exits the retry loop, then :1168-1175 assumes the body is a completed JSON-RPC response with result.parts[0].text. In run 356215, the post-restart first A2A returned HTTP 202 with {"queued":true,...} from the busy-buffer path, so the script broke out as if the agent replied, parsed no text, and failed. This is not tenant provisioning/files/terminal: those passed before step 8.

EVIDENCE: Run 356215/job 483282 reached workspace online+routable, image upload/download OK, terminal diagnose OK, and config.yaml PUT OK. After config PUT it again saw online+routable, then step 8 failed with A2A returned no text; the raw body was workspace agent busy — request queued. The queue response had queue_id=8fef4c52-820d-4f58-b769-6787ae1890c7 and queue_depth=1, proving the platform accepted the message for later drain rather than returning a final model response.

RECOMMENDED FIX SHAPE: Keep this in molecule-core staging E2E/A2A readiness handling. In tests/e2e/test_staging_full_saas.sh, distinguish final 2xx JSON-RPC responses from 202 queued responses: when queued:true/queue_id appears, poll /workspaces/$PARENT_ID/a2a/queue/$queue_id or retry within the bounded cold-start window until a completed response exists, and only parse result.parts for final responses. If product contract expects direct first-turn responses after readiness, add a stronger agent-ready boundary before step 8; do not let queued 202 count as final success.
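
The polling half of that could be sketched as follows, in Python for illustration. `fetch_status` is a hypothetical callable that performs the GET against `/workspaces/$PARENT_ID/a2a/queue/$queue_id` and returns (http_status, body). The `response_body` field is named earlier in this thread; the `status` field name and its "completed" value are assumptions about the status payload, as are the timeout and interval defaults.

```python
import json
import time
from typing import Any, Callable, Optional, Tuple


def poll_queue_until_completed(
    fetch_status: Callable[[], Tuple[int, str]],
    timeout_s: float = 120.0,
    interval_s: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Return response_body once the queue item completes, or None on timeout."""
    deadline = clock() + timeout_s
    while True:
        http_status, body = fetch_status()
        if http_status == 200:
            try:
                doc = json.loads(body)
            except ValueError:
                doc = None
            # Only a completed row carries the final agent response.
            if isinstance(doc, dict) and doc.get("status") == "completed":
                return doc.get("response_body")
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

The returned response_body would then go through the same result.parts text extraction as an immediate reply, so a queued 202 never counts as success on its own.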

Member

MECHANISM: The latest staging Platform Boot failure adds a second queue-handling edge after restart recovery: the known-answer A2A path correctly detects a 202 queued response and switches from POST to queue polling (`tests/e2e/test_staging_full_saas.sh:1286-1330`, `:1360-1380`), but the first GET `/workspaces/$PARENT_ID/a2a/queue/$KA_QUEUE_ID` returns 404. `GetA2AQueueStatus` intentionally collapses three cases to the same `queue item not found` body: no caller identity (`a2a_queue_status.go:200-217`), missing row (`:220-223`, `:238-241`), or caller/row auth mismatch (`:231-235`). The current test cannot distinguish which one happened, so a queued work item becomes an unrecoverable canary red even though the enqueue itself succeeded.

EVIDENCE: Run 356238/job 483329 reached tenant provisioning, workspace online/routable, and initial parent A2A success (`PONG`). The next known-answer turn logged a transient 502, then `known-answer A2A queued (queue_id=5af42033-e8d9-4609-9e85-1eb0fd7d8c83); switching to poll`, then immediately failed with `http=404` and body `queue item not found`. This is different from run 356215's first-A2A 202/no-text failure: here the known-answer code already enters the poll path, but queue-status lookup/auth masks the reason.

RECOMMENDED FIX SHAPE: Keep this in molecule-core queue-status/staging E2E handling. Add a debug-safe distinction for operator/test logs around `GetA2AQueueStatus` 404 causes, or make the staging test poll with an identity guaranteed to satisfy the queue-row access rule. If NULL/system caller IDs are expected after #2696 normalization, ensure `GetA2AQueueStatus` still allows the target workspace or tenant/org token to read its own queued item. Responsible files: `workspace-server/internal/handlers/a2a_queue_status.go` and `tests/e2e/test_staging_full_saas.sh`.

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356378
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356620
Member

MECHANISM: Run `356620` shows the post-config A2A ready-boundary bug is still present after #2702 merged. The smoke reaches a healthy tenant and an online/routable workspace, then step 8 sends the parent A2A and receives a successful queued response (`queued:true`, `queue_id=...`), but the main A2A path still treats that as `A2A returned no text` instead of switching to the queue-poll path. The known-answer branch has queue polling logic at `tests/e2e/test_staging_full_saas.sh:1360-1379`; the parent smoke A2A path around the text extraction/error checks does not apply the same queued-response handling.

EVIDENCE: molecule-core run `356620`, job `484038`, head `094da1609d60cd0f830d53d8838547cb88ef0627`: tenant provisioning completed, workspace `a043aaeb-91ae-4ce5-8b2a-2b64f5beddc1` was `online and routable`, image/files/terminal checks passed, then log excerpt `A2A returned no text`. The raw response was queued with `queue_id=f42e6cd6-1c37-45fe-a954-a3aa3c7bc025`. Neighboring run `356619` at the same SHA was green, so this is readiness/timing-sensitive, not deterministic deploy breakage.

RECOMMENDED FIX SHAPE: In `tests/e2e/test_staging_full_saas.sh`, factor the queued-response detection/polling used by known-answer A2A into the primary parent A2A step too. A 2xx queued response should poll `GET /workspaces/:id/a2a/queue/:queue_id` until completed/error/timeout, then assert text on the completed response; only completed responses with no text should hit the current `A2A returned no text` failure.
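The "poll until completed/error/timeout" shape could look roughly like the sketch below. Everything named here is an assumption for illustration: `poll_a2a_queue` is a hypothetical function, the fetch command is injected so the loop stays independent of how the script authenticates its GET, and the `"status":"completed"` / `"status":"error"` field values are guesses at the queue-status body, which this thread does not show.

```shell
# Hypothetical bounded poll loop for a queued A2A item.
#   $1 = name of a command that prints the queue-status body for a queue_id
#   $2 = queue_id
#   $3 = maximum number of poll attempts
#   $4 = seconds to sleep between attempts
# Returns 0 on completed, 2 on error, 3 on timeout. On 0 or 2 the last
# body is printed so the caller can assert on its text.
poll_a2a_queue() {
  local fetch="$1"
  local qid="$2"
  local max="$3"
  local pause="$4"
  local i=1
  local body=""

  while [ "$i" -le "$max" ]; do
    # A failed fetch is treated as "not ready yet", not as fatal.
    body=$("$fetch" "$qid") || body=""
    case "$body" in
      *'"status":"completed"'*) printf '%s\n' "$body"; return 0 ;;
      *'"status":"error"'*)     printf '%s\n' "$body"; return 2 ;;
    esac
    if [ "$i" -lt "$max" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  return 3
}
```

Sharing one loop like this between the known-answer branch and the parent step would mean the `A2A returned no text` check only ever sees a terminal body.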

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/356861
Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/357016
Member

RCA tick update: staging smoke run 357016 reproduces the #2437 queue-as-final-response failure.

MECHANISM: The SaaS smoke completed the real lifecycle through tenant provisioning, workspace online/routable, image upload/download, terminal reachability, and `config.yaml` PUT recovery. It then called parent A2A. `tests/e2e/test_staging_full_saas.sh:1131-1145` breaks out on any 2xx A2A response, so the HTTP-2xx queued/busy response is treated as final. `tests/e2e/test_staging_full_saas.sh:1168-1175` then tries to extract assistant text from that queued envelope, finds none, and hard-fails instead of polling the queue item or waiting for dispatch completion.

EVIDENCE: Run 357016/job 484736, head/context `Merge PR #2700 via Gitea merge queue`, smoke slug `e2e-smoke-20260613-smoke-357016`. The log shows workspace `e90eb531-a15d-4ec0-9aad-a88904f2976b` reached online and routable, then failed at step 8. Log excerpt: `A2A returned no text. Raw: {"message":"workspace agent busy`. The response included `queued:true`, `queue_depth:1`, and `queue_id:"82ad6ce0-2fb8-4c74-bde1-fa4809d71158"`.

RECOMMENDED FIX SHAPE: In `tests/e2e/test_staging_full_saas.sh`, teach the primary parent A2A step to recognize `queued:true`/`queue_id` as an intermediate accepted state. Poll the queue/status endpoint until a terminal response body is available, mirroring the known-answer queued path later in the same script, or keep retrying on the busy queued shape within a bounded budget. Do not classify a provider/workspace busy accepted response as final success or as a final no-text failure.

Smoke still failing. https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/357189
2 Participants
Reference: molecule-ai/molecule-core#2437