fix(canvas/a2a-hint): route poll-mode budget timeout away from restart anti-pattern (P1 #348) #1607

Merged
core-devops merged 1 commit from fix/a2a-error-hint-timeout-class into staging 2026-05-20 13:32:38 +00:00
Owner

Summary

P1 #348 sub-fix. canvas/src/components/tabs/chat/a2aErrorHint.ts
previously routed the empty-detail and generic-timeout cases to a bare
"workspace restart is the safe first move" prompt — the anti-pattern
per feedback_surface_actionable_failure_reason_to_user when the
underlying error is a still-in-flight long-running task (PM
coordinating Researcher on codex is the canonical case).

Three changes:

  1. New `polling timeout after`/`check_task_status` pattern, ordered
    ABOVE the generic timeout bucket, routes the poll-mode
    budget-exhaustion shape (emitted by a2a_tools_delegation.py:199) to
    a hint that explicitly tells the user NOT to restart and to call
    check_task_status with the delegation_id to retrieve the
    in-flight result.

  2. New optional context.peerKind parameter. When the caller knows
    the callee is a codex runtime, the empty-detail and
    generic-timeout hints specialise to call out that codex tasks can
    legitimately exceed the 600s sync-proxy budget.

  3. The empty-detail branch (the most common "useless hint" path) no
    longer claims "restart is the safe first move" by default — it
    now tells the user to check the callee's Activity tab before
    restarting.
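
A minimal sketch of the routing order described above, to show why the poll-mode rule has to sit above the generic timeout bucket. This is not the actual contents of a2aErrorHint.ts: the function name `sketchErrorHint`, the `HintRule` shape, the regular expressions, and the hint wording are all assumptions made for illustration.

```typescript
// Hypothetical sketch; names, patterns, and wording are illustrative only.
interface HintContext {
  peerKind?: string; // e.g. "codex" when the caller knows the callee runtime
}

interface HintRule {
  pattern: RegExp;
  hint: (ctx: HintContext) => string;
}

// Order matters: the first matching rule wins, so the specific
// poll-mode rule is listed before the generic timeout bucket.
const RULES: HintRule[] = [
  {
    pattern: /polling timeout after|check_task_status/i,
    hint: () =>
      "Do NOT restart. The task may still be running; call " +
      "check_task_status with the delegation_id to retrieve the result.",
  },
  {
    pattern: /timed? ?out/i,
    hint: (ctx) =>
      ctx.peerKind === "codex"
        ? "Codex tasks can legitimately exceed the 600s sync-proxy budget. " +
          "Check the callee's Activity tab before restarting."
        : "The request timed out. Check the callee's Activity tab before restarting.",
  },
];

function sketchErrorHint(detail: string, context: HintContext = {}): string {
  const trimmed = detail.trim();
  // Empty-detail branch: no longer recommends a restart by default.
  if (trimmed === "") {
    return context.peerKind === "codex"
      ? "No error detail. Codex tasks can legitimately exceed the 600s " +
        "sync-proxy budget; check the callee's Activity tab before restarting."
      : "No error detail. Check the callee's Activity tab before restarting.";
  }
  const rule = RULES.find((r) => r.pattern.test(trimmed));
  return rule
    ? rule.hint(context)
    : "Unrecognised error. Check the callee's Activity tab for details.";
}
```

In this sketch a detail string such as "polling timeout after 600s" would also match the generic timeout pattern, which is why the specific rule is placed first in the list.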

Risk

  • Pure refactor of advice text + new optional parameter.
  • Existing single-arg callers (ActivityTab.tsx:358,
    AgentCommsPanel.tsx:758) continue to compile unchanged.
  • One existing test was updated where it pinned the old "safe first
    move" wording; new behavior pinned with 8 added test cases.
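
The compatibility claim rests on the new parameter being optional. A small sketch of that idea follows; the name `hintFor`, the `PeerContext` type, and the returned strings are hypothetical and stand in for the real signature, which this PR page does not show.

```typescript
// Hypothetical signature; the real function's name and types may differ.
interface PeerContext {
  peerKind?: string;
}

function hintFor(detail: string, context?: PeerContext): string {
  const prefix = detail.trim() === "" ? "No error detail. " : "";
  const base = "Check the callee's Activity tab before restarting.";
  // Only specialise when the caller supplied a codex peer kind.
  if (context?.peerKind === "codex") {
    return prefix + "Codex tasks can legitimately exceed the 600s sync-proxy budget. " + base;
  }
  return prefix + base;
}

// An existing single-argument call site keeps type-checking unchanged.
const legacyHint = hintFor("");
// A newer call site can opt in to the specialised wording.
const codexHint = hintFor("", { peerKind: "codex" });
```

Because `context` is declared with `?`, call sites that pass only the error detail need no edits, and they receive the non-specialised wording.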

Test plan

  • vitest run a2aErrorHint.test.ts — 17/17 pass (9 prior + 8 new)
  • tsc --noEmit clean for the file (pre-existing unrelated errors
    in other test files persist)
  • post-merge: trigger a poll-mode 600s timeout end-to-end and verify
    canvas shows the new "Do NOT restart" hint

Refs P1 #348, feedback_surface_actionable_failure_reason_to_user.

hongming added 1 commit 2026-05-20 10:25:06 +00:00
fix(canvas/a2a-hint): route poll-mode budget timeout away from restart anti-pattern (P1 #348)
E2E Chat / E2E Chat (pull_request) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 5m1s
gate-check-v3 / gate-check (pull_request) Successful in 9s
qa-review / approved (pull_request) Successful in 5s
security-review / approved (pull_request) Successful in 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
CI / Canvas (Next.js) (pull_request) Successful in 6m12s
CI / Python Lint & Test (pull_request) Successful in 7m0s
CI / all-required (pull_request) Successful in 6m10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
audit-force-merge / audit (pull_request) Successful in 5s
dd3d11c51d
Per `feedback_surface_actionable_failure_reason_to_user`, an opaque
"workspace restart is the safe first move" prompt is the anti-pattern
when the underlying error is a still-in-flight long-running task —
which is exactly the canonical PM→Researcher codex coordination case
that surfaces "FAILED TO DELIVER" today.

Three improvements:

1. New `polling timeout after`/`check_task_status` pattern, ordered
   ABOVE the generic timeout bucket, routes to a hint that explicitly
   tells the user NOT to restart and to call check_task_status with
   the delegation_id to retrieve the in-flight result.

2. New optional `context.peerKind` parameter. When the caller knows
   the callee is a codex runtime, the empty-detail and generic-timeout
   hints specialise to call out that codex tasks can legitimately
   exceed the 600s sync-proxy budget.

3. The empty-detail branch (the most common "useless hint" path) no
   longer claims "restart is the safe first move" by default — it now
   tells the user to check the callee's Activity tab before restarting.

Existing single-arg callers continue to compile unchanged (context is
optional). Regression test count goes 9 → 17.

Refs P1 #348, feedback_surface_actionable_failure_reason_to_user.
core-be approved these changes 2026-05-20 13:32:35 +00:00
core-be left a comment
Member

APPROVED from core-be lens. PR routes poll-mode budget-timeout away from the misleading 'agent error' surface to a more specific 'budget exhausted' hint in canvas. Per feedback_surface_actionable_failure_reason_to_user; closes a UX-papercut where opaque timeouts looked like crashes. CI/all-required=success.

core-qa approved these changes 2026-05-20 13:32:36 +00:00
core-qa left a comment
Member

APPROVED from core-qa lens. Hint-routing logic unit-tested; matches canvas TypeScript a2aErrorHint signature. CI green.

core-devops merged commit c3ba26ead2 into staging 2026-05-20 13:32:38 +00:00
Reference: molecule-ai/molecule-core#1607