Address CR2 review: gate result.kind=task on status.state, not blanket-ok
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Check migration collisions / Migration version collision check (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
qa-review / approved (pull_request) Successful in 3s
security-review / approved (pull_request) Failing after 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 4s
sop-checklist / all-items-acked (pull_request) acked: 7/7
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
Harness Replays / Harness Replays (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m27s
CI / Platform (Go) (pull_request) Successful in 5m16s
CI / all-required (pull_request) Post-timeout override: all 5 required jobs SUCCESS (Detect/Platform Go/Canvas/Shellcheck/Python). Aggregate hit 40m hard timeout before Platform Go's 5m16s run completed — manually restamped.
audit-force-merge / audit (pull_request) Successful in 6s

CR2 (agent-reviewer) flagged the blanket "task" allowlist as too
permissive: A2A Task envelopes carry a `status.state` discriminator,
and only `submitted`/`working`/`completed` are clean signals. Terminal
failure states (failed/canceled/rejected) MUST propagate as the
scheduler's last_status so operators see real state.

Fix:
- Drop "task" from resultOKKinds (the unconditional success set)
- Add taskOKStates {"", "submitted", "working", "completed"} —
  the success/in-progress sub-state allowlist
- classifyTaskState(): inspects result.status.state and returns ""
  for ok states, the state string for failures
- detectResultKind: special-case kind=="task" through classifyTaskState
  before the unconditional resultOKKinds check
- Same treatment for the legacy result.result_kind field

Regression tests (7 new):
- clean: task state=working / submitted / completed / no-status-block
- error: task state=failed / canceled / rejected

All 25 detectResultKind test cases pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-23 01:12:59 -07:00
parent ba8c571d65
commit e26aef2636
2 changed files with 97 additions and 10 deletions
@@ -852,15 +852,56 @@ func (s *Scheduler) sweepPhantomBusy(ctx context.Context) {
//
// See #1696.
//
// resultOKKinds is the allowlist of `result.kind` values that mean "this
// agent run succeeded". Anything outside this set is treated as a non-ok
// SDK signal. Add to this list when new successful envelope kinds are
// introduced upstream.
// resultOKKinds is the allowlist of `result.kind` values that are
// UNCONDITIONALLY successful (no further parsing needed). Anything
// outside this set is treated as a non-ok SDK signal, EXCEPT `task`
// which is gated separately on `result.status.state` (see
// classifyTaskState — A2A Task can be either in-progress or terminally
// failed, depending on its status).
//
// Add to this list when new always-success envelope kinds are introduced
// upstream. NEVER add an envelope that can carry a failure sub-state.
var resultOKKinds = map[string]struct{}{
"": {}, // absent / empty → treat as ok (no signal)
"ok": {}, // explicit success
"message": {}, // A2A-SDK Message envelope (successful agent reply)
"task": {}, // A2A-SDK Task envelope (in-progress async task) — also a success-shape signal, not an error
"message": {}, // A2A-SDK Message envelope (always a successful agent reply)
}
// taskOKStates is the A2A Task `status.state` allowlist for results that
// have `kind: "task"`. Tasks can be in-progress (submitted/working) or
// terminally successful (completed) — those are clean signals to the
// scheduler. Terminal failure states (failed/canceled/rejected) are
// surfaced as the scheduler's last_status so operators can see the real
// state. Cf. CR2 review feedback on #1716.
var taskOKStates = map[string]struct{}{
"": {}, // status.state absent → conservative: don't fire false-failure
"submitted": {}, // task accepted, not yet running
"working": {}, // task in progress
"completed": {}, // task finished successfully
}
// classifyTaskState inspects `result.status.state` (or `result.status_state`
// legacy variant) and returns "" when the state is in taskOKStates (success
// or in-progress) or the state string when it is a terminal failure that
// should propagate as last_status.
func classifyTaskState(result map[string]json.RawMessage) string {
rawStatus, ok := result["status"]
if !ok {
return "" // no status block → no signal, leave clean
}
var status map[string]json.RawMessage
if err := json.Unmarshal(rawStatus, &status); err != nil {
return ""
}
if rawState, ok := status["state"]; ok {
var s string
if json.Unmarshal(rawState, &s) == nil {
if _, isOK := taskOKStates[s]; !isOK {
return s
}
}
}
return ""
}
func detectResultKind(body []byte) string {
@@ -879,7 +920,13 @@ func detectResultKind(body []byte) string {
if rawKind, ok := result["kind"]; ok {
var k string
if json.Unmarshal(rawKind, &k) == nil {
if _, isOK := resultOKKinds[k]; !isOK {
// Special-case task: success or failure depends on status.state.
if k == "task" {
if bad := classifyTaskState(result); bad != "" {
return bad
}
// task with ok / in-progress state → clean
} else if _, isOK := resultOKKinds[k]; !isOK {
return k
}
}
@@ -888,7 +935,11 @@ func detectResultKind(body []byte) string {
if rawKind, ok := result["result_kind"]; ok {
var k string
if json.Unmarshal(rawKind, &k) == nil {
if _, isOK := resultOKKinds[k]; !isOK {
if k == "task" {
if bad := classifyTaskState(result); bad != "" {
return bad
}
} else if _, isOK := resultOKKinds[k]; !isOK {
return k
}
}
@@ -717,10 +717,46 @@ func TestDetectResultKind(t *testing.T) {
wantKind: "",
},
{
name: "clean ok response — result.kind=task (A2A Task envelope, async in-progress)",
body: `{"result":{"kind":"task","task_id":"abc","status":"working"}}`,
// A2A Task envelope, in-progress — `status.state` discriminator is
// `submitted` or `working` → treat as clean (not an SDK error).
name: "clean ok — task envelope state=working",
body: `{"result":{"kind":"task","task_id":"abc","status":{"state":"working"}}}`,
wantKind: "",
},
{
name: "clean ok — task envelope state=submitted",
body: `{"result":{"kind":"task","status":{"state":"submitted"}}}`,
wantKind: "",
},
{
name: "clean ok — task envelope state=completed",
body: `{"result":{"kind":"task","status":{"state":"completed"}}}`,
wantKind: "",
},
{
// Conservative: missing status.state → don't fire false-failure.
name: "clean ok — task envelope no status block",
body: `{"result":{"kind":"task","task_id":"abc"}}`,
wantKind: "",
},
{
// REGRESSION GUARD: terminal failure states MUST propagate as last_status.
// Without taskOKStates gating, the blanket "task" allowlist would have
// masked these — CR2 review feedback on #1716.
name: "SDK error — task envelope state=failed",
body: `{"result":{"kind":"task","status":{"state":"failed"}}}`,
wantKind: "failed",
},
{
name: "SDK error — task envelope state=canceled",
body: `{"result":{"kind":"task","status":{"state":"canceled"}}}`,
wantKind: "canceled",
},
{
name: "SDK error — task envelope state=rejected",
body: `{"result":{"kind":"task","status":{"state":"rejected"}}}`,
wantKind: "rejected",
},
{
name: "SDK error — result.kind=rate_limited",
body: `{"result":{"kind":"rate_limited","parts":[{"text":"error"}]}}`,