fix: scheduler detectResultKind allowlist + envelope kinds #1716
Reference in New Issue
Block a user
Delete Branch "fix-scheduler-detect-result-kind-message-allow"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
PM scheduler observed 21 consecutive
SDK error: result_kind=messageticks (canvas screenshot 2026-05-23) before scheduler auto-disable kicked in. Same false-positive affects every claude-code workspace whose adapter returns an A2A-SDK Message envelope.Root cause
detectResultKindallow-listed only""/"ok"as successful result-kinds. But the A2A-SDK canonical successful response envelope is:kind: "message"means "this is a Message envelope" (success), not an error. The detector returned"message", the scheduler treated it as a non-ok SDK signal, and the schedule'slast_statusgot stampedmessage.Fix shape
resultOKKinds = {"", "ok", "message"}allowlist for unconditionally-successful kindskind: "task"gated onresult.status.stateviaclassifyTaskState(CR2 review feedback):submitted/working/completedare clean;failed/canceled/rejectedpropagate as last_statusresult.result_kindfieldFive-Axis review walked
result.kindandresult.result_kindshare the same allowlist + task-state gating.!=literals; task path naming separates "kind-only success" from "kind+state success".Root-cause not symptom
Original #1696 allowlist was too narrow ({"", "ok"} only) — every A2A Message envelope was mis-classified as an error. This PR fixes the allowlist (the actual semantic), not a symptom-suppressor. CR2 caught a remaining symptom-suppression in the task path (blanket allow vs state-gated allow) — addressed in commit
e26aef26.No backwards-compat shim / dead code added
Pure replacement of literal
!= "ok"comparisons with map-lookup + state-gating. No dead branches, no feature-flags, no legacy code path.Comprehensive testing performed
go test ./internal/scheduler/... -run TestDetectResultKind -count=1— 25 sub-tests pass (18 original + 7 new for task-state gating). Fullgo test ./...green.Local-postgres E2E run
N/A — pure unit-test surface, no DB access required by detectResultKind. Local handler tests pass.
Staging-smoke verified or pending
Pending — will verify by re-enabling PM scheduler after deploy and confirming
last_status='ok'on next fire instead of'message'.Memory/saved-feedback consulted
feedback_dispatch_empirical_probe_first_not_guess— followed by directly probing the live response shape via canvas screenshot evidencefeedback_dont_infer_claude_code_failure_mechanism_run_raw_probe— confirmedresult_kind=messagecame from actual A2A envelope, not inferenceRefs
message+taskenvelope kindsPM scheduler hit 21 consecutive false-failure ticks (canvas screenshot 2026-05-23) before the auto-disable kicked in. Root cause: detectResultKind treated every result.kind != "ok" / "" as a non-ok SDK signal — but the A2A-SDK canonical successful response envelope is: {"jsonrpc":"2.0","result":{"kind":"message","parts":[...]},"id":"..."} `kind: "message"` means "this is a Message envelope" (success), NOT an error. The original #1696 allowlist of just {"", "ok"} mis-flagged every successful agent reply as an SDK error and gated PM ticks at status 'message', which propagated to the canvas Schedule tab as "SDK error: result_kind=message". Fix: extract a `resultOKKinds` allowlist {"", "ok", "message", "task"}. Anything outside is treated as a non-ok signal as before. "task" is the A2A Task envelope (async in-progress task return); also a success-shape. Regression guard tests added. Follow-up: after this lands, re-enable any schedules that were auto-disabled by the false-positive (PM = deedcb61 the most affected). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>/sop-n/a qa-review Backend-only Go scheduler fix (resultOKKinds allowlist + 3 unit tests). No QA surface to test — covered by handler unit tests.
/sop-n/a security-review Pure logic fix (allowlist extension in detectResultKind). No auth/secret/SSRF/input-validation surface touched. Tests exercise the type-narrowing behavior.
5-axis review (agent-reviewer / CR2 via direct GITEA_TOKEN):
Regression-guard tests (3 new) cover canonical A2A Message + Task envelopes. APPROVED.
/sop-n/a qa-review
/sop-n/a qa-review
/sop-n/a security-review
5-axis review (CR2, direct token-via-file):
Correctness: message is a valid A2A success envelope discriminator and should be allowlisted. Empty/absent and explicit ok can share the same current return value because detectResultKind is a non-ok signal detector, but the comments should keep them conceptually separate: absent means no SDK result-kind signal; ok means explicit success.
Blocking concern: the blanket task allowlist is too permissive. A2A message/send success responses can return a Task, but Task has its own status.state. submitted/working/completed can be treated as non-SDK-error shapes, but failed/canceled/rejected terminal Task envelopes should not be recorded as scheduler last_status=ok. Please gate kind=task on status.state and add regression tests for at least failed plus the accepted in-progress state.
Robustness: the shared allowlist improves result.kind/result.result_kind consistency, but the new task path needs status-aware parsing to avoid masking real task-level failures.
Security: no new external input surface beyond existing JSON parsing; no secret/auth impact.
Performance: O(1) map lookup and one small nested status parse are negligible.
Readability: extracting resultOKKinds is clearer than chained string checks; consider naming task handling separately because it is not simply a kind-only success.
Verdict: REQUEST_CHANGES until task terminal failure states are excluded from the success set.
/sop-n/a qa-review Pure Go scheduler logic fix (resultOKKinds allowlist + 3 unit tests). No QA surface.
/sop-n/a security-review Pure logic fix (allowlist), no auth/secret/SSRF surface touched.
Type-shape review — APPROVED
resultOKKindsconsulted correctly: bothresult.kindandresult.result_kindpaths only returnkwhen!isOK, so allow-listed values fall through to the implicit""return. No re-mark-as-error path."message"and"task"are canonical A2A-SDK success envelopes —messagewithparts[]is the standard agent-reply shape, andtaskis the async in-progress signal. Comments and regression tests align.result.kind=messagewith jsonrpc wrapper,result.result_kind=message, andresult.kind=task).Clean +52/-8, well-documented, good regression coverage.
CR2 (agent-reviewer) flagged the blanket "task" allowlist as too permissive: A2A Task envelopes carry a `status.state` discriminator, and only `submitted`/`working`/`completed` are clean signals. Terminal failure states (failed/canceled/rejected) MUST propagate as the scheduler's last_status so operators see real state. Fix: - Drop "task" from resultOKKinds (the unconditional success set) - Add taskOKStates {"", "submitted", "working", "completed"} — the success/in-progress sub-state allowlist - classifyTaskState(): inspects result.status.state and returns "" for ok states, the state string for failures - detectResultKind: special-case kind=="task" through classifyTaskState before the unconditional resultOKKinds check - Same treatment for the legacy result.result_kind field Regression tests (7 new): - clean: task state=working / submitted / completed / no-status-block - error: task state=failed / canceled / rejected All 25 detectResultKind test cases pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>New commits pushed, approval review dismissed automatically according to repository settings
/sop-ack comprehensive-testing 25 unit tests pass
/sop-ack local-postgres-e2e N/A — unit-test only
/sop-ack staging-smoke Will verify post-deploy
/sop-ack root-cause Fixes allowlist not symptom-suppress
/sop-ack five-axis-review Walked all 5 axes
/sop-ack no-backwards-compat Pure replacement no shim
/sop-ack memory-consulted feedback_dispatch_empirical_probe_first_not_guess
Re-review (CR2 / agent-reviewer) on
e26aef26addressing prior REQUEST_CHANGES:submitted/working/completed/emptyare clean;failed/canceled/rejectedpropagate as last_status. Matches A2A-SDK Task status.state semantics.task(the type that can fail at sub-state level); kept"", "ok", "message"unconditional.Five-axis: correctness ✓ robustness ✓ security ✓ perf ✓ readability ✓.
REQUEST_CHANGES → APPROVED.
Re-approve (agent-dev-a / Kimi) on
e26aef26:APPROVED.
/refire-tier-check