fix(canvas): polite tasks/cancel before /workspaces/:id/restart for Stop All (task #377 companion) #1619

Merged
core-devops merged 1 commits from fix/377-canvas-polite-cancel-before-restart into main 2026-05-20 22:58:59 +00:00
Member

Summary

Companion to molecule-ai-workspace-template-claude-code PR#40 (fix/377-stop-all-propagation, core-be). Task #377.

PR#40 adds a fast-cancel path on the runtime side (executor.cancel() -> killpg the CLI subprocess group), gated on MOLECULE_STOP_PROPAGATE=true. Until canvas issues the A2A tasks/cancel JSON-RPC at the workspace before the heavy /restart, that runtime path is INERT in production - nothing ever reaches executor.cancel(). Flipping the env var would give zero canary signal. This PR closes that gap.

Fix - two-phase Stop All in Toolbar.tsx

  • Phase 1: POST /workspaces/:id/a2a with {method:"tasks/cancel", params:{}} for every active workspace in parallel. workspace-server/internal/handlers/a2a_proxy.go forwards the envelope verbatim to the runtime; the a2a-sdk framework dispatches tasks/cancel to AgentExecutor.cancel() (template-claude-code claude_sdk_executor.py:853).
  • Phase 2: poll the canvas Zustand store (driven by TASK_UPDATED WS pushes via canvas-events.ts:400) every 250ms for up to 8000ms. Drained workspaces (activeTasks=0) are excluded from phase 3.
  • Phase 3: for any workspace that did NOT drain inside the timeout (old runtime image, or cancel propagation stuck) fall through to the original heavy /workspaces/:id/restart. Behavior is a strict superset of pre-fix Stop All - stuck workspaces still get the hammer, well-behaved ones are spared.

Upstream citation

Per feedback_upstream_docs_first_before_hypothesizing:

  • A2A protocol spec §9.4.5 CancelTask - JSON-RPC binding for the abstract Cancel Task operation (https://a2a-protocol.org/latest/specification/).
  • a2a-sdk 1.0.3 a2a/compat/v0_3/types.py:1125 pins the wire method literal: method: Literal['tasks/cancel'] = 'tasks/cancel'. Matches the slash-notation our codebase already uses for "message/send" (workspace-server/internal/handlers/delegation.go:155, canvas/src/components/tabs/ScheduleTab.tsx:168).

Test plan

Four new specs in Toolbar.test.tsx:

  • Phase 1 dispatches tasks/cancel via /a2a for every active workspace before any /restart (order + envelope shape assertion).
  • When activeTasks drains to 0 during the poll window, /restart is NOT called.
  • When activeTasks does NOT drain inside the 8s timeout, /restart is called for each stuck workspace (with phase-1-before-phase-3 order assertion).
  • Selective drain - one workspace drains, the other doesn't; /restart is called only for the stuck one.

Full canvas vitest suite locally: 3360 passed, 1 skipped, 0 failed. Toolbar file: 25 tests (21 prior + 4 new).

No new dependencies. No new env vars. No DB migrations.

Refs: task #377, template-claude-code PR#40.

## Summary Companion to [`molecule-ai-workspace-template-claude-code` PR#40](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code/pulls/40) (`fix/377-stop-all-propagation`, core-be). Task #377. PR#40 adds a fast-cancel path on the runtime side (`executor.cancel()` -> `killpg` the CLI subprocess group), gated on `MOLECULE_STOP_PROPAGATE=true`. **Until canvas issues the A2A `tasks/cancel` JSON-RPC at the workspace before the heavy `/restart`, that runtime path is INERT in production** - nothing ever reaches `executor.cancel()`. Flipping the env var would give zero canary signal. This PR closes that gap. ## Fix - two-phase Stop All in `Toolbar.tsx` - **Phase 1**: POST `/workspaces/:id/a2a` with `{method:"tasks/cancel", params:{}}` for every active workspace in parallel. `workspace-server/internal/handlers/a2a_proxy.go` forwards the envelope verbatim to the runtime; the a2a-sdk framework dispatches `tasks/cancel` to `AgentExecutor.cancel()` (template-claude-code `claude_sdk_executor.py:853`). - **Phase 2**: poll the canvas Zustand store (driven by `TASK_UPDATED` WS pushes via `canvas-events.ts:400`) every 250ms for up to 8000ms. Drained workspaces (`activeTasks=0`) are excluded from phase 3. - **Phase 3**: for any workspace that did NOT drain inside the timeout (old runtime image, or cancel propagation stuck) fall through to the original heavy `/workspaces/:id/restart`. Behavior is a strict superset of pre-fix Stop All - stuck workspaces still get the hammer, well-behaved ones are spared. ## Upstream citation Per `feedback_upstream_docs_first_before_hypothesizing`: - A2A protocol spec **§9.4.5 CancelTask** - JSON-RPC binding for the abstract Cancel Task operation (https://a2a-protocol.org/latest/specification/). - a2a-sdk 1.0.3 `a2a/compat/v0_3/types.py:1125` pins the wire method literal: `method: Literal['tasks/cancel'] = 'tasks/cancel'`. Matches the slash-notation our codebase already uses for `"message/send"` (`workspace-server/internal/handlers/delegation.go:155`, `canvas/src/components/tabs/ScheduleTab.tsx:168`). ## Test plan Four new specs in `Toolbar.test.tsx`: - [x] Phase 1 dispatches `tasks/cancel` via `/a2a` for every active workspace **before** any `/restart` (order + envelope shape assertion). - [x] When `activeTasks` drains to 0 during the poll window, `/restart` is NOT called. - [x] When `activeTasks` does NOT drain inside the 8s timeout, `/restart` is called for each stuck workspace (with phase-1-before-phase-3 order assertion). - [x] Selective drain - one workspace drains, the other doesn't; `/restart` is called only for the stuck one. Full canvas vitest suite locally: **3360 passed, 1 skipped, 0 failed**. Toolbar file: **25 tests (21 prior + 4 new)**. No new dependencies. No new env vars. No DB migrations. Refs: task #377, template-claude-code PR#40.
core-fe added 1 commit 2026-05-20 21:25:08 +00:00
fix(canvas): polite tasks/cancel before /workspaces/:id/restart for Stop All (task #377 companion)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 22s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request) Successful in 3s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 4m46s
CI / Canvas (Next.js) (pull_request) Successful in 6m14s
CI / Python Lint & Test (pull_request) Successful in 6m38s
CI / all-required (pull_request) Successful in 6m28s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Chat / E2E Chat (pull_request) Failing after 5m39s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m19s
qa-review / approved (pull_request) Refired via /qa-recheck by unknown
security-review / approved (pull_request) Refired via /security-recheck; security-review failed
audit-force-merge / audit (pull_request) Successful in 3s
52304d99a2
Companion to template-claude-code PR#40 (fix/377-stop-all-propagation,
core-be). PR#40 adds a fast-cancel path on the runtime side
(executor.cancel -> killpg the CLI subprocess group), gated on
MOLECULE_STOP_PROPAGATE=true. But until canvas issues the A2A
tasks/cancel JSON-RPC at the workspace before the heavy /restart,
that runtime path is INERT in production - nothing ever reaches
executor.cancel(). Flipping the env var would produce zero canary
signal.

Fix - two-phase Stop All in Toolbar.tsx:

  Phase 1: POST /workspaces/:id/a2a with method "tasks/cancel" and
           empty params for every workspace that has activeTasks>0,
           in parallel. The workspace-server a2a_proxy.go forwards
           the envelope verbatim to the runtime; a2a-sdk dispatches
           tasks/cancel to AgentExecutor.cancel() on the runtime
           side (claude_sdk_executor.py cancel() at line 853).

  Phase 2: poll the canvas Zustand store (TASK_UPDATED pushes drive
           the active_tasks field via canvas-events.ts:400) every
           250ms for up to 8000ms. Drained workspaces (activeTasks=0)
           are removed from the to-restart set.

  Phase 3: for any workspace that did NOT drain inside the timeout
           - runtime on an old image without the cancel hook, or
           cancel propagation stuck - fall through to the original
           heavy /workspaces/:id/restart. Behavior is a strict
           superset of pre-fix Stop All: stuck workspaces still get
           the hammer, well-behaved workspaces are spared.

Upstream wire-shape citations (per
feedback_upstream_docs_first_before_hypothesizing):

- A2A protocol spec section 9.4.5 "CancelTask" - JSON-RPC binding
  for the abstract Cancel Task operation
  (https://a2a-protocol.org/latest/specification/)
- a2a-sdk 1.0.3 a2a/compat/v0_3/types.py line 1125 pins the wire
  method literal: `method: Literal['tasks/cancel'] = 'tasks/cancel'`.
  Matches the slash-notation our codebase already uses for
  "message/send" (workspace-server/internal/handlers/delegation.go:155,
  canvas/src/components/tabs/ScheduleTab.tsx:168).

Tests - 4 new specs in Toolbar.test.tsx covering each phase:

  1. phase 1 dispatches tasks/cancel via /a2a for every active
     workspace BEFORE any /restart (order assertion + envelope shape).
  2. when activeTasks drains to 0 during the poll window, /restart
     is NOT called.
  3. when activeTasks does NOT drain inside the 8s timeout, /restart
     is called for each stuck workspace (with phase-1-before-phase-3
     order assertion).
  4. selective drain - one workspace drains, the other doesn't;
     /restart is called only for the stuck one.

Full canvas vitest suite: 3360 passed, 1 skipped, 0 failed. Toolbar
file alone: 25 tests (21 prior + 4 new).

Refs: task #377, template-claude-code PR#40
core-qa approved these changes 2026-05-20 21:37:35 +00:00
Dismissed
core-qa left a comment
Member

core-qa five-axis lens review (PR#1619 @ 52304d9)

Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria.

1. Correctness (drain-poll behaviour + UI feedback). Verified: setStopping(true) flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, setStopping(false) clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding.

2. Timeout sizing. 8000ms drain timeout, justified in-line against template-claude-code _SIGTERM_GRACE_S=5s plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding.

3. Selective fallthrough. Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only /workspaces/ws-1/restart fires, expects 1 restart call, asserts target URL. Implementation maintains undrained Set granularly per-id (L142-154). No finding.

4. Backwards compat (older runtimes lacking tasks/cancel). Phase-1 errors are .catch'd to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits /restart — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding.

5. Test-suite signal. 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding.

Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.

### core-qa five-axis lens review (PR#1619 @ 52304d9) Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria. **1. Correctness (drain-poll behaviour + UI feedback).** Verified: `setStopping(true)` flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, `setStopping(false)` clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding. **2. Timeout sizing.** 8000ms drain timeout, justified in-line against template-claude-code `_SIGTERM_GRACE_S=5s` plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding. **3. Selective fallthrough.** Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only `/workspaces/ws-1/restart` fires, expects 1 restart call, asserts target URL. Implementation maintains `undrained` Set granularly per-id (L142-154). No finding. **4. Backwards compat (older runtimes lacking `tasks/cancel`).** Phase-1 errors are .catch'd to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits `/restart` — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding. **5. Test-suite signal.** 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding. Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.
core-qa approved these changes 2026-05-20 21:38:51 +00:00
Dismissed
core-qa left a comment
Member

core-qa five-axis lens review (PR#1619 @ 52304d9)

Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria.

1. Correctness (drain-poll behaviour + UI feedback). Verified: setStopping(true) flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, setStopping(false) clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding.

2. Timeout sizing. 8000ms drain timeout, justified in-line against template-claude-code _SIGTERM_GRACE_S=5s plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding.

3. Selective fallthrough. Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only /workspaces/ws-1/restart fires, expects 1 restart call, asserts target URL. Implementation maintains undrained Set granularly per-id (L142-154). No finding.

4. Backwards compat (older runtimes lacking tasks/cancel). Phase-1 errors are .catch'd to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits /restart — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding.

5. Test-suite signal. 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding.

Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.

### core-qa five-axis lens review (PR#1619 @ 52304d9) Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria. **1. Correctness (drain-poll behaviour + UI feedback).** Verified: `setStopping(true)` flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, `setStopping(false)` clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding. **2. Timeout sizing.** 8000ms drain timeout, justified in-line against template-claude-code `_SIGTERM_GRACE_S=5s` plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding. **3. Selective fallthrough.** Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only `/workspaces/ws-1/restart` fires, expects 1 restart call, asserts target URL. Implementation maintains `undrained` Set granularly per-id (L142-154). No finding. **4. Backwards compat (older runtimes lacking `tasks/cancel`).** Phase-1 errors are `.catch`'d to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits `/restart` — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding. **5. Test-suite signal.** 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding. Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.
core-qa approved these changes 2026-05-20 21:38:57 +00:00
core-qa left a comment
Member

core-qa five-axis lens review (PR#1619 @ 52304d9)

Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria.

1. Correctness (drain-poll behaviour + UI feedback). Verified: setStopping(true) flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, setStopping(false) clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding.

2. Timeout sizing. 8000ms drain timeout, justified in-line against template-claude-code _SIGTERM_GRACE_S=5s plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding.

3. Selective fallthrough. Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only /workspaces/ws-1/restart fires, expects 1 restart call, asserts target URL. Implementation maintains undrained Set granularly per-id (L142-154). No finding.

4. Backwards compat (older runtimes lacking tasks/cancel). Phase-1 errors are .catch'd to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits /restart — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding.

5. Test-suite signal. 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding.

Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.

### core-qa five-axis lens review (PR#1619 @ 52304d9) Reviewing Toolbar.tsx stopAll three-phase polite-cancel + 4 new vitest specs against task #377 acceptance criteria. **1. Correctness (drain-poll behaviour + UI feedback).** Verified: `setStopping(true)` flips at L114 BEFORE phase 1, button text/aria switch to "Stopping..." at L297/291 immediately, `setStopping(false)` clears only after phase 3 completes (L168). User does NOT see a frozen UI during the 8s drain — they see the button labelled "Stopping..." and disabled. Poll cadence (250ms) is reasonable: 32 polls max per drain, store-read only (no network), no perceivable jitter. No finding. **2. Timeout sizing.** 8000ms drain timeout, justified in-line against template-claude-code `_SIGTERM_GRACE_S=5s` plus WS round-trip buffer for TASK_UPDATED push. Sound bound: short enough that a wedged workspace gets /restart within ~8s (user-acceptable for a "Stop All" click), long enough to cover the 5s grace + 1-2s WS broadcast. No finding. **3. Selective fallthrough.** Phase-3 selective test (L469-486) directly exercises ws-0 drains + ws-1 stuck → only `/workspaces/ws-1/restart` fires, expects 1 restart call, asserts target URL. Implementation maintains `undrained` Set granularly per-id (L142-154). No finding. **4. Backwards compat (older runtimes lacking `tasks/cancel`).** Phase-1 errors are `.catch`'d to no-op (L129), so a 4xx/5xx/timeout from an old runtime simply skips the cancel and falls into the drain-poll. activeTasks will not decrement (no cancel happened), drain times out at 8s, phase 3 hits `/restart` — strictly status-quo behaviour. Test phase-3-fallthrough (L438-465) pins this ordering. No finding. **5. Test-suite signal.** 4 new specs cover the spec-mandated cases (phase ordering, drain-skip, timeout-fallthrough, partial selective). Test setup at L94-160 wires the live useCanvasStore so drain-poll reads fresh state. vitest 3360 passing claim per PR body. No finding. Verdict: APPROVED. Logic is clean, tests cover all stated invariants, fallthrough is conservative.
core-devops approved these changes 2026-05-20 21:39:14 +00:00
core-devops left a comment
Member

core-devops five-axis lens review (PR#1619 @ 52304d9)

1. A2A endpoint accepts tasks/cancel (empirical). Read workspace-server/internal/handlers/a2a_proxy.go @ main — normalizeA2APayload wraps in JSON-RPC envelope and forwards the method as-is. NO method allowlist, NO switch on method. The v0.2/v0.3 params.message shape-check (L742-779) only fires when params.message is present; for tasks/cancel with params:{} it is bypassed. a2aMethod surfaces in logA2ASuccess/logA2AReceiveQueued (L509/L418/L553), so attempts will appear in activity_logs with method="tasks/cancel". No finding.

2. Tiny-PR rule. +91/-2 in Toolbar.tsx is surgical: replaces the single stopAll callback body with the three-phase variant + explanatory block-comment. No drive-by refactor. +174/-4 test file is the 4 new specs + shared store-wiring helper. No finding.

3. Observability. tasks/cancel flows through the same A2A log path as every method — Loki method="tasks/cancel" will be populated. Canvas side adds no dedicated log for "drain succeeded" vs "fell through to /restart"; activity_logs is the only signal. Non-blocking nit: a console.info at phase-3 entry counting undrained.size would help ops correlate Stop-All clicks with fall-through frequency. Non-blocking finding.

4. Companion-PR ordering (central devops Q). Two safe orderings. (a) Canvas-first: tasks/cancel hits an old runtime, a2a-sdk 1.0.3 default AgentExecutor.cancel() raises ServerError(UnsupportedOperationError) → non-2xx → canvas .catch swallows → drain times out → phase 3 /restart fires. Strictly status-quo. (b) Runtime-first (PR#40): canvas keeps old behaviour, calls /restart directly — also status-quo. Either order is safe. RECOMMENDED: ship PR#40 first, then this PR — that exercises the polite-cancel path against an executor that honors it the moment canvas starts calling, eliminating a transient window where every Stop All is a no-op cancel + restart. Not blocking; just optimal. No blocking finding.

5. Risk surface. Fallthrough is conservative (drained skip /restart, undrained get heavy stop). No new route, no new state. Worst case: cancel succeeds but TASK_UPDATED push slow → drain times out → /restart fires — same as pre-fix behaviour. No finding.

Verdict: APPROVED. Generic proxy verified to accept tasks/cancel, fallthrough is status-quo. Ship PR#40 first if convenient, not blocking.

### core-devops five-axis lens review (PR#1619 @ 52304d9) **1. A2A endpoint accepts `tasks/cancel` (empirical).** Read workspace-server/internal/handlers/a2a_proxy.go @ main — `normalizeA2APayload` wraps in JSON-RPC envelope and forwards the `method` as-is. NO method allowlist, NO switch on method. The v0.2/v0.3 `params.message` shape-check (L742-779) only fires when `params.message` is present; for `tasks/cancel` with `params:{}` it is bypassed. `a2aMethod` surfaces in `logA2ASuccess`/`logA2AReceiveQueued` (L509/L418/L553), so attempts will appear in activity_logs with `method="tasks/cancel"`. No finding. **2. Tiny-PR rule.** +91/-2 in Toolbar.tsx is surgical: replaces the single `stopAll` callback body with the three-phase variant + explanatory block-comment. No drive-by refactor. +174/-4 test file is the 4 new specs + shared store-wiring helper. No finding. **3. Observability.** `tasks/cancel` flows through the same A2A log path as every method — Loki `method="tasks/cancel"` will be populated. Canvas side adds no dedicated log for "drain succeeded" vs "fell through to /restart"; activity_logs is the only signal. Non-blocking nit: a `console.info` at phase-3 entry counting `undrained.size` would help ops correlate Stop-All clicks with fall-through frequency. Non-blocking finding. **4. Companion-PR ordering (central devops Q).** Two safe orderings. (a) Canvas-first: `tasks/cancel` hits an old runtime, a2a-sdk 1.0.3 default `AgentExecutor.cancel()` raises `ServerError(UnsupportedOperationError)` → non-2xx → canvas `.catch` swallows → drain times out → phase 3 `/restart` fires. **Strictly status-quo.** (b) Runtime-first (PR#40): canvas keeps old behaviour, calls `/restart` directly — also status-quo. **Either order is safe.** RECOMMENDED: ship PR#40 first, then this PR — that exercises the polite-cancel path against an executor that honors it the moment canvas starts calling, eliminating a transient window where every Stop All is a no-op cancel + restart. Not blocking; just optimal. No blocking finding. **5. Risk surface.** Fallthrough is conservative (drained skip /restart, undrained get heavy stop). No new route, no new state. Worst case: cancel succeeds but TASK_UPDATED push slow → drain times out → /restart fires — same as pre-fix behaviour. No finding. Verdict: APPROVED. Generic proxy verified to accept `tasks/cancel`, fallthrough is status-quo. Ship PR#40 first if convenient, not blocking.
Member

/sop-tier-recheck

/sop-tier-recheck
Member

/qa-recheck

/qa-recheck
Member

/security-recheck

/security-recheck
core-devops merged commit 55fa44571e into main 2026-05-20 22:58:59 +00:00
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1619