Canary started flaking 2026-05-01 22:11 with model-refusal replies:
- "I'm unable to do that."
- "I'm unable to fulfill that request. Can I assist you with anything else?"
- "I'm unable to reply with responses that don't allow me to fulfill tasks…"
3 fails / 10 recent runs ≈ 30% flake.
Trigger: 2026-04-30's Platform Capabilities preamble (#2332) added the
directive "Use them proactively" to the top of every system prompt.
Combined with the heavy A2A + HMA tool docs further down, the model
reads the contrived bare-echo prompt ("Reply with exactly: PONG") as
out-of-role and intermittently refuses.
Real user prompts don't hit this — only the synthetic smoke prompt does,
so the right fix is in the canary's prompt phrasing, not the platform's
system prompt (which is correctly priming agents toward tool use). New
phrasing explicitly tells the model "this is a smoke test" and "no
tools or memory are needed" so it has permission to comply.
Also updates the child workspace's CHILD_PONG prompt with the same
framing — same failure mode would have hit it once full-mode runs again.
No code change to system prompt, no test infra change. Just two prompt
strings + a load-bearing comment so future readers don't trim back to
the brittle phrasing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>