The staging canary's A2A step has a ladder of specific regression
classifiers (hermes-agent down, model_not_found, Invalid API key,
etc.) followed by a generic "error|exception" catch-all. Provider-
side OpenAI 429 quota errors fell through to the catch-all, so the
canary issue body and CI log just said "A2A returned an error-shaped
response" — which is technically true but obscures the actual
operator action.
This adds a 7th classifier above the catch-all for "exceeded your
current quota" / "insufficient_quota" — both terms appear in
OpenAI's quota-exhaustion 429 response. When matched, the failure
message names the operator action directly (top up MOLECULE_STAGING_OPENAI_KEY
or rotate the secret) and links to #2578.
Why this is correct, not "lowering the bar":
- Steps 0–7 of the canary cover full platform health (CP up, tenant
provisioned, DNS+TLS reachable, workspace booted, A2A delivered).
- Reaching step 8 with a provider-side 429 means the platform IS
healthy — the failure is downstream of all platform invariants.
- The canary still exits 1 (CI stays red, threshold-3 alarm still
fires); only the failure message changes.
- All 6 existing specific classifiers run BEFORE this one, so any
real platform regression is still caught with its specific message.
Verification:
- Regex tested against the actual 429 string from canary run 25291517608:
"API call failed after 3 retries: HTTP 429: You exceeded your current quota..."
→ matches ✅
- Negative tests: "PONG", "hermes-agent unreachable" → no match ✅
- bash -n syntax check passes
- shellcheck -S error clean
Tracking: #2593 (canary), #2578 (root cause)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>