forked from molecule-ai/molecule-core
test(e2e): canary classifies provider-quota 429 as operator-action, not platform regression
The staging canary's A2A step has a ladder of specific regression classifiers (hermes-agent down, model_not_found, Invalid API key, etc.) followed by a generic "error|exception" catch-all. Provider- side OpenAI 429 quota errors fell through to the catch-all, so the canary issue body and CI log just said "A2A returned an error-shaped response" — which is technically true but obscures the actual operator action. This adds a 7th classifier above the catch-all for "exceeded your current quota" / "insufficient_quota" — both terms appear in OpenAI's quota-exhaustion 429 response. When matched, the failure message names the operator action directly (top up MOLECULE_STAGING_OPENAI_KEY or rotate the secret) and links to #2578. Why this is correct, not "lowering the bar": - Steps 0–7 of the canary cover full platform health (CP up, tenant provisioned, DNS+TLS reachable, workspace booted, A2A delivered). - Reaching step 8 with a provider-side 429 means the platform IS healthy — the failure is downstream of all platform invariants. - The canary still exits 1 (CI stays red, threshold-3 alarm still fires); only the failure message changes. - All 6 existing specific classifiers run BEFORE this one, so any real platform regression is still caught with its specific message. Verification: - Regex tested against the actual 429 string from canary run 25291517608: "API call failed after 3 retries: HTTP 429: You exceeded your current quota..." → matches ✅ - Negative tests: "PONG", "hermes-agent unreachable" → no match ✅ - bash -n syntax check passes - shellcheck -S error clean Tracking: #2593 (canary), #2578 (root cause) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
2e9686036d
commit
4c49ff75f6
@ -517,6 +517,7 @@ fi
|
||||
# "Encrypted content is not supported" → hermes codex_responses API misroute (#14)
|
||||
# "Unknown provider" → bridge misconfigured PROVIDER= (regression of #13 fix)
|
||||
# "hermes-agent unreachable" → gateway process died
|
||||
# "exceeded your current quota" → MOLECULE_STAGING_OPENAI_KEY billing (NOT a platform regression — #2578)
|
||||
#
|
||||
# Fail LOUD with the specific pattern so CI log + alert channel makes the
|
||||
# regression unambiguous.
|
||||
@ -542,6 +543,16 @@ fi
|
||||
if echo "$AGENT_TEXT" | grep -qF "Invalid API key"; then
|
||||
fail "A2A — REGRESSION: tenant auth chain returned 'Invalid API key'. Likely CP boot-event 401 race (CP #238) or stale OPENAI_API_KEY in the runtime env. Raw: $AGENT_TEXT"
|
||||
fi
|
||||
# Provider quota exhausted — distinguish from a platform regression so
|
||||
# the canary alert names the operator action directly instead of falling
|
||||
# through to the generic "error-shaped response" message. Steps 0-7 having
|
||||
# passed means the platform itself is healthy (CP up, tenant provisioned,
|
||||
# workspace online, A2A delivery end-to-end). When the agent comes back
|
||||
# with a provider-side 429, that is a billing event on the configured
|
||||
# OpenAI key, not a platform regression. Tracked in #2578.
|
||||
if echo "$AGENT_TEXT" | grep -qiE "exceeded your current quota|insufficient_quota"; then
|
||||
fail "A2A — PROVIDER QUOTA EXHAUSTED (NOT a platform regression). Operator action: top up MOLECULE_STAGING_OPENAI_KEY billing or rotate to a higher-quota org at Settings → Secrets and Variables → Actions. Tracked in #2578. Raw: $AGENT_TEXT"
|
||||
fi
|
||||
# Generic catch-all — falls through if none of the known regressions hit.
|
||||
if echo "$AGENT_TEXT" | grep -qiE "error|exception"; then
|
||||
fail "A2A returned an error-shaped response: $AGENT_TEXT"
|
||||
|
||||
Loading…
Reference in New Issue
Block a user