forked from molecule-ai/molecule-core
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.
## Why
3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.
## New module: adapters/hermes/escalation.py
- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
- Typed classes (RateLimitError, OverloadedError, APITimeoutError,
APIConnectionError, InternalServerError)
- Context-length markers (each provider uses different phrasing)
- Gateway markers (502/503/504, overloaded, temporarily unavailable)
- Status-code substrings (429, 529, 5xx)
- Hard-rejects auth failures (401/403/invalid_api_key) even if the
outer exception class is RateLimitError — wrapping case matters.
## Executor wiring
``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:
1. First attempt = pinned provider:model (matches pre-ladder behaviour)
2. On escalatable error, try each rung in order
3. On non-escalatable error, raise immediately (auth, malformed payload)
4. On exhaustion, raise the last error
Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.
## Config shape
``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):
runtime_config:
escalation_ladder:
- provider: gemini
model: gemini-2.5-flash
- provider: anthropic
model: claude-sonnet-4-5-20250929
- provider: anthropic
model: claude-opus-4-1-20250805
Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.
## Tests
34 passing, all isolated (no network):
- ``test_hermes_escalation.py`` (28): parser + truth-table across
rate-limit, overload, context-length, gateway, auth-reject, unrelated
exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
stop-on-non-escalatable, raise-last-error-when-exhausted, skip-
unknown-provider-in-rung.
## Not in this PR
- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| __init__.py | ||
| conftest.py | ||
| test_a2a_cli.py | ||
| test_a2a_client.py | ||
| test_a2a_executor.py | ||
| test_a2a_mcp_server.py | ||
| test_a2a_tools_impl.py | ||
| test_a2a_tools_module.py | ||
| test_adapters.py | ||
| test_agent_base_urls.py | ||
| test_agent.py | ||
| test_approval.py | ||
| test_audit.py | ||
| test_awareness_client_full.py | ||
| test_claude_sdk_executor.py | ||
| test_cli_executor.py | ||
| test_common_setup.py | ||
| test_compliance.py | ||
| test_config.py | ||
| test_consolidation.py | ||
| test_coordinator_parent.py | ||
| test_coordinator_routing.py | ||
| test_delegation.py | ||
| test_events.py | ||
| test_executor_helpers.py | ||
| test_first_party_plugins.py | ||
| test_governance.py | ||
| test_heartbeat.py | ||
| test_hermes_adapter.py | ||
| test_hermes_escalation.py | ||
| test_hermes_ladder_integration.py | ||
| test_hermes_phase2_dispatch.py | ||
| test_hermes_providers.py | ||
| test_hermes_smoke.py | ||
| test_hitl.py | ||
| test_main_initial_prompt.py | ||
| test_mcp_memory.py | ||
| test_medo.py | ||
| test_memory.py | ||
| test_molecule_ai_status.py | ||
| test_namespaces.py | ||
| test_openclaw_adapter.py | ||
| test_platform_auth.py | ||
| test_plugins_builtins_drift.py | ||
| test_plugins_builtins.py | ||
| test_plugins_registry.py | ||
| test_plugins.py | ||
| test_preflight.py | ||
| test_prompt.py | ||
| test_qianfan_provider.py | ||
| test_routing_policy.py | ||
| test_sandbox.py | ||
| test_security_scan.py | ||
| test_shared_runtime.py | ||
| test_skills_loader.py | ||
| test_skills_watcher.py | ||
| test_telemetry.py | ||
| test_temporal_workflow.py | ||
| test_transcript_auth.py | ||
| test_transcript_lines.py | ||
| test_watcher.py | ||