molecule-core

History

rabbitblood 7d732cec3c feat(hermes): escalation ladder — promote to stronger models on transient failure Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace can now declare an ordered list of (provider, model) rungs; when the pinned model hits rate-limit / 5xx / context-length / overload, the executor advances to the next rung before raising. ## Why 3× Claude Max saturation is a routine occurrence now — the "first 429 on a batch delegation" is the common path, not the exception. A workspace pinned to Haiku that hits a context-length limit has no recovery today; same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes to the next tier for that single call, preserves coordination, avoids restart cascades. ## New module: adapters/hermes/escalation.py - ``LadderRung(provider, model)`` — one config entry. - ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs with a warning rather than raising so boot stays resilient. - ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes: - Typed classes (RateLimitError, OverloadedError, APITimeoutError, APIConnectionError, InternalServerError) - Context-length markers (each provider uses different phrasing) - Gateway markers (502/503/504, overloaded, temporarily unavailable) - Status-code substrings (429, 529, 5xx) - Hard-rejects auth failures (401/403/invalid_api_key) even if the outer exception class is RateLimitError — wrapping case matters. ## Executor wiring ``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its constructor + ``create_executor()`` factory. ``_do_inference()`` walks the ladder: 1. First attempt = pinned provider:model (matches pre-ladder behaviour) 2. On escalatable error, try each rung in order 3. On non-escalatable error, raise immediately (auth, malformed payload) 4. On exhaustion, raise the last error Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model`` / ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised error leaves the executor in its original state for the next call. Key resolution for non-pinned rungs goes through ``resolve_provider`` which reads the rung-provider's env vars fresh. ## Config shape ``config.yaml`` (rendered from ``org.yaml`` → workspace secrets): runtime_config: escalation_ladder: - provider: gemini model: gemini-2.5-flash - provider: anthropic model: claude-sonnet-4-5-20250929 - provider: anthropic model: claude-opus-4-1-20250805 Empty / absent = single-shot behaviour, full backwards-compat with every existing workspace. ## Tests 34 passing, all isolated (no network): - ``test_hermes_escalation.py`` (28): parser + truth-table across rate-limit, overload, context-length, gateway, auth-reject, unrelated exceptions, and case-insensitivity. - ``test_hermes_ladder_integration.py`` (6): no-ladder single call, ladder-not-triggered on success, escalate-on-rate-limit-then-succeed, stop-on-non-escalatable, raise-last-error-when-exhausted, skip- unknown-provider-in-rung. ## Not in this PR - Uncertainty-driven escalation (judge pass after successful reply). - Per-workspace budget tracking (#305 covers this separately). - Live streaming reuse across rungs (ladder retries the whole call). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>		2026-04-16 02:27:27 -07:00
..
autogen	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
claude_code	feat: GET /workspaces/:id/transcript — live agent session log	2026-04-15 14:29:43 -07:00
crewai	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
deepagents	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
gemini_cli	feat(adapters): add gemini-cli runtime adapter (closes #332 ) (#379 )	2026-04-15 23:30:00 -07:00
hermes	feat(hermes): escalation ladder — promote to stronger models on transient failure	2026-04-16 02:27:27 -07:00
langgraph	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
openclaw	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
__init__.py	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
base.py	feat: GET /workspaces/:id/transcript — live agent session log	2026-04-15 14:29:43 -07:00
shared_runtime.py	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00