The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:
- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
are relevant'
This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
so context-length lookups use models.dev data instead of 128k fallback.
Fixes#8161.
- Set api_mode from custom_providers entry when switching via hermes model,
and clear stale api_mode when the entry has none. Also extract api_mode
in _named_custom_provider_map(). Fixes#8181.
- Convert OpenAI image_url content blocks to Anthropic image blocks when
the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
containing /anthropic). Fixes#8147.
Users whose credentials exist only in external files — OpenAI Codex
OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials
in ~/.claude/.credentials.json — would not see those providers in the
/model picker, even though hermes auth and hermes model detected them.
Root cause: list_authenticated_providers() only checked the raw Hermes
auth store and env vars. External credential file fallbacks (Codex CLI
import, Claude Code file discovery) were never triggered.
Fix (three parts):
1. _seed_from_singletons() in credential_pool.py: openai-codex now
imports from ~/.codex/auth.json when the Hermes auth store is empty,
mirroring resolve_codex_runtime_credentials().
2. list_authenticated_providers() in model_switch.py: auth store + pool
checks now run for ALL providers (not just OAuth auth_type), catching
providers like anthropic that support both API key and OAuth.
3. list_authenticated_providers(): direct check for anthropic external
credential files (Claude Code, Hermes PKCE). The credential pool
intentionally gates anthropic behind is_provider_explicitly_configured()
to prevent auxiliary tasks from silently consuming tokens. The /model
picker bypasses this gate since it is discovery-oriented.
After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.
Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.
Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):
1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
this summary — respond ONLY to the latest user message AFTER it'
2. Summarizer preamble (shared by both prompts) adds:
- 'Do NOT respond to any questions' (from OpenCode's approach)
- 'Different assistant' framing (from Codex) to create psychological
distance between summary content and active conversation
3. New summary sections:
- '## Resolved Questions' — tracks already-answered questions with
their answers, preventing re-answering (from Claude Code's
'Pending user asks' pattern)
- '## Pending User Asks' — explicitly marks unanswered questions
- '## Remaining Work' replaces '## Next Steps' — passive framing
avoids reading as active instructions
4. merge-summary-into-tail path now inserts a clear separator:
'--- END OF CONTEXT SUMMARY — respond to the message below ---'
5. Iterative update prompt now instructs: 'Move answered questions to
Resolved Questions' to maintain the resolved/pending distinction
across multiple compactions.
Adds an optional focus topic to /compress: `/compress database schema`
guides the summariser to preserve information related to the focus topic
(60-70% of summary budget) while compressing everything else more aggressively.
Inspired by Claude Code's /compact <focus>.
Changes:
- context_compressor.py: focus_topic parameter on _generate_summary() and
compress(); appends FOCUS TOPIC guidance block to the LLM prompt
- run_agent.py: focus_topic parameter on _compress_context(), passed through
to the compressor
- cli.py: _manual_compress() extracts focus topic from command string,
preserves existing manual_compression_feedback integration (no regression)
- gateway/run.py: _handle_compress_command() extracts focus from event args
and passes through — full gateway parity
- commands.py: args_hint="[focus topic]" on /compress CommandDef
Salvaged from PR #7459 (CLI /compress focus only — /context command deferred).
15 new tests across CLI, compressor, and gateway.
Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and
estimate_request_tokens_rough() from floor division (len // 4) to
ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously
estimated as 0 tokens, causing the compressor and pre-flight checks to
systematically undercount when many short tool results are present.
Also replaced the inline duplicate formula in run_conversation()
(total_chars // 4) with a call to the shared
estimate_messages_tokens_rough() function.
Updated 4 tests that hardcoded floor-division expected values.
Related: issue #6217, PR #6629
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes#7915 (root cause was agent-loop termination, not Weixin delivery limits).
* fix(tools): neutralize shell injection in _write_to_sandbox via path quoting
_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.
Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.
Tests added for spaces, $(command) substitution, and semicolon injection.
* fix: is_local_endpoint misses Docker/Podman DNS names
host.docker.internal, host.containers.internal, gateway.docker.internal,
and host.lima.internal are well-known DNS names that container runtimes
use to resolve the host machine. Users running Ollama on the host with
the agent in Docker/Podman hit the default 120s stream timeout instead
of the bumped 1800s because these hostnames weren't recognized as local.
Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in
is_local_endpoint(). Tests cover all three runtime families plus a
negative case for domains that merely contain the suffix as a substring.
The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.
Flipped the resolution order in _resolve_task_provider_model():
1. Explicit args (always win)
2. config.yaml auxiliary.{task}.* (PRIMARY)
3. Env var overrides (backward-compat fallback only)
4. 'auto' (full auto-detection chain)
Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.
Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
Cherry-picked from PR #7702 by kshitijk4poor.
Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)
Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.
Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.
Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.
36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
Cherry-picked from PR #7749 by kshitijk4poor with modifications:
- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision
Closes#7740
Based on PR #7285 by @kshitijk4poor.
Two bugs affecting Qwen OAuth users:
1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M.
Added specific entries before the generic qwen catch-all:
- qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per
official Alibaba Cloud docs and OpenRouter)
- qwen3-coder: 262,144
2. Random stopping — max_tokens was suppressed for Qwen Portal, so the
server applied its own low default. Reasoning models exhaust that on
thinking tokens. Now: honor explicit max_tokens, default to 65536
when unset.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
process_registry.py: _reader_loop() has process.wait() after the try-except
block (line 380). If the reader thread crashes with an unexpected exception
(e.g. MemoryError, KeyboardInterrupt), control exits the except handler but
skips wait() — leaving the child as a zombie process. Move wait() and the
cleanup into a finally block so the child is always reaped.
cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the
SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both
stdout and stderr are returned WITHOUT redaction (lines 407-413). A script
that accidentally prints an API key to stderr during a failure would leak it
into the LLM context. Move redaction before the success/failure branch so
both paths benefit.
skill_commands.py: _build_skill_message() enumerates supporting files using
rglob("*") but only checks is_file() (line 171) without filtering symlinks.
PR #6693 added symlink protection to scan_skill_commands() but missed this
function. A malicious skill can create symlinks in references/ pointing to
arbitrary files, exposing their paths (and potentially content via skill_view)
to the LLM. Add is_symlink() check to match the guard in scan_skill_commands.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').
Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.
Closes#7264
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.
When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.
This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
fails through the same path)
Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
Four fixes to auxiliary_client.py:
1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'),
connection/payment errors no longer silently fallback to cloud
providers. Local-only users (Ollama, vLLM) will no longer get
unexpected OpenRouter billing from auxiliary tasks.
2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends literal 'default' as
model name to APIs. Providers without a known aux model in
_API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
model_not_supported errors.
3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for
payment (402) and connection errors. Previously, async consumers
(session_search, web_tools, vision) got hard failures with no
recovery. Also fixes hardcoded 'openrouter' fallback to use the
full auto-detection chain.
4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses
it in log messages. Connection timeouts are no longer misleadingly
logged as 'payment error'.
Closes#7559Closes#7512
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or
when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
auxiliary:
compression:
model: gpt-5.3-codex
base_url: https://api.openai.com/v1
api_mode: codex_responses
Closes#6800
Refactor hardcoded color constants throughout the CLI to resolve from
the active skin engine, so custom themes fully control the visual
appearance.
cli.py:
- Replace _GOLD constant with _ACCENT (_SkinAwareAnsi class) that
lazily resolves response_border from the active skin
- Rename _GOLD_DEFAULT to _ACCENT_ANSI_DEFAULT
- Make _build_compact_banner() read banner_title/accent/dim from skin
- Make session resume notifications use _accent_hex()
- Make status line use skin colors (accent_color, separator_color,
label_color instead of cryptic _dim_c/_dim_c2/_accent_c/_label_c)
- Reset _ACCENT cache on /skin switch
agent/display.py:
- Replace hardcoded diff ANSI escapes with skin-aware functions:
_diff_dim(), _diff_file(), _diff_hunk(), _diff_minus(), _diff_plus()
(renamed from SCREAMING_CASE _ANSI_* to snake_case)
- Add reset_diff_colors() for cache invalidation on skin switch
Aligns MiniMax provider with official API documentation. Fixes 6 bugs:
transport mismatch (openai_chat -> anthropic_messages), credential leak
in switch_model(), prompt caching sent to non-Anthropic endpoints,
dot-to-hyphen model name corruption, trajectory compressor URL routing,
and stale doctor health check.
Also corrects context window (204,800), thinking support (manual mode),
max output (131,072), and model catalog (M2 family only on /anthropic).
Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
_is_oauth_token() returned True for any key not starting with 'sk-ant-api',
which means MiniMax and Alibaba API keys were falsely treated as Anthropic
OAuth tokens. This triggered the Claude Code compatibility path:
- All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search)
- System prompt injected with 'You are Claude Code' identity
- 'Hermes Agent' replaced with 'Claude Code' throughout
Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by
their key format instead of using a broad catch-all:
- sk-ant-* (but not sk-ant-api-*) -> setup tokens, managed keys
- eyJ* -> JWTs from Anthropic OAuth flow
- Everything else -> False (MiniMax, Alibaba, etc.)
Reported by stefan171.
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:
model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.
Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
- PluginContext.register_context_engine() lets plugins replace the
built-in ContextCompressor with a custom ContextEngine implementation
- PluginManager stores the registered engine; only one allowed
- run_agent.py checks for a plugin engine at init before falling back
to the default ContextCompressor
- reset_session_state() now calls engine.on_session_reset() instead of
poking internal attributes directly
- ContextCompressor.on_session_reset() handles its own internals
(_context_probed, _previous_summary, etc.)
- 19 new tests covering ABC contract, defaults, plugin slot registration,
rejection of duplicates/non-engines, and compressor reset behavior
- All 34 existing compressor tests pass unchanged
Introduces agent/context_engine.py — an abstract base class that defines
the pluggable context engine interface. ContextCompressor now inherits
from ContextEngine as the default implementation.
No behavior change. All 34 existing compressor tests pass.
This is the foundation for a context engine plugin slot, enabling
third-party engines like LCM (Lossless Context Management) to replace
the built-in compressor via the plugin system.
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Adds xAI as a first-class provider: ProviderConfig in auth.py,
HermesOverlay in providers.py, 11 curated Grok models, URL mapping
in model_metadata.py, aliases (x-ai, x.ai), and env var tests.
Uses standard OpenAI-compatible chat completions.
Closes#7050
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main
Salvaged from PR #6747 by Zihan Huang.
Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a
unique throttling message ('Request rate increased too quickly...') that
doesn't match standard rate-limit patterns ('rate limit', 'too many
requests'). This caused Alibaba errors to fall through to the 'unknown'
category rather than being properly classified as rate_limit with
appropriate backoff/rotation.
Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with
the exact error message observed from the Alibaba provider.
_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic(). Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.
This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, removing a claude_code credential from the anthropic pool
only printed a note — the next load_pool() re-seeded it from
~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag
to auth.json that _seed_from_singletons checks before seeding.
Follows the pattern of env: source removal (clears .env var) and
device_code removal (clears auth store state).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_seed_from_singletons('anthropic') now checks
is_provider_explicitly_configured('anthropic') before reading
~/.claude/.credentials.json. Without this, the auxiliary client
fallback chain silently discovers and uses Claude Code tokens when
the user's primary provider key is invalid — consuming their Claude
Max subscription quota without consent.
Follows the same gating pattern as PR #4210 (setup wizard gate)
but applied to the credential pool seeding path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not
match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass
detection by splitting the style attribute across lines:
`<div style="color:red;\ndisplay: none">injected content</div>`
Replace `.*` with `[\s\S]*?` to match across line boundaries.
credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level
(line 171), making YAML parse failures invisible in production logs. Users whose
credential files silently fail to mount into sandboxes have no diagnostic clue.
Promote to WARNING to match the severity pattern used by the path validation
warnings at lines 150 and 158 in the same function.
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line
265) but the impact — stale/corrupted dynamic routes persisting silently — warrants
ERROR level to ensure operator visibility in alerting pipelines.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True. The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback. Align the message-based path to
match: retryable=False, should_fallback=True.
web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII. Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
xAI /v1/models does not return context_length metadata, so Hermes
probes down to the 128k default whenever a user configures a custom
provider pointing at https://api.x.ai/v1. This forces every xAI user
to manually override model.context_length in config.yaml (2M for
Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context
window.
Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the
fallback lookup returns the correct value via substring matching.
Values sourced from models.dev (2026-04) and cross-checked against
the xAI /v1/models listing:
- grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent)
- grok-4-1-fast-* 2,000,000
- grok-4-fast-* 2,000,000
- grok-4 / grok-4-0709 256,000
- grok-code-fast-1 256,000
- grok-3* 131,072
- grok-2 / latest 131,072
- grok-2-vision* 8,192
- grok (catch-all) 131,072
Keys are ordered longest-first so that specific variants match before
the catch-all, consistent with the existing Claude/Gemma/MiniMax entries.
Add TestDefaultContextLengths.test_grok_models_context_lengths and
test_grok_substring_matching to pin the values and verify the full
lookup path. All 77 tests in test_model_metadata.py pass.
Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes#7026. Contributed by @kuishou68.
The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at
v1.30.0. The stale version string causes intermittent 403 errors from
Kimi's coding endpoint ('only available for Coding Agents').
Update all 8 occurrences across run_agent.py, auxiliary_client.py, and
doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.
Extends the /fast command to support Anthropic's Fast Mode beta in addition
to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds
speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for
~2.5x faster output token throughput.
Changes:
- hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry,
model_supports_fast_mode() now recognizes Claude Opus 4.6,
resolve_fast_mode_overrides() returns {speed: fast} for Anthropic
vs {service_tier: priority} for OpenAI
- agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant,
build_anthropic_kwargs() accepts fast_mode=True which injects
speed:fast + beta header via extra_headers (skipped for third-party
Anthropic-compatible endpoints like MiniMax)
- run_agent.py: Pass fast_mode to build_anthropic_kwargs in the
anthropic_messages path of _build_api_kwargs()
- cli.py: Update _handle_fast_command with provider-aware messaging
(shows 'Anthropic Fast Mode' vs 'Priority Processing')
- hermes_cli/commands.py: Update /fast description to mention both
providers
- tests: 13 new tests covering Anthropic model detection, override
resolution, CLI availability, routing, adapter kwargs, and
third-party endpoint safety
When the model mentions <think> as literal text in its response (e.g.
"(/think not producing <think> tags)"), the streaming display treated it
as a reasoning block opener and suppressed everything after it. The
response box would close with truncated content and no error — the API
response was complete but the display ate it.
Root cause: _stream_delta() matched <think> anywhere in the text stream
regardless of position. Real reasoning blocks always start at the
beginning of a line; mentions in prose appear mid-sentence.
Fix: track line position across streaming deltas with a
_stream_last_was_newline flag. Only enter reasoning suppression when
the tag appears at a block boundary (start of stream, after a newline,
or after only whitespace on the current line). Add a _flush_stream()
safety net that recovers buffered content if no closing tag is found
by end-of-stream.
Also fixes three related issues discovered during investigation:
- anthropic_adapter: _get_anthropic_max_output() now normalizes dots to
hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key
(was returning 32K instead of 128K)
- run_agent: send explicit max_tokens for Claude models on Nous Portal,
same as OpenRouter — both proxy to Anthropic's API which requires it.
Without it the backend defaults to a low limit that truncates responses.
- run_agent: reset truncated_tool_call_retries after successful tool
execution so a single truncation doesn't poison the entire conversation.
The Codex retry block and valid-token short-circuit in _refresh_entry()
both return early, bypassing the auth.json sync at the end of the method.
This adds _sync_device_code_entry_to_auth_store() calls on both paths
so refreshed/synced tokens are written back to auth.json regardless of
which code path succeeds.
MiniMax's Anthropic-compatible endpoints reject requests that include
the fine-grained-tool-streaming beta header — every tool-use message
triggers a connection error (~18s timeout). Regular chat works fine.
Add _common_betas_for_base_url() that filters out the tool-streaming
beta for Bearer-auth (MiniMax) endpoints while keeping all other betas.
All four client-construction branches now use the filtered list.
Based on #6528 by @HiddenPuppy.
Original cherry-picked from PR #6688 by kshitijk4poor.
Fixes#6510, fixes#6555.
_classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so
messages like 'usage limit exceeded, try again in 5 minutes' arriving
without an HTTP status code fell through to FailoverReason.unknown
instead of rate_limit.
Apply the same billing/rate-limit disambiguation that _classify_402
already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit,
USAGE_LIMIT_PATTERNS alone → billing.
Add 4 tests covering the no-status-code usage-limit path.
When _generate_summary() failed (no provider, timeout, model error),
the compressor silently dropped all middle turns with just a debug
log. The agent would then see head + tail with no explanation of the
gap, causing total context amnesia (generic greetings instead of
continuing the conversation).
Now generates a static fallback marker that tells the model context
was lost and to continue from the recent tail messages. The fallback
flows through the same role-alternation logic as a real summary so
message structure stays valid.
Step 1 of _resolve_auto() explicitly excluded 'custom' providers,
forcing custom endpoint users through the fragile fallback chain
instead of using their known-working main model credentials.
This caused silent compression failures for users on local OpenAI-
compatible endpoints — the summary generation would fail, middle
turns would be silently dropped, and the agent would lose all
conversation context.
Remove 'custom' from the exclusion list so custom endpoint users
get the same main-model-first treatment as DeepSeek, Anthropic,
Gemini, and other direct providers.
When the API returns "max_tokens too large given prompt" (input tokens
are within the context window, but input + requested output > window),
the old code incorrectly routed through the same handler as "prompt too
long" errors, calling get_next_probe_tier() and permanently halving
context_length. This made things worse: the window was fine, only the
requested output size needed trimming for that one call.
Two distinct error classes now handled separately:
Prompt too long — input itself exceeds context window.
Fix: compress history + halve context_length (existing behaviour,
unchanged).
Output cap too large — input OK, but input + max_tokens > window.
Fix: parse available_tokens from the error message, set a one-shot
_ephemeral_max_output_tokens override for the retry, and leave
context_length completely untouched.
Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error()
that detects Anthropic's "available_tokens: N" error format and returns
the available output budget, or None for all other error types.
- run_agent.py: call the new parser first in the is_context_length_error
block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
safety margin) and break to retry without touching context_length.
_build_api_kwargs consumes the ephemeral value exactly once then clears
it so subsequent calls use self.max_tokens normally.
- agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to
clearly document the max_tokens (output cap) vs context_length (total
window) distinction, which is a persistent source of confusion due to
the OpenAI-inherited "max_tokens" name.
- cli-config.yaml.example: add inline comments explaining both keys side
by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
the parser, build_anthropic_kwargs clamping, ephemeral one-shot
consumption, and the invariant that context_length is never mutated
on output-cap errors.
The error classifier's generic-400 heuristic only extracted err_body_msg from
the nested body structure (body['error']['message']), missing the flat body
format used by OpenAI's Responses API (body['message']). This caused
descriptive 400 errors like 'Invalid input[index].name: string does not match
pattern' to appear generic when the session was large, misclassifying them as
context overflow and triggering an infinite compression loop.
Added flat-body fallback in _classify_400() consistent with the parent
classify_api_error() function's existing handling at line 297-298.
When is explicitly set to ,
the custom-endpoint path in creates a plain
client without provider-specific headers. This means sync vision calls (e.g.
) use the generic User-Agent and get rejected by
Kimi's coding endpoint with a 403:
'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...'
The async converter already injects , and the
auto-detected API-key provider path also injects it, but the explicit custom
endpoint shortcut was missing it entirely.
This patch adds the same injection to the custom endpoint
branch, and updates all existing Kimi header sites to for
consistency.
Fixes <issue number to be filled in>
The credential pool seeder (_seed_from_env) hardcoded the base URL
for API-key providers without running provider-specific auto-detection.
For kimi-coding, this caused sk-kimi- prefixed keys to be seeded with
the legacy api.moonshot.ai/v1 endpoint instead of api.kimi.com/coding/v1,
resulting in HTTP 401 on the first request.
Import and call _resolve_kimi_base_url for kimi-coding so the pool
uses the correct endpoint based on the key prefix, matching the
runtime credential resolver behavior.
Also fix a comment: sk-kimi- keys are issued by kimi.com/code,
not platform.kimi.ai.
Fixes#5561
Two bugs in the model fallback system:
1. Nous login leaves stale model in config (provider=nous, model=opus
from previous OpenRouter setup). Fixed by deferring the config.yaml
provider write until AFTER model selection completes, and passing the
selected model atomically via _update_config_for_provider's
default_model parameter. Previously, _update_config_for_provider was
called before model selection — if selection failed (free tier, no
models, exception), config stayed as nous+opus permanently.
2. Codex/stale providers in auxiliary fallback can't connect but block
the auto-detection chain. Added _is_connection_error() detection
(APIConnectionError, APITimeoutError, DNS failures, connection
refused) alongside the existing _is_payment_error() check in
call_llm(). When a provider endpoint is unreachable, the system now
falls back to the next available provider instead of crashing.
Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.
- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases
Headers captured per response:
x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}
Example CLI display:
Nous Rate Limits (captured just now):
Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s)
Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)
Wrap is_dir() in _is_valid_subdir() and is_file() in
_load_hints_for_directory() with OSError handlers so that
inaccessible directories (e.g. /root from a non-root Daytona
host user) are silently skipped instead of crashing the agent.
The existing PermissionError PRs for prompt_builder.py (#6247,
#6321, #6355) do not cover subdirectory_hints.py, which was
identified as a separate crash path in the #6214 comments.
Ref: #6214
The 24-hour default cooldown for 402-exhausted credentials was far too
aggressive — if a user tops up credits or the 402 was caused by an
oversized max_tokens request rather than true billing exhaustion, they
shouldn't have to wait a full day. Reduce to 1 hour (matching the
existing 429 TTL).
Inspired by PR #6493 (michalkomar).
Two issues resolved:
1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through
models.dev lookup (which has mimo-v2-pro at 1M context) instead of
falling back to probing /models (404) and defaulting to 128K.
2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead
of '1.048576M'. Applies same rounding logic to K values.
Tail protection was effectively message-count based despite having a
token budget, because protect_last_n=20 acted as a hard floor. A single
50K-token tool output would cause all 20 recent messages to be
preserved regardless of budget, leaving little room for summarization.
Changes:
- _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20)
to 3; token budget is now the primary criterion
- Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message
- _prune_old_tool_results: accepts optional protect_tail_tokens so
pruning also respects the token budget instead of a fixed count
- compress() minimum message check relaxed from protect_first_n +
protect_last_n + 1 to protect_first_n + 3 + 1
- Tool group alignment (no splitting tool_call/result) preserved
Three targeted improvements to the compression system:
1. Replace hardcoded truncation limits with named class constants
(_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500,
_TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits
(3000/500) heavily truncated the summarizer's input — a 200-line
edit got cut to 3000 chars before the summarizer ever saw it.
2. Add '## Tools & Patterns' section to both compression prompt
templates (first-pass and iterative). Preserves working tool
invocations, preferred flags, and tool-specific discoveries
across compaction boundaries.
3. Warn users on 2nd+ compression: 'Session compressed N times —
accuracy may degrade. Consider /new to start fresh.'
Ref #499
Two linked fixes for MiniMax Anthropic-compatible fallback:
1. Normalize httpx.URL to str before calling .rstrip() in auth/provider
detection helpers. Some client objects expose base_url as httpx.URL,
not str — crashed with AttributeError in _requires_bearer_auth() and
_is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback()
to use the already-stringified fb_base_url instead of raw httpx.URL.
2. Strip Anthropic-proprietary thinking block signatures when targeting
third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry,
self-hosted proxies). These endpoints cannot validate Anthropic's
signatures and reject them with HTTP 400 'Invalid signature in
thinking block'. Now threads base_url through convert_messages_to_anthropic()
→ build_anthropic_kwargs() so signature management is endpoint-aware.
Based on PR #4945 by kshitijk4poor (rstrip fix).
Fixes#4944.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
- Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing
despite being referenced in code)
- Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS
(non-aggregator providers use their main model for aux tasks automatically)
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
Hermes Agent identified and patched its own prompting blind spots through
automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4
and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches,
and verifying the fix in a closed loop.
Failure modes discovered and fixed:
- Mental arithmetic (wrong answers: 39,152,053 vs correct 39,151,253)
- User profile hallucination ('Windows 11' when running on Linux)
- Time guessing without verification
- Clarification-seeking instead of acting ('open where?' for port checks)
- Hash computation from memory (SHA-256, encodings)
- Confusing system RAM with agent's own persistent memory store
Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE:
- <mandatory_tool_use>: explicit categories that must always use tools
- <act_dont_ask>: default to action on obvious interpretations
Results:
gpt-5.4: 68.8% → 100% tool compliance (+31.2pp)
gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp)
Regression: 0/8 conversational prompts over-tooled
Anthropic signs thinking blocks against the full turn content. Any
upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
'Invalid signature in thinking block' — especially in long-lived
gateway sessions.
Strategy (following clawdbot/OpenClaw pattern):
1. Strip thinking/redacted_thinking from all assistant messages EXCEPT
the last one — preserves reasoning continuity on the current
tool-use chain while avoiding stale signature errors on older turns.
2. Downgrade unsigned thinking blocks to plain text — Anthropic can't
validate them, but the reasoning content is preserved.
3. Strip cache_control from thinking/redacted_thinking blocks to
prevent cache markers from interfering with signature validation.
4. Drop thinking blocks from the second message when merging
consecutive assistant messages (role alternation enforcement).
5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking',
strip all reasoning_details from the conversation and retry once.
This is the safety net for edge cases the proactive stripping
misses.
Addresses the issue reported in PR #6086 by @mingginwan while
preserving reasoning continuity (their PR stripped ALL thinking
blocks unconditionally).
Files changed:
- agent/anthropic_adapter.py: thinking block management in
convert_messages_to_anthropic (strip old turns, downgrade unsigned,
strip cache_control, merge-time strip)
- run_agent.py: one-shot signature error recovery in retry loop
- tests/test_anthropic_adapter.py: 10 new tests covering all cases
Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:
1. OpenRouter (known vision-capable default model)
2. Nous Portal (known vision-capable default model)
3. Active provider + model (whatever the user is running)
4. Stop
This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).
Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.
Based on PR #5376 by Mibay. Closes#5366.
Salvaged fixes from community PRs:
- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
key lookup (was checking top-level dict instead of store['providers']).
OAuth providers now correctly detected in /model picker.
Cherry-picked from PR #5911 by Xule Lin (linxule).
- fix(ollama): pass num_ctx to override 2048 default context window.
Ollama defaults to 2048 context regardless of model capabilities. Now
auto-detects from /api/show metadata and injects num_ctx into every
request. Config override via model.ollama_num_ctx. Fixes#2708.
Cherry-picked from PR #5929 by kshitij (kshitijk4poor).
- fix(aux): normalize provider aliases for vision/auxiliary routing.
Adds _normalize_aux_provider() with 17 aliases (google→gemini,
claude→anthropic, glm→zai, etc). Fixes vision routing failure when
provider is set to 'google' instead of 'gemini'.
Cherry-picked from PR #5793 by e11i (Elizabeth1979).
- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
but auxiliary client uses OpenAI SDK which appends /chat/completions →
404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
Inspired by PR #5786 by Lempkey.
Added debug logging to silent exception blocks across all fixes.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal
model) for all auxiliary tasks including compression, session search,
and web extraction. Now routes non-vision tasks to mimo-v2-pro (a
text model) which is better suited for those workloads.
- Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks
- _try_nous() accepts vision=False param to select the right model
- Vision path (_resolve_strict_vision_backend) passes vision=True
- All other callers default to vision=False → mimo-v2-pro
* fix(telegram): replace substring caption check with exact line-by-line match
Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.
Adds 13 unit tests covering the fixed bug scenarios.
Cherry-picked from PR #2671 by Dilee.
* fix: extend caption substring fix to all platforms
Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)
The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).
* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing
Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:
1. 'provider: main' was hardcoded to 'custom', which only checks legacy
OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').
2. Named custom provider names (e.g., 'beans') fell through to
PROVIDER_REGISTRY which doesn't know about config.yaml entries.
Now checks _get_named_custom_provider() before the registry fallback.
Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).
Adds 13 unit tests. Reported by Laura via Discord.
---------
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
16 callsites across 14 files were re-deriving the hermes home path
via os.environ.get('HERMES_HOME', ...) instead of using the canonical
get_hermes_home() from hermes_constants. This breaks profiles — each
profile has its own HERMES_HOME, and the inline fallback defaults to
~/.hermes regardless.
Fixed by importing and calling get_hermes_home() at each site. For
files already inside the hermes process (agent/, hermes_cli/, tools/,
gateway/, plugins/), this is always safe. Files that run outside the
process context (mcp_serve.py, mcp_oauth.py) already had correct
try/except ImportError fallbacks and were left alone.
Skipped: hermes_constants.py (IS the implementation), env_loader.py
(bootstrap), profiles.py (intentionally manipulates the env var),
standalone scripts (optional-skills/, skills/), and tests.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
- Show pricing during initial Nous Portal login (was missing from
_login_nous, only shown in the already-logged-in hermes model path)
- Filter free models for paid subscribers: non-allowlisted free models
are hidden; allowlisted models (xiaomi/mimo-v2-pro, xiaomi/mimo-v2-omni)
only appear when actually priced as free
- Detect free-tier accounts via portal api/oauth/account endpoint
(monthly_charge == 0); free-tier users see only free models as
selectable, with paid models shown dimmed and unselectable
- Use xiaomi/mimo-v2-omni as the auxiliary vision model for free-tier
Nous users so vision_analyze and browser_vision work without paid
model access (replaces the default google/gemini-3-flash-preview)
- Unavailable models rendered via print() before TerminalMenu to avoid
simple_term_menu line-width padding artifacts; upgrade URL resolved
from auth state portal_base_url (supports staging/custom portals)
- Add 21 tests covering filter_nous_free_models, is_nous_free_tier,
and partition_nous_models_by_tier
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
The credential pool seeder and runtime credential resolver hardcoded
api.z.ai/api/paas/v4 for all Z.AI keys. Keys on the Coding Plan (or CN
endpoint) would hit the wrong endpoint, causing 401/429 errors on the
first request even though a working endpoint exists.
Add _resolve_zai_base_url() that:
- Respects GLM_BASE_URL env var (no probe when explicitly set)
- Probes all candidate endpoints (global, cn, coding-global, coding-cn)
via detect_zai_endpoint() to find one that returns HTTP 200
- Caches the detected endpoint in provider state (auth.json) keyed on
a SHA-256 hash of the API key so subsequent starts skip the probe
- Falls back to the default URL if all probes fail
Wire into both _seed_from_env() in the credential pool and
resolve_api_key_provider_credentials() in the runtime resolver,
matching the pattern from the kimi-coding fix (PR #5566).
Fixes the same class of bug as #5561 but for the zai provider.
Cherry-picked from PR #5580 by MestreY0d4-Uninter.
- Share parent's credential pool with child agents for key rotation
- Leasing layer spreads parallel children across keys (least-loaded)
- Thread-safe acquire_lease/release_lease in CredentialPool
- Reverted sneaked-in tool-name restoration change (kept original
getattr + isinstance guard pattern)
Two remaining gaps from the codex empty-output spec:
1. Normalize dict-shaped streamed items: output_item.done events may
yield dicts (raw/fallback paths) instead of SDK objects. The
extraction loop now uses _item_get() that handles both getattr
and dict .get() access.
2. Avoid plain-text synthesis when function_call events were streamed:
tracks has_function_calls during streaming and skips text-delta
synthesis when tool calls are present — prevents collapsing a
tool-call response into a fake text message.
The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.
Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().
Tested live against chatgpt.com/backend-api/codex with OAuth.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When the Codex CLI (or another Hermes profile) refreshes its token, the
pool entry's refresh_token becomes stale. Subsequent refresh attempts
fail with invalid_grant, and the entry enters a 24-hour exhaustion
cooldown with no recovery path.
This mirrors the existing _sync_anthropic_entry_from_credentials_file()
pattern: when an openai-codex entry is exhausted, compare its
refresh_token against ~/.codex/auth.json and sync the fresh pair if
they differ.
Fixes the common scenario where users run 'codex login' to refresh
their token externally and Hermes never picks it up.
Co-authored-by: David Andrews (LexGenius.ai) <david@lexgenius.ai>
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
Skills can now declare config.yaml settings via metadata.hermes.config
in their SKILL.md frontmatter. Values are stored under skills.config.*
namespace, prompted during hermes config migrate, shown in hermes config
show, and injected into the skill context at load time.
Also adds the llm-wiki skill (Karpathy's LLM Wiki pattern) as the first
skill to use the new config interface, declaring wiki.path.
Skill config interface (new):
- agent/skill_utils.py: extract_skill_config_vars(), discover_all_skill_config_vars(),
resolve_skill_config_values(), SKILL_CONFIG_PREFIX
- agent/skill_commands.py: _inject_skill_config() injects resolved values
into skill messages as [Skill config: ...] block
- hermes_cli/config.py: get_missing_skill_config_vars(), skill config
prompting in migrate_config(), Skill Settings in show_config()
LLM Wiki skill (skills/research/llm-wiki/SKILL.md):
- Three-layer architecture (raw sources, wiki pages, schema)
- Three operations (ingest, query, lint)
- Session orientation, page thresholds, tag taxonomy, update policy,
scaling guidance, log rotation, archiving workflow
Docs: creating-skills.md, configuration.md, skills.md, skills-catalog.md
Closes#5100
When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402. Two fixes:
1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
or a credit-related error, automatically retry with the next available
provider in the auto-detection chain. Skips the depleted provider and
tries Nous → Custom → Codex → API-key providers.
2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
to OpenRouter when auto/custom resolution returned no client. Now falls
back to the full auto-detection chain, which handles any available
provider — not just OpenRouter.
Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.
Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
Telegram Bot API requires command names to contain only lowercase a-z,
digits 0-9, and underscores. Skill/plugin names containing characters
like +, /, @, or . caused set_my_commands to fail with
Bot_command_invalid.
Two-layer fix:
- scan_skill_commands(): strip non-alphanumeric/non-hyphen chars from
cmd_key at source, collapse consecutive hyphens, trim edges, skip
names that sanitize to empty string
- _sanitize_telegram_name(): centralized helper used by all 3 Telegram
name generation sites (core commands, plugin commands, skill commands)
with empty-name guard at each call site
Closes#5534
Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use
enforcement guidance, steering them to actually call tools instead of
describing intended actions. Matches both OpenRouter (x-ai/grok-*) and
direct xAI API usage.
Enable Hermes tool execution through the copilot-acp adapter by:
- Passing tool schemas and tool_choice into the ACP prompt text
- Instructing ACP backend to emit <tool_call>{...}</tool_call> blocks
- Parsing XML tool-call blocks and bare JSON fallback back into
Hermes-compatible SimpleNamespace tool call objects
- Setting finish_reason='tool_calls' when tool calls are extracted
- Cleaning tool-call markup from response text
Fix duplicate tool call extraction when both XML block and bare JSON
regexes matched the same content (XML blocks now take precedence).
Cherry-picked from PR #4536 by MestreY0d4-Uninter. Stripped heuristic
fallback system (auto-synthesized tool calls from prose) and
Portuguese-language patterns — tool execution should be model-decided,
not heuristic-guessed.
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:
- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating
Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.
Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.
Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).
Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths
Inspired by Block/goose SubdirectoryHintTracker.
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.
Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
Resolve exact label matches before treating digit-only input as a positional index so destructive auth removal does not mis-target credentials named with numeric labels.
Constraint: The CLI remove path must keep supporting existing index-based usage while adding safer label targeting
Rejected: Ban numeric labels | labels are free-form and existing users may already rely on them
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a destructive command accepts multiple identifier forms, prefer exact identity matches before fallback parsing heuristics
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (273 passed); py_compile on changed Python files
Not-tested: Full repository pytest suite
Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone.
Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp
Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata
Rejected: Add a separate credential scheduler subsystem | too large for the Hermes pool architecture and unnecessary for this fix
Rejected: Only change CLI formatting | would leave runtime rotation blind to resets_at and preserve the serial-failure behavior
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME
Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths
Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.
Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.
Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().
Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
env var name chars. Without this, the pattern backtracks exponentially on large
strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
produced literal backslash-n instead of a real newline, breaking file search
result parsing (total_count always 1, paths concatenated).
Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
'Hangs in non-interactive environments' but actually run fine:
- test_413_compression.py (12 tests, 25s)
- test_file_tools_live.py (71 tests, 24s)
- test_code_execution.py (61 tests, 99s)
- test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
of _gateway_queues — fixes race condition where resolve fires before threads
register their approval entries, causing the test to hang indefinitely.
Net effect: +256 tests recovered from skip, 8 real failures fixed.
Address review feedback: replace bare `except: pass` with a debug
log when the post-retry write-back to ~/.claude/.credentials.json
fails. The write-back is best-effort (token is already resolved),
but logging helps troubleshooting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OAuth refresh tokens are single-use. When multiple consumers share the
same Anthropic OAuth session (credential pool entries, Claude Code CLI,
multiple Hermes profiles), whichever refreshes first invalidates the
refresh token for all others. This causes a cascade:
1. Pool entry tries to refresh with a consumed refresh token → 400
2. Pool marks the credential as "exhausted" with a 24-hour cooldown
3. All subsequent heartbeats skip the credential entirely
4. The fallback to resolve_anthropic_token() only works while the
access token in ~/.claude/.credentials.json hasn't expired
5. Once it expires, nothing can auto-recover without manual re-login
Fix:
- Add _sync_anthropic_entry_from_credentials_file() to detect when
~/.claude/.credentials.json has a newer refresh token and sync it
into the pool entry, clearing exhaustion status
- After a successful pool refresh, write the new tokens back to
~/.claude/.credentials.json so other consumers stay in sync
- On refresh failure, check if the credentials file has a different
(newer) refresh token and retry once before marking exhausted
- In _available_entries(), sync exhausted claude_code entries from
the credentials file before applying the 24-hour cooldown, so a
manual re-login or external refresh immediately unblocks agents
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two pre-existing issues causing test_file_read_guards timeouts on CI:
1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
IGNORECASE, matching any letter/underscore to end-of-string at
each position → O(n²) backtracking on 100K+ char inputs.
Bounded to {0,50} since env var names are never that long.
2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
character-count guard, so oversized content (that would be rejected
anyway) went through the expensive regex first. Reordered to check
size limit before redaction.
The Anthropic SDK appends /v1/messages to the base_url, so OpenCode's
base URL https://opencode.ai/zen/go/v1 produced a double /v1 path
(https://opencode.ai/zen/go/v1/v1/messages), causing 404s for MiniMax
models. Strip trailing /v1 when api_mode is anthropic_messages.
Also adds MiMo-V2-Pro, MiMo-V2-Omni, and MiniMax-M2.5 to the OpenCode
Go model lists per their updated docs.
Fixes#4890
Three interconnected bugs caused `hermes skills config` per-platform
settings to be silently ignored:
1. telegram_menu_commands() never filtered disabled skills — all skills
consumed menu slots regardless of platform config, hitting Telegram's
100 command cap. Now loads disabled skills for 'telegram' and excludes
them from the menu.
2. Gateway skill dispatch executed disabled skills because
get_skill_commands() (process-global cache) only filters by the global
disabled list at scan time. Added per-platform check before execution,
returning an actionable 'skill is disabled' message.
3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but
the gateway sets HERMES_SESSION_PLATFORM instead. Added
HERMES_SESSION_PLATFORM as fallback, plus an explicit platform=
parameter for callers that know their platform (menu builder, gateway
dispatch). Also added platform to prompt_builder's skills cache key
so multi-platform gateways get correct per-platform skill prompts.
Reported by SteveSkedasticity (CLAW community).
* feat(memory): add pluggable memory provider interface with profile isolation
Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.
Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md
Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env
MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.
Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.
Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).
* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider
Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.
Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:
- sync_turn: records user/assistant messages to OpenViking session
(threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
into 6 categories (profile, preferences, entities, events, cases,
patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)
Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base
Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.
* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker
- Remove redundant mem0_context tool (identical to mem0_search with
rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
(prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
for 120s instead of hammering a down server every turn. Auto-resets
after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads
* fix(memory): enforce single external memory provider limit
MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.
The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.
Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.
* feat(memory): add ByteRover memory provider plugin
Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.
ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.
Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression
Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state
Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).
* fix(memory): thread remaining sync_turns, fix holographic, add config key
Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads
Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
used in scoring). Now reuses pre-computed entity_residuals directly,
moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
500 facts, only checks the most recently updated ones to avoid O(n^2)
explosion (~125K comparisons at 500 is acceptable).
Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
No version bump needed (deep_merge handles new keys automatically).
* feat(memory): extract Honcho as a MemoryProvider plugin
Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.
The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.
Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages
This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.
* feat(memory): wire MemoryManager into run_agent.py
Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):
1. Init (~L1130): Create MemoryManager, find matching plugin provider
from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
completed turn, queue_prefetch_all() for next turn, on_session_end()
+ shutdown_all() at conversation end
All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.
Full suite: 7222 passed, 4 pre-existing failures.
* refactor(memory): remove legacy Honcho integration from core
Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).
Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
_activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
_honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding
Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls
Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py
Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.
The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.
Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice
Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
(byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()
CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup
Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references
Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py
Migration:
- Honcho migration notice on startup: detects existing honcho.json or
~/.honcho/config.json, prints guidance to run hermes memory setup.
Only fires when memory.provider is not set and not in quiet mode.
Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.
* feat(memory): standardize plugin config + add per-plugin documentation
Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)
Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml
Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers
The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory
* docs: add memory providers user guide + developer guide
New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
Holographic, RetainDB, ByteRover). Each with setup, config, tools,
cost, and unique features. Includes comparison table and profile
isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
provider plugin. Covers ABC, required methods, config schema,
save_config, threading contract, profile isolation, testing.
Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
the new Memory Providers page
- sidebars.ts — added both new pages to navigation
* fix(memory): auto-migrate Honcho users to memory provider plugin
When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.
Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.
* fix(memory): only auto-migrate Honcho when enabled + credentialed
Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.
* feat(memory): auto-install pip dependencies during hermes memory setup
Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).
Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)
* fix: remove remaining Honcho crash risks from cli.py and gateway
cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.
gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.
tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).
* fix: include plugins/ in pyproject.toml package list
Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.
* fix(memory): correct pip-to-import name mapping for dep checks
The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.
* chore: remove dead code from old plugin memory registration path
- hermes_cli/plugins.py: removed register_memory_provider(),
_memory_providers list, get_plugin_memory_providers() — memory
providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
subparsers (setup, status, sessions, map, peer, mode, tokens,
identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
registration path
- tests: replaced TestPluginMemoryProviderRegistration with
TestPluginMemoryDiscovery that tests the actual plugins/memory/
discovery system. Added 3 new tests (discover, load, nonexistent).
* chore: delete dead honcho_integration/cli.py and its tests
cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.
Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)
Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush
* refactor: move honcho_integration/ into the honcho plugin
Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.
- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/
* docs: update architecture + gateway-internals for memory provider system
- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
flush lifecycle docs with generic memory provider interface docs
* fix: update stale mock path for resolve_active_host after honcho plugin migration
* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore
Review feedback from Honcho devs (erosika):
P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
(was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry
Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)
ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple
Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup
* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type
- Wire on_delegation() in delegate_tool.py — parent's memory provider
is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
from corrupting user representations — closes#4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str
* fix(honcho): port profile isolation fixes from PR #4632
Ports 5 bug fixes found during profile testing (erosika's PR #4632):
1. 3-tier config resolution — resolve_config_path() now checks
$HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
(non-default profiles couldn't find shared host blocks)
2. Thread host=_host_key() through from_global_config() in cmd_setup,
cmd_status, cmd_identity (--target-profile was being ignored)
3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid
4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
all message uploads for the session
5. Gate Honcho clone behind --clone/--clone-all on profile create
(bare create should be blank-slate)
Also: sanitize assistant_peer_id via _sanitize_id()
* fix(tests): add module cleanup fixture to test_cli_provider_resolution
test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.
Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
Anthropic extended thinking blocks include an opaque 'signature' field
required for thinking chain continuity across multi-turn tool-use
conversations. Previously, normalize_anthropic_response() extracted
only the thinking text and set reasoning_details=None, discarding the
signature. On subsequent turns the API could not verify the chain.
Changes:
- _to_plain_data(): new recursive SDK-to-dict converter with depth cap
(20 levels) and path-based cycle detection for safety
- _extract_preserved_thinking_blocks(): rehydrates preserved thinking
blocks (including signature) from reasoning_details on assistant
messages, placing them before tool_use blocks as Anthropic requires
- normalize_anthropic_response(): stores full thinking blocks in
reasoning_details via _to_plain_data()
- _extract_reasoning(): adds 'thinking' key to the detail lookup chain
so Anthropic-format details are found alongside OpenRouter format
Salvaged from PR #4503 by @priveperfumes — focused on the thinking
block continuity fix only (cache strategy and other changes excluded).
Setup wizard now shows existing allowed_users when reconfiguring a
platform and preserves them if the user presses Enter. Previously the
wizard would display a misleading "No allowlist set" warning even when
the .env still held the original IDs.
Also downgrades the "provider X has no API key configured" log from
WARNING to DEBUG in resolve_provider_client — callers already handle
the None return with their own contextual messages. This eliminates
noisy startup warnings for providers in the fallback chain that the
user never configured (e.g. minimax).
- Add missing `from agent.credential_pool import load_pool` import to
auxiliary_client.py (introduced by the credential pool feature in main)
- Thread `args` through `select_provider_and_model(args=None)` so TLS
options from `cmd_model` reach `_model_flow_nous`
- Mock `_require_tty` in test_cmd_model_forwards_nous_login_tls_options
so it can run in non-interactive test environments
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role vs 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).
Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.
DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
When PyYAML is unavailable or YAML frontmatter is malformed, the fallback
parser may return metadata as a string instead of a dict. This causes
AttributeError when calling .get("hermes") on the string.
Added explicit type checks to handle cases where metadata or hermes fields
are not dicts, preventing the crash.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The total_tokens field includes cache_read + cache_write tokens, but
the display only showed input + output — making the math look wrong
(e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache
line when cache tokens are present so all visible numbers sum to the
displayed total.
Affects both terminal (hermes insights) and gateway (/insights)
formats.
Show inline diffs in the CLI transcript when write_file, patch, or
skill_manage modifies files. Captures a filesystem snapshot before the
tool runs, computes a unified diff after, and renders it with ANSI
coloring in the activity feed.
Adds tool_start_callback and tool_complete_callback hooks to AIAgent
for pre/post tool execution notifications.
Also fixes _extract_parallel_scope_path to normalize relative paths
to absolute, preventing the parallel overlap detection from missing
conflicts when the same file is referenced with different path styles.
Gated by display.inline_diffs config option (default: true).
Based on PR #3774 by @kshitijk4poor.
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:
1. credential_pool was dropped during smart model turn routing.
resolve_turn_route() constructed runtime dicts without it, so the
AIAgent was created without pool access. Fixed in smart_model_routing.py
(no-route and fallback paths), cli.py, and gateway/run.py.
2. Eager fallback fired before pool rotation on 429. The rate-limit
handler at line ~7180 switched to a fallback provider immediately,
before _recover_with_credential_pool got a chance to rotate to the
next credential. Now deferred when the pool still has credentials.
3. (Non-issue) Retry budget was reported as too small, but successful
pool rotations already skip retry_count increment — no change needed.
Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
Follow-up to PR #4305 — .config/gh was added to the write-deny list
but missed from _SENSITIVE_HOME_DIRS, leaving GitHub CLI OAuth tokens
exposed via @file:~/.config/gh/hosts.yml context injection.
- Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server,
server-to-server, and refresh tokens) — all four types used by
GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS
- Snapshot HERMES_REDACT_SECRETS at module import time instead of
re-reading os.getenv() on every call, preventing runtime env mutations
(e.g. LLM-generated export commands) from disabling redaction
* feat(auth): add same-provider credential pools and rotation UX
Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.
- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647
* fix(tests): prevent pool auto-seeding from host env in credential pool tests
Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.
- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test
* feat(auth): add thread safety, least_used strategy, and request counting
- Add threading.Lock to CredentialPool for gateway thread safety
(concurrent requests from multiple gateway sessions could race on
pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
thread safety (4 threads × 20 selects with no corruption)
* feat(auth): add interactive mode for bare 'hermes auth' command
When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:
1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
add flow explicitly asks 'API key or OAuth login?' — making it
clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection
The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.
* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)
Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.
* feat(auth): support custom endpoint credential pools keyed by provider name
Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).
- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing
* docs: add Excalidraw diagram of full credential pool flow
Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration
Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g
* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow
The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached
* docs: add comprehensive credential pool documentation
- New page: website/docs/user-guide/features/credential-pools.md
Full guide covering quick start, CLI commands, rotation strategies,
error recovery, custom endpoint pools, auto-discovery, thread safety,
architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide
* chore: remove excalidraw diagram from repo (external link only)
* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns
- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
(token_type, scope, client_id, portal_base_url, obtained_at,
expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
agent_key_obtained_at, tls) into a single extra dict with
__getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider
Net -17 lines. All 383 targeted tests pass.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.
Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
(both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars
Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
- Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection
- Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the
correct models.dev provider key — original PR used 'fireworks' which
would silently fail the lookup)
Cherry-picked from PR #3989 with models.dev key fix.
Co-authored-by: sroecker <sroecker@users.noreply.github.com>
Claude Code >=2.1.81 checks for a 'scopes' array containing 'user:inference'
in ~/.claude/.credentials.json before accepting stored OAuth tokens as valid.
When Hermes refreshes the token, it writes only accessToken, refreshToken, and
expiresAt — omitting the scopes field. This causes Claude Code to report
'loggedIn: false' and refuse to start, even though the token is valid.
This commit:
- Parses the 'scope' field from the OAuth refresh response
- Passes it to _write_claude_code_credentials() as a keyword argument
- Persists the scopes array in the claudeAiOauth credential store
- Preserves existing scopes when the refresh response omits the field
Tested against Claude Code v2.1.87 on Linux — auth status correctly reports
loggedIn: true and claude --print works after this fix.
Co-authored-by: Nick <git@flybynight.io>
* fix: treat non-sk-ant- prefixed keys (Azure AI Foundry) as regular API keys, not OAuth tokens
* fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens
_is_oauth_token() returned True for any key not starting with
sk-ant-api, misclassifying Azure AI Foundry keys as OAuth tokens
and sending Bearer auth instead of x-api-key → 401 rejection.
Real Anthropic OAuth tokens all start with sk-ant-oat (confirmed
from live .credentials.json). Non-sk-ant- keys are third-party
provider keys that should use x-api-key.
Test fixtures updated to use realistic sk-ant-oat01- prefixed
tokens instead of fake strings.
Salvaged from PR #4075 by @HangGlidersRule.
---------
Co-authored-by: Clawdbot <clawdbot@openclaw.ai>
MiniMax's /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer instead of x-api-key. Without this fix,
MiniMax users get 401 errors in gateway sessions.
Adds _requires_bearer_auth() to detect MiniMax endpoints and route
through auth_token in the Anthropic SDK. Check runs before OAuth
token detection so MiniMax keys aren't misclassified as setup tokens.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered
by _PREFIX_PATTERNS, leaking in plain text via printenv or log output.
Salvaged from PR #3790 by @memosr. Tests rewritten with correct
assertions (original tests had vacuously true checks).
Co-authored-by: memosr <memosr@users.noreply.github.com>
* add .aac audio file format support to transcription tool
* fix(agent): support full context length resolution for direct Gemini API endpoints
Add generativelanguage.googleapis.com to _URL_TO_PROVIDER so direct
Gemini API users get correct 1M+ context length instead of the 128K
unknown-proxy fallback.
Co-authored-by: bb873 <bb873@users.noreply.github.com>
---------
Co-authored-by: Adrian Scott <adrian@adrianscott.com>
Co-authored-by: bb873 <bb873@users.noreply.github.com>
The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.
Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely. This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.
The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.
Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
for explicit_base_url without explicit_api_key
Updates 2 tests that expected the old (broken) behavior.
Tool call previews (paths, commands, queries) were hardcoded to truncate
at 35-40 chars across CLI spinners, completion lines, and gateway progress
messages. Users could not see full file paths in tool output.
New config option: display.tool_preview_length (default 0 = no limit).
Set a positive number to truncate at that length.
Changes:
- display.py: module-level _tool_preview_max_len with getter/setter;
build_tool_preview() and get_cute_tool_message() _trunc/_path respect it
- cli.py: reads config at startup, spinner widget respects config
- gateway/run.py: reads config per-message, progress callback respects config
- run_agent.py: removed redundant 30-char quiet-mode spinner truncation
- config.py: added display.tool_preview_length to DEFAULT_CONFIG
Reported by kriskaminski
Add skills.external_dirs config option — a list of additional directories
to scan for skills alongside ~/.hermes/skills/. External dirs are read-only:
skill creation/editing always writes to the local dir. Local skills take
precedence when names collide.
This lets users share skills across tools/agents without copying them into
Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills).
Changes:
- agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs()
- agent/prompt_builder.py: scan external dirs in build_skills_system_prompt()
- tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs;
security check recognizes configured external dirs as trusted
- agent/skill_commands.py: /skill slash commands discover external skills
- hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG
- cli-config.yaml.example: document the option
- tests/agent/test_external_skills.py: 11 tests covering discovery, precedence,
deduplication, and skill_view for external skills
Requested by community member primco.
Background agent's KawaiiSpinner wrote \r-based animation and stop()
messages through StdoutProxy, colliding with prompt_toolkit's status bar.
Two fixes:
- display.py: use isinstance(out, StdoutProxy) instead of fragile
hasattr+name check for detecting prompt_toolkit's stdout wrapper
- cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route
thinking updates through the TUI widget only when no foreground
agent is active; clear spinner text in finally block with same guard
Closes#2718
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480.
Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.
Defaults:
- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)
call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.
Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.
Co-authored-by: memosr <memosr@users.noreply.github.com>
Cherry-pick of feat/gpt-tool-steering with modifications:
1. Tool-use enforcement prompt (refactored from GPT-specific):
- Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
- Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
- Injection logic now checks against the tuple instead of hardcoding
'gpt' — adding new model families is a one-line change
- Addresses models describing actions instead of making tool calls
2. Budget warning history stripping:
- _strip_budget_warnings_from_history() strips _budget_warning JSON
keys and [BUDGET WARNING: ...] text from tool results at the start
of run_conversation()
- Prevents old budget warnings from poisoning subsequent turns
Based on PR #3479 by teknium1.
* fix: cap context pressure percentage at 100% in display
The forward-looking token estimate can overshoot the compaction threshold
(e.g. a large tool result pushes it from 70% to 109% in one step). The
progress bar was already capped via min(), but pct_int was not — causing
the user to see '109% to compaction' which is confusing.
Cap pct_int at 100 in both CLI and gateway display functions.
Reported by @JoshExile82.
* refactor: use real API token counts for compression decisions
Replace the rough chars/3 estimation with actual prompt_tokens +
completion_tokens from the API response. The estimation was needed to
predict whether tool results would push context past the threshold, but
the default 50% threshold leaves ample headroom — if tool results push
past it, the next API call reports real usage and triggers compression
then.
This removes all estimation from the compression and context pressure
paths, making both 100% data-driven from provider-reported token counts.
Also removes the dead _msg_count_before_tools variable.
_expand_git_reference() and _rg_files() called subprocess.run()
without a timeout. On a large repository, @diff, @staged, or
@git:N references could hang the agent indefinitely while git
or ripgrep processes slow output.
- Add timeout=30 to git subprocess in _expand_git_reference()
with a user-friendly error message on TimeoutExpired
- Add timeout=10 to rg subprocess in _rg_files() returning
None on timeout (falls back to os.walk folder listing)
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.
All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.
Closes#3389.
Show only agentic models that map to OpenRouter defaults:
Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus
Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b
deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat
moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5
MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5
zai-org/GLM-5 ↔ z-ai/glm-5
XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro
moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking
Users can still pick any HF model via Enter custom model name.
The Anthropic adapter defaulted to max_tokens=16384 when no explicit value
was configured. This severely limits thinking-enabled models where thinking
tokens count toward max_tokens:
- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K
With extended thinking (adaptive or budget-based), the model could exhaust
the entire 16K on reasoning, leaving zero tokens for the actual response.
This caused two user-visible errors:
- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all
Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs
and Cline's model catalog) and use the model's actual output limit as the
default. Unknown future models default to 128K (the current maximum).
Also adds context_length clamping: if the user configured a smaller context
window (e.g. custom endpoint), max_tokens is clamped to context_length - 1
to avoid exceeding the window.
Closes#2706
Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main.
Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider:
- hermes chat --provider huggingface (or --provider hf)
- 18 curated open models via hermes model picker
- HF_TOKEN in ~/.hermes/.env
- OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.)
Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests.
Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task(). When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.
Three-layer fix:
1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
startup before any AsyncOpenAI clients are created. Safe because
cached clients are explicitly cleaned via _force_close_async_httpx,
and uncached clients' TCP connections are cleaned by the OS on exit.
2. Custom asyncio exception handler: Installed on prompt_toolkit's event
loop to silently suppress 'Event loop is closed' RuntimeError.
Defense-in-depth for SDK upgrades that might change the class name.
3. cleanup_stale_async_clients(): Called after each agent turn (when the
agent thread joins) to proactively evict cache entries whose event
loop is closed, preventing stale clients from accumulating.
When user messages have empty content (e.g., Discord @mention-only
messages, unrecognized attachments), the Anthropic API rejects the
request with 'user messages must have non-empty content'.
Changes:
- anthropic_adapter.py: Add empty content validation for user messages
(string and list formats), matching the existing pattern for assistant
and tool messages. Empty content gets '(empty message)' placeholder.
- discord.py: Defense-in-depth check at gateway layer to catch empty
messages before they enter session history.
- Add 4 regression tests covering empty string, whitespace-only,
empty list, and empty text block scenarios.
Fixes#3143
Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>
_try_anthropic() caught ImportError on the module import (line 667-669)
but not on the build_anthropic_client() call (line 696). When the
anthropic_adapter module imports fine but the anthropic SDK is missing,
build_anthropic_client() raises ImportError at call time. This escaped
_try_anthropic() entirely, killing get_available_vision_backends() and
cascading to 7 test failures:
- 4 setup wizard tests hit unexpected 'Configure vision:' prompt
- 3 codex-auth-as-vision tests failed check_vision_requirements()
The fix wraps the build_anthropic_client call in try/except ImportError,
returning (None, None) when the SDK is unavailable — consistent with the
existing guard at the top of the function.
* fix(gateway): silence flush agent terminal output
quiet_mode=True only suppresses AIAgent init messages.
Tool call output still leaks to the terminal through
_safe_print → _print_fn during session reset/expiry.
Since #2670 injected live memory state into the flush prompt,
the flush agent now reliably calls memory tools — making the
output leak noticeable for the first time.
Set _print_fn to a no-op so the background flush is fully silent.
* test(gateway): add test for flush agent terminal silence + fix dotenv mock
- Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on
the flush agent so tool output never leaks to the terminal
- Fix pre-existing test failures: replace patch('run_agent.AIAgent')
with sys.modules mock to avoid importing run_agent (requires openai)
- Add autouse _mock_dotenv fixture so all tests in this file run
without the dotenv package installed
* fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent
The previous fix set tmp_agent._print_fn = no-op on the flush agent but
spinner output and quiet-mode cute messages bypassed _print_fn entirely:
- KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it
- quiet-mode tool results used builtin print() instead of _safe_print()
Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes
through it when set. Pass self._print_fn to all spinner construction sites
in run_agent.py and change the quiet-mode cute message print to _safe_print.
The existing gateway fix (tmp_agent._print_fn = lambda) now propagates
correctly through both paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): silence hygiene and compression background agents
Two more background AIAgent instances in the gateway were created with
quiet_mode=True but without _print_fn = no-op, causing tool output to
leak to the terminal:
- _hyg_agent (in-turn hygiene memory agent)
- tmp_agent (_compress_context path)
Apply the same _print_fn no-op pattern used for the flush agent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(display): remove unused _last_flush_time from KawaiiSpinner
Attribute was set but never read; upstream already removed it.
Leftover from conflict resolution during rebase onto upstream/main.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- add managed modal and gateway-backed tool integrations\n- improve CLI setup, auth, and configuration for subscriber flows\n- expand tests and docs for managed tool support
Nous Portal now passes through OpenRouter model names and routes from
there. Update the static fallback model list and auxiliary client default
to use OpenRouter-format slugs (provider/model) instead of bare names.
- _PROVIDER_MODELS['nous']: full OpenRouter catalog
- _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash)
- Updated 4 test assertions for the new default model name
Anthropic migrated their OAuth infrastructure from console.anthropic.com
to platform.claude.com (Claude Code v2.1.81+). Update _refresh_oauth_token()
to try the new endpoint first, falling back to the old one for tokens
issued before the migration.
Also switches Content-Type from application/x-www-form-urlencoded to
application/json to match current Claude Code behavior.
Salvaged from PR #2741 by kshitijk4poor.
Two improvements salvaged from PR #2600 (paraddox):
1. Preflight compression now counts tool schema tokens alongside system
prompt and messages. With 50+ tools enabled, schemas can add 20-30K
tokens that were previously invisible to the estimator, delaying
compression until the API rejected the request.
2. Context probe persistence guard: when the agent steps down context
tiers after a context-length error, only provider-confirmed numeric
limits (parsed from the error message) are cached to disk. Guessed
fallback tiers from get_next_probe_tier() stay in-memory only,
preventing wrong values from polluting the persistent cache.
Co-authored-by: paraddox <paraddox@users.noreply.github.com>
Three categories of cleanup, all zero-behavioral-change:
1. F-strings without placeholders (154 fixes across 29 files)
- Converted f'...' to '...' where no {expression} was present
- Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)
2. Simplify defensive patterns in run_agent.py
- Added explicit self._is_anthropic_oauth = False in __init__ (before
the api_mode branch that conditionally sets it)
- Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
self._is_anthropic_oauth (attribute always initialized now)
- Added _is_openrouter_url() and _is_anthropic_url() helper methods
- Replaced 3 inline 'openrouter' in self._base_url_lower checks
3. Remove dead code in small files
- hermes_cli/claw.py: removed unused 'total' computation
- tools/fuzzy_match.py: removed unused strip_indent() function and
pattern_stripped variable
Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
The recursive os.walk for AGENTS.md in subdirectories was undesired.
Only load AGENTS.md from the working directory root, matching the
behavior of CLAUDE.md and .cursorrules.
Remove run_hermes_oauth_login(), refresh_hermes_oauth_token(),
read_hermes_oauth_credentials(), _save_hermes_oauth_credentials(),
_generate_pkce(), and associated constants/credential file path.
This code was added in 63e88326 but never wired into any user-facing
flow (setup wizard, hermes model, or any CLI command). Neither
clawdbot/OpenClaw nor opencode implement PKCE for Anthropic — both
use setup-token or API keys. Dead code that was never tested in
production.
Also removes the credential resolution step that checked
~/.hermes/.anthropic_oauth.json (step 3 in resolve_anthropic_token),
renumbering remaining steps.
In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.
Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
frontmatter.get("metadata", {}) returns None (not {}) when the
key exists with a null value, crashing build_skills_system_prompt
with AttributeError: 'NoneType' object has no attribute 'get'.
Made-with: Cursor
Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
- Was copy-pasted inline across 30+ files as:
Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
- Now defined once in hermes_constants.py (zero-dependency module)
- hermes_cli/config.py re-exports it for backward compatibility
- Removed local wrapper functions in honcho_integration/client.py,
tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
- Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
- Same validation logic: check against (xhigh, high, medium, low, minimal, none)
- Now defined once in hermes_constants.py, called from all 3 locations
- Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
In gateway/Telegram mode, the stdout fd can be closed by executor
thread cleanup. KawaiiSpinner.stop() called isatty() on the closed fd,
raising ValueError and masking the original error.
Instead of a point fix, add a _is_tty property that centralizes the
closed-stream guard — both _animate() and stop() now use it. Follows
the same (ValueError, OSError) pattern already in _write().
Inspired by PR #2632 by bot-deo88.
format_token_count_compact() used unconditional rstrip("0") to clean up
decimal trailing zeros (e.g. "1.50" → "1.5"), but this also stripped
meaningful trailing zeros from whole numbers ("260" → "26", "100" → "1").
Guard the strip behind a decimal-point check.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
When the CLI is active, sys.stdout is prompt_toolkit's StdoutProxy which
queues writes and injects newlines around each flush(). This causes every
\r spinner frame to land on its own line instead of overwriting the
previous one, producing visible flickering where the spinner and status
bar repeatedly swap positions.
The CLI already renders spinner state via a dedicated TUI widget
(_spinner_text / get_spinner_text), so KawaiiSpinner's \r-based loop is
redundant under StdoutProxy. Detect the proxy and suppress the animation
entirely — the thread still runs to preserve start()/stop() semantics.
Also removes the 0.4s flush rate-limit workaround that was papering over
the same issue, and cleans up the unused _last_flush_time attribute.
Salvaged from PR #2908 by Mibayy (fixed _raw -> raw detection, dropped
unrelated bundled changes).
build_skills_system_prompt() was calling _read_skill_conditions() which
re-read each SKILL.md file to extract conditional activation fields.
The frontmatter was already parsed by _parse_skill_file() earlier in
the same loop. Extract conditions inline from the existing frontmatter
dict instead, saving one file read per skill (~80+ on a typical setup).
Salvaged from PR #2827 by InB4DevOps.
- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
(20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests
The summary_target_tokens parameter was accepted in the constructor,
stored on the instance, and never used — the summary budget was always
computed from hardcoded module constants (_SUMMARY_RATIO=0.20,
_MAX_SUMMARY_TOKENS=8000). This caused two compounding problems:
1. The config value was silently ignored, giving users no control
over post-compression size.
2. Fixed budgets (20K tail, 8K summary cap) didn't scale with
context window size. Switching from a 1M-context model to a
200K model would trigger compression that nuked 350K tokens
of conversation history down to ~30K.
Changes:
- Replace summary_target_tokens with summary_target_ratio (default 0.40)
which sets the post-compression target as a fraction of context_length.
Tail token budget and summary cap now scale proportionally:
MiniMax 200K → ~80K post-compression
GPT-5 1M → ~400K post-compression
- Change threshold_percent default: 0.50 → 0.80 (don't fire until
80% of context is consumed)
- Change protect_last_n default: 4 → 20 (preserve ~10 full turns)
- Summary token cap scales to 5% of context (was fixed 8K), capped
at 32K ceiling
- Read target_ratio and protect_last_n from config.yaml compression
section (both are now configurable)
- Remove hardcoded summary_target_tokens=500 from run_agent.py
- Add 5 new tests for ratio scaling, clamping, and new defaults
Move OpenRouter to position 1 in the setup wizard's provider list
to match hermes model ordering. Update default selection index and
fix test expectations for the new ordering.
Setup order: OpenRouter → Nous Portal → Codex → Custom → ...
When AsyncOpenAI clients are garbage-collected after the event loop
closes, their AsyncHttpxClientWrapper.__del__ tries to schedule
aclose() on the dead loop, causing RuntimeError: Event loop is closed.
prompt_toolkit catches this as an unhandled exception and shows
'Press ENTER to continue...' which blocks CLI exit.
Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks
all cached async clients' underlying httpx transport as CLOSED before
GC runs. This prevents __del__ from attempting the aclose() call.
- _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED
- shutdown_cached_clients(): iterates _client_cache, closes sync clients
normally and marks async clients as closed
- Also fix stale client eviction in _get_cached_client to mark evicted
async clients as closed (was just del-ing them, triggering __del__)
- Call shutdown_cached_clients() from _run_cleanup() in cli.py
The context length resolver was querying the /models endpoint for known
providers like GitHub Copilot, which returns a provider-imposed limit
(128k) instead of the model's actual context window (400k for gpt-5.4).
Since this check happened before the models.dev lookup, the wrong value
won every time.
Fix:
- Add api.githubcopilot.com and models.github.ai to _URL_TO_PROVIDER
- Skip the endpoint metadata probe for known providers — their /models
data is unreliable for context length. models.dev has the correct
per-provider values.
Reported by danny [DUMB] — gpt-5.4 via Copilot was resolving to 128k
instead of the correct 400k from models.dev.
When a non-OpenRouter provider (e.g. minimax, anthropic) is set in
config.yaml but its API key is missing, Hermes silently fell back to
OpenRouter, causing confusing 404 errors.
Now checks if the user explicitly configured a provider before falling
back. Explicit providers raise RuntimeError with a clear message naming
the missing env var. Auto/openrouter/custom providers still fall through
to OpenRouter as before.
Three code paths fixed:
- run_agent.py AIAgent.__init__ — main client initialization
- auxiliary_client.py call_llm — sync auxiliary calls
- auxiliary_client.py call_llm_streaming — async auxiliary calls
Based on PR #2272 by @StefanIsMe. Applied manually to fix a
pconfig NameError in the original and extend to call_llm_streaming.
Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>
Recent versions of llama.cpp moved the server properties endpoint from
/props to /v1/props (consistent with the /v1 API prefix convention).
The server-type detection path and the n_ctx reading path both used the
old /props URL, which returns 404 on current builds. This caused the
allocated context window size to fall back to a hardcoded default,
resulting in an incorrect (too small) value being displayed in the TUI
context bar.
Fix: try /v1/props first, fall back to /props for backward compatibility
with older llama.cpp builds. Both paths are now handled gracefully.
Two bugs in the auxiliary provider auto-detection chain:
1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
returned any stored token without checking expiry, preventing fallback
to working providers. Now decodes JWT exp claim and returns None for
expired tokens.
2. Auxiliary Anthropic client missing OAuth identity transforms:
_AnthropicCompletionsAdapter always called build_anthropic_kwargs with
is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
tokens via _is_oauth_token() and propagates the flag through the
adapter chain.
Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).
redact_sensitive_text() now returns early for None and coerces other
non-string values to str before applying regex-based redaction,
preventing TypeErrors in logging/tool-output paths.
Cherry-picked from PR #2369 by aydnOktay.
On the native Anthropic Messages API path, convert_messages_to_anthropic()
moves top-level cache_control on role:tool messages inside the tool_result
block. On OpenRouter (chat_completions), no such conversion happens — the
unexpected top-level field causes a silent hang on the second tool call.
Add native_anthropic parameter to _apply_cache_marker() and
apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages
are skipped entirely. When True (native Anthropic), existing behaviour is
preserved.
Fixes#2362
Only honor config.model.base_url for Anthropic resolution when
config.model.provider is actually "anthropic". This prevents a Codex
(or other provider) base_url from leaking into Anthropic runtime and
auxiliary client paths, which would send requests to the wrong
endpoint.
Closes#2384
Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url:
references that expand inline before the message reaches the LLM.
Supports line ranges (@file:main.py:10-50), token budget enforcement
(soft warn at 25%, hard block at 50%), and path sandboxing for gateway.
Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring
rewritten against current main. Fixed asyncio.run() crash when called
from inside a running event loop (gateway).
Closes#682.
Two fixes for local model context detection:
1. Hardcoded DEFAULT_CONTEXT_LENGTHS matching was case-sensitive.
'qwen' didn't match 'Qwen3.5-9B-Q4_K_M.gguf' because of the
capital Q. Now uses model.lower() for comparison.
2. Added compressor initialization logging showing the detected
context_length, threshold, model, provider, and base_url.
This makes turn-1 compression bugs diagnosable from logs —
previously there was no log of what context length was detected.
When using Alibaba (DashScope) with an anthropic-compatible endpoint,
model names like qwen3.5-plus were being normalized to qwen3-5-plus.
Alibaba's API expects the dot. Added preserve_dots parameter to
normalize_model_name() and build_anthropic_kwargs().
Also fixed 401 auth: when provider is alibaba or base_url contains
dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY).
Never fall back to resolve_anthropic_token(), and skip Anthropic
credential refresh for DashScope endpoints.
Cherry-picked from PR #1748 by crazywriter1. Fixes#1739.
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md)
were loaded and concatenated into the system prompt. This bloated the prompt
with potentially redundant or conflicting instructions.
Now only ONE project context type is loaded, using priority order:
1. .hermes.md / HERMES.md (walk to git root)
2. AGENTS.md / agents.md (recursive directory walk)
3. CLAUDE.md / claude.md (cwd only, NEW)
4. .cursorrules / .cursor/rules/*.mdc (cwd only)
SOUL.md from HERMES_HOME remains independent and always loads.
Also adds CLAUDE.md as a recognized context file format, matching the
convention popularized by Claude Code.
Refactored the monolithic function into four focused helpers:
_load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules.
Tests: replaced 1 coexistence test with 10 new tests covering priority
ordering, CLAUDE.md loading, case sensitivity, injection blocking.
In Docker/systemd/piped environments, the KawaiiSpinner animation
generates ~500 log lines per tool call. Now checks isatty() and
falls back to clean [tool]/[done] log lines in non-TTY contexts.
Interactive CLI behavior unchanged.
Based on work by 42-evey in PR #2203.
The official international DashScope endpoint uses dashscope-intl.aliyuncs.com
(per Alibaba docs), which the substring match on dashscope.aliyuncs.com misses
because of the hyphenated prefix.
If a tool_calls list contains a None entry (from malformed API response,
compression artifact, or corrupt session replay), convert_messages_to_anthropic
crashes with AttributeError: 'NoneType' object has no attribute 'get'.
Skip None and non-dict entries in the tool_calls iteration. Found via
chaos/fuzz testing with mixed valid/invalid tool_call entries.
Custom endpoint users (DashScope/Alibaba, Z.AI, Kimi, DeepSeek, etc.)
get wrong context lengths because their provider resolves as "openrouter"
or "custom", skipping the models.dev lookup entirely. For example,
qwen3.5-plus on DashScope falls to the generic "qwen" hardcoded default
(131K) instead of the correct 1M.
Add _infer_provider_from_url() that maps known API hostnames to their
models.dev provider IDs. When the explicit provider is generic
(openrouter/custom/empty), infer from the base URL before the models.dev
lookup. This resolves context lengths correctly for DashScope, Z.AI,
Kimi, MiniMax, DeepSeek, and Nous endpoints without requiring users to
manually set context_length in config.
Also refactors _is_known_provider_base_url() to use the same URL mapping,
removing the duplicated hostname list.
Cherry-picked from PR #2146 by @crazywriter1. Fixes#2104.
asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.
Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
each async client, discard stale entries if loop is closed
6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.
Co-authored-by: Test <test@test.com>
Cherry-picked from PR #2169 by @0xbyt4.
1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
is retried soon
6. Delegate build: wrap child construction in try/finally so
_last_resolved_tool_names is always restored on exception
Two fixes for Telegram/gateway-specific bugs:
1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
existing tool_use stripping). Context compression or session
truncation can remove an assistant message containing a tool_use
while leaving the subsequent tool_result intact. Anthropic rejects
these with a 400: 'unexpected tool_use_id found in tool_result
blocks'. The adapter now collects all tool_use IDs and filters out
any tool_result blocks referencing IDs not in that set.
2. Gateway: /reset and /new now bypass the running-agent guard (like
/status already does). Previously, sending /reset while an agent
was running caused the raw text to be queued and later fed back as
a user message with the same broken history — replaying the
corrupted session instead of resetting it. Now the running agent is
interrupted, pending messages are cleared, and the reset command
dispatches immediately.
Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.
Co-authored-by: Test <test@test.com>
* feat: context pressure warnings for CLI and gateway
User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.
CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).
◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction
⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent
Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).
Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.
Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
_emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests
* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf
---------
Co-authored-by: Test <test@test.com>
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.
Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
send_message guidance)
* fix: preserve Ollama model:tag colons in context length detection
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.
Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.
* fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M
Both models support 1,000,000 token context windows. The hardcoded defaults
were set before Anthropic expanded the context for the 4.6 generation.
Verified via models.dev and OpenRouter API data.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.
Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall
back to 2M tokens when /v1/models doesn't include context_length.
Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)
Prefers loaded instance context over max (e.g., 122K loaded vs 1M max).
Results are cached via save_context_length() to avoid repeated queries.
Also fixes detect_local_server_type() misidentifying LM Studio as
Ollama (LM Studio returns 200 for /api/tags with an error body).
When LM Studio has a model loaded with a custom context size (e.g.,
122K), prefer that over the model's max_context_length (e.g., 1M).
This makes the TUI status bar show the actual runtime context window.
Instead of defaulting to 2M for unknown local models, query the server
API for the real context length. Supports Ollama (/api/show), vLLM
(max_model_len), and LM Studio (/v1/models). Results are cached to
avoid repeated queries.
Closes#1911
- insights.py: Pre-compute SELECT queries as class constants instead of
f-string interpolation at runtime. _SESSION_COLS is now evaluated once
at class definition time.
- hermes_state.py: Add identifier quoting and whitelist validation for
ALTER TABLE column names in schema migrations.
- Add 4 tests verifying no injection vectors in SQL query construction.
* fix: detect context length for custom model endpoints via fuzzy matching + config override
Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:
1. Endpoint metadata lookup used exact match only — model name mismatches
(e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
match even though only one model was loaded
3. No user escape hatch to manually set context length
Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
use the only available model regardless of name; multi-model servers
try substring matching in both directions
- Add model.context_length config override (highest priority) so users
can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init
Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).
* fix: auto-detect local model name and context length for local servers
Cherry-picked from PR #2043 by sudoingX.
- Auto-detect model name from local server's /v1/models when only one
model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
* fix: revert accidental summary_target_tokens change + add docs for context_length config
- Revert summary_target_tokens from 2500 back to 500 (accidental change
during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
explaining model.context_length config override
---------
Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.
Cherry-picked from PR #1949 by @rivercrab26.
Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
_align_boundary_backward only checked messages[idx-1] to decide if
the compress-end boundary splits a tool_call/result group. When an
assistant issues 3+ parallel tool calls, their results span multiple
consecutive messages. If the boundary fell in the middle of that group,
the parent assistant was summarized away and orphaned tool results were
silently deleted by _sanitize_tool_pairs.
Now walks backward through all consecutive tool results to find the
parent assistant, then pulls the boundary before the entire group.
6 regression tests added in tests/test_compression_boundary.py.
Co-authored-by: Guts <Gutslabs@users.noreply.github.com>
SOUL.md now loads in slot #1 of the system prompt, replacing the
hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize
the agent's identity and personality by editing ~/.hermes/SOUL.md
without it conflicting with the built-in identity text.
When SOUL.md is loaded as identity, it's excluded from the context
files section to avoid appearing twice. When SOUL.md is missing,
empty, unreadable, or skip_context_files is set, the hardcoded
DEFAULT_AGENT_IDENTITY is used as a fallback.
The default SOUL.md (seeded on first run) already contains the full
Hermes personality, so existing installs are unaffected.
Co-authored-by: Test <test@test.com>
* fix: banner skill count now respects disabled skills and platform filtering
The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)
This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.
Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.
* fix: system prompt and slash commands now respect disabled skills
Two more places where disabled skills were still surfaced:
1. build_skills_system_prompt() in prompt_builder.py — disabled skills
appeared in the <available_skills> system prompt section, causing
the agent to suggest/load them despite being disabled.
2. scan_skill_commands() in skill_commands.py — disabled skills still
registered as /skill-name slash commands in CLI help and could be
invoked.
Both now load _get_disabled_skill_names() and filter accordingly.
* fix: skill_view blocks disabled skills
skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.
Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.
---------
Co-authored-by: Test <test@test.com>
* perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant
run_agent.py:
- Add base_url property that auto-caches _base_url_lower on every
assignment, eliminating 12+ redundant .lower() calls per API cycle
across __init__, _build_api_kwargs, _supports_reasoning_extra_body,
and the main conversation loop
- Consolidate three separate load_config() disk reads in __init__
(memory, skills, compression) into a single call, reusing the
result dict for all three config sections
model_tools.py:
- Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside
handle_function_call on every tool invocation)
* Use endpoint metadata for custom model context and pricing
---------
Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.
OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.
MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in refactored pricing module).
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces all remaining print() calls in compress() with logger.info()
and logger.warning() for consistency with the rest of the module.
Inspired by PR #1822.
compress() checks both the head and tail neighbors when choosing the
summary message role. When only the tail collides, the role is flipped.
When BOTH roles would create consecutive same-role messages (e.g.
head=assistant, tail=user), the summary is merged into the first tail
message instead of inserting a standalone message that breaks role
alternation and causes API 400 errors.
The previous code handled head-side collision but left the tail-side
uncovered — long conversations would crash mid-reply with no useful
error, forcing the user to /reset and lose session history.
Based on PR #1186 by @alireza78a, with improved double-collision
handling (merge into tail instead of unconditional 'user' fallback).
Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
- Add summary_base_url config option to compression block for custom
OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py
(CONTEXT_COMPRESSION_* env vars no longer set from config)
- Switch run_agent.py to read compression config directly from
config.yaml instead of env vars
- Fix backwards-compat block in _resolve_task_provider_model to also
fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG
sets this, which was silently preventing the compression section's
summary_* keys from being read)
- Add test for summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only
Closes#1591
Based on PR #1702 by @uzaylisak
Four small fixes:
1. model_tools.py: Tool import failures logged at WARNING instead of
DEBUG. If a tool module fails to import (syntax error, missing dep),
the user now sees a warning instead of the tool silently vanishing.
2. hermes_cli/config.py: Remove duplicate 'import sys' (lines 19, 21).
3. agent/model_metadata.py: Remove 6 duplicate entries in
DEFAULT_CONTEXT_LENGTHS dict. Python keeps the last value, so no
functional change, but removes maintenance confusion.
4. hermes_state.py: Add missing self._lock to the LIKE query in
resolve_session_id(). The exact-match path used get_session()
(which locks internally), but the prefix fallback queried _conn
without the lock.
Adds .hermes.md / HERMES.md discovery for per-project agent configuration.
When the agent starts, it walks from cwd to the git root looking for
.hermes.md (preferred) or HERMES.md, strips any YAML frontmatter, and
injects the markdown body into the system prompt as project context.
- Nearest-first discovery (subdirectory configs shadow parent)
- Stops at git root boundary (no leaking into parent repos)
- YAML frontmatter stripped (structured config deferred to Phase 2)
- Same injection scanning and 20K truncation as other context files
- 22 comprehensive tests
Original implementation by ch3ronsa. Cherry-picked and adapted for current main.
Closes#681 (Phase 1)
After the first user→assistant exchange, Hermes now generates a short
descriptive session title via the auxiliary LLM (compression task config).
Title generation runs in a background thread so it never delays the
user-facing response.
Key behaviors:
- Fires only on the first 1-2 exchanges (checks user message count)
- Skips if a title already exists (user-set titles are never overwritten)
- Uses call_llm with compression task config (cheapest/fastest model)
- Truncates long messages to keep the title generation request small
- Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max
- Works in both CLI and gateway (Telegram/Discord/etc.)
Also updates /title (no args) to show the session ID alongside the title
in both CLI and gateway.
Implements #1426
The fuzzy match for model context lengths iterated dict insertion
order. Shorter model names (e.g. 'gpt-5') could match before more
specific ones (e.g. 'gpt-5.4-pro'), returning the wrong context
length.
Sort by key length descending so more specific model names always
match first.
The summary message role was determined only by the last head message,
ignoring the first tail message. This could create consecutive user
messages (rejected by Anthropic) when the tail started with 'user'.
Now checks both neighbors. Priority: avoid colliding with the head
(already committed). If the chosen role also collides with the tail,
flip it — but only if flipping wouldn't re-collide with the head.
When tool_choice was 'none', the code did 'pass' — no tool_choice
was sent but tools were still included in the request. Anthropic
defaults to 'auto' when tools are present, so the model could still
call tools despite the caller requesting 'none'.
Fix: omit tools entirely from the request when tool_choice is 'none',
which is the only way to prevent tool use with the Anthropic API.
The module-level auxiliary_is_nous was set to True by _try_nous() and
never reset. In long-running gateway processes, once Nous was resolved
as auxiliary provider, the flag stayed True forever — even if
subsequent resolutions chose a different provider (e.g. OpenRouter).
This caused Nous product tags to be sent to non-Nous providers.
Reset the flag at the start of _resolve_auto() so only the winning
provider's flag persists.
When two consecutive assistant messages had mixed content types (one
string, one list), the merge logic just replaced the earlier message
entirely with the later one (fixed[-1] = m), silently dropping the
earlier message's content.
Apply the same normalization pattern used in the tool_use merge path
(lines 952-956): convert both to list format before concatenating.
This preserves all content from both messages.