Regression from #11161 (Claude Opus 4.7 migration, commit 0517ac3e).
The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max"
(the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level
as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only
expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s:
BadRequestError [HTTP 400]: This model does not support effort
level 'xhigh'. Supported levels: high, low, max, medium.
Users who set reasoning_effort=xhigh as their default (xhigh is the
recommended default for coding/agentic on 4.7 per the Anthropic migration
guide) now 400 every request the moment they switch back to a 4.6 model
via `/model` or config. Verified live against the Anthropic API on
`anthropic==0.94.0`.
Fix: make the mapping model-aware. Add `_supports_xhigh_effort()`
predicate (matches 4-7/4.7 substrings, mirroring the existing
`_supports_adaptive_thinking` / `_forbids_sampling_params` pattern).
On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort
those models accept, restoring pre-migration behavior). On 4.7+, keep
xhigh as a distinct level.
Per Anthropic's migration guide, xhigh is 4.7-only:
https://platform.claude.com/docs/en/about-claude/models/migration-guide
> Opus 4.7 effort levels: max, xhigh (new), high, medium, low.
> Opus 4.6 effort levels: max, high, medium, low.
SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[
"low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh).
## Test plan
Verified live on macOS 15.5 / anthropic==0.94.0:
claude-opus-4-6 + effort=xhigh → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK
claude-opus-4-6 + effort=max → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=max → output_config.effort=max → 200 OK
`tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged
test that asserted the broken behavior, added 1 for 4.7 preservation).
Full adapter suite: 120 passed in 1.05s.
Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed
(2 pre-existing failures on clean upstream/main, unrelated).
## Platforms
Tested on macOS 15.5. No platform-specific code paths touched.
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guideFixesNousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
Ensure _align_boundary_backward never pushes the last user message
into the compressed region. Without this, compression could delete
the user active task instruction mid-session.
Cherry-picked from #10969 by @sontianye. Fixes#10896.
resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.
Cherry-picked from #10901 by @lrawnsley.
The gateway compression notifications were already removed in commit cc63b2d1
(PR #4139), but the agent-level context pressure warnings (85%/95% tiered
alerts via _emit_context_pressure) were still firing on both CLI and gateway.
Removed:
- _emit_context_pressure method and all call sites in run_conversation()
- Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN)
- Instance attribute _context_pressure_warned_at
- Pressure reset logic in _compress_context
- format_context_pressure and format_context_pressure_gateway from agent/display.py
- Orphaned ANSI constants that only served these functions
- tests/run_agent/test_context_pressure.py (all 361 lines)
Compression itself continues to run silently in the background.
Closes#3784
Skins define waiting_faces, thinking_faces, and thinking_verbs in their
spinner config, but all 7 call sites in run_agent.py used hardcoded class
constants. Add three classmethods on KawaiiSpinner that query the active
skin first and fall back to the class constants, matching the existing
pattern used for wings/tool_prefix/tool_emojis.
Co-authored-by: nosleepcassette <nosleepcassette@users.noreply.github.com>
_load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path,
which is wrong for external skills from skills.external_dirs — they live
outside SKILLS_DIR entirely. Scripts and linked files failed to load.
Fix: skill_view() now includes the absolute skill_dir in its result dict.
_load_skill_payload() uses that directly when available, falling back to
the SKILLS_DIR-relative reconstruction only for legacy responses.
Closes#10313
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.
New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files
run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
as rate-limited, skip the API call entirely. Try fallback provider
first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
(sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
know they can resume
auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
fallback chain. When rate-limited, returns (None, None) so the chain
skips to the next provider instead of piling more requests onto Nous.
This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)
Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.
Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.
Closes#6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>
Found via trace data audit: JWT tokens (eyJ...) and Discord snowflake
mentions (<@ID>) were passing through unredacted.
JWT pattern: matches 1/2/3-part tokens starting with eyJ (base64 for '{').
Zero false-positive risk — no normal text matches eyJ + 10+ base64url chars.
Discord pattern: matches <@digits> and <@!digits> with 17-20 digit snowflake
IDs. Syntactically unique to Discord's mention format.
Both patterns follow the same structural-uniqueness standard as existing
prefix patterns (sk-, ghp_, AKIA, etc.).
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config. In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.
Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop. If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry. This bounds cache growth
to at most one entry per unique provider config.
Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.
Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).
Closes#10200
OV transparently handles message history across /new and /compress: old
messages stay in the same session and extraction is idempotent, so there's
no need to rebind providers to a new session_id. The only thing the
session boundary actually needs is to trigger extraction.
- MemoryProvider / MemoryManager: remove on_session_reset hook
- OpenViking: remove on_session_reset override (nothing to do)
- AIAgent: replace rotate_memory_session with commit_memory_session
(just calls on_session_end, no rebind)
- cli.py / run_agent.py: single commit_memory_session call at the
session boundary before session_id rotates
- tests: replace on_session_reset coverage with routing tests for
MemoryManager.on_session_end
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hasattr-forked OpenViking-specific paths with a proper base-class
hook. Collapse the two agent wrappers into a single rotate_memory_session
so callers don't orchestrate commit + rebind themselves.
- MemoryProvider: add on_session_reset(new_session_id) as a default no-op
- MemoryManager: on_session_reset fans out unconditionally (no hasattr,
no builtin skip — base no-op covers it)
- OpenViking: rename reset_session -> on_session_reset; drop the explicit
POST /api/v1/sessions (OV auto-creates on first message) and the two
debug raise_for_status wrappers
- AIAgent: collapse commit_memory_session + reinitialize_memory_session
into rotate_memory_session(new_sid, messages)
- cli.py / run_agent.py: replace hasattr blocks and the split calls with
a single unconditional rotate_memory_session call; compression path
now passes the real messages list instead of []
- tests: align with on_session_reset, assert reset does NOT POST /sessions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OpenViking memory provider extracts memories when its session is
committed (POST /api/v1/sessions/{id}/commit). Before this fix, the
CLI had two code paths that changed the active session_id without ever
committing the outgoing OpenViking session:
1. /new (new_session() in cli.py) — called flush_memories() to write
MEMORY.md, then immediately discarded the old session_id. The
accumulated OpenViking session was never committed, so all context
from that session was lost before extraction could run.
2. /compress and auto-compress (_compress_context() in run_agent.py) —
split the SQLite session (new session_id) but left the OpenViking
provider pointing at the old session_id with no commit, meaning all
messages synced to OpenViking were silently orphaned.
The gateway already handles session commit on /new and /reset via
shutdown_memory_provider() on the cached agent; the CLI path did not.
Fix: introduce a lightweight session-transition lifecycle alongside
the existing full shutdown path:
- OpenVikingMemoryProvider.reset_session(new_session_id): waits for
in-flight background threads, resets per-session counters, and
creates the new OV session via POST /api/v1/sessions — without
tearing down the HTTP client (avoids connection overhead on /new).
- MemoryManager.restart_session(new_session_id): calls reset_session()
on providers that implement it; falls back to initialize() for
providers that do not. Skips the builtin provider (no per-session
state).
- AIAgent.commit_memory_session(messages): wraps
memory_manager.on_session_end() without shutdown — commits OV session
for extraction but leaves the provider alive for the next session.
- AIAgent.reinitialize_memory_session(new_session_id): wraps
memory_manager.restart_session() — transitions all external providers
to the new session after session_id has been assigned.
Call sites:
- cli.py new_session(): commit BEFORE session_id changes, reinitialize
AFTER — ensuring OV extraction runs on the correct session and the
new session is immediately ready for the next turn.
- run_agent._compress_context(): same pattern, inside the
if self._session_db: block where the session_id split happens.
/compress and auto-compress are functionally identical at this layer:
both call _compress_context(), so both are fixed by the same change.
Tests added to tests/agent/test_memory_provider.py:
- TestMemoryManagerRestartSession: reset_session() routing, builtin
skip, initialize() fallback, failure tolerance, empty-manager noop.
- TestOpenVikingResetSession: session_id update, per-session state
clear, POST /api/v1/sessions call, API failure tolerance, no-client
noop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.
Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text
All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).
Reported by: Sandeep Narahari (PrithviDevs)
Expose skill usage in analytics so the dashboard and insights output can
show which skills the agent loads and manages over time.
This adds skill aggregation to the InsightsEngine by extracting
`skill_view` and `skill_manage` calls from assistant tool_calls,
computing per-skill totals, and including the results in both terminal
and gateway insights formatting. It also extends the dashboard analytics
API and Analytics page to render a Top Skills table.
Terminology is aligned with the skills docs:
- Agent Loaded = `skill_view` events
- Agent Managed = `skill_manage` actions
Architecture:
- agent/insights.py collects and aggregates per-skill usage
- hermes_cli/web_server.py exposes `skills` on `/api/analytics/usage`
- web/src/lib/api.ts adds analytics skill response types
- web/src/pages/AnalyticsPage.tsx renders the Top Skills table
- web/src/i18n/{en,zh}.ts updates user-facing labels
Tests:
- tests/agent/test_insights.py covers skill aggregation and formatting
- tests/hermes_cli/test_web_server.py covers analytics API contract
including the `skills` payload
- verified with `cd web && npm run build`
Files changed:
- agent/insights.py
- hermes_cli/web_server.py
- tests/agent/test_insights.py
- tests/hermes_cli/test_web_server.py
- web/src/i18n/en.ts
- web/src/i18n/types.ts
- web/src/i18n/zh.ts
- web/src/lib/api.ts
- web/src/pages/AnalyticsPage.tsx
Four independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix. The generated ExecStart pointed
to a Python binary without site-packages.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models sometimes return empty after tool calls. The agent
silently abandoned the remaining work.
Fix: append assistant('(empty)') + user nudge message and retry
once. Resets after each successful tool round.
4. Compression model fallback on permanent errors (#8620 sub-issue 4)
When the default summary model (gemini-3-flash) returns 503
'model_not_found' on custom proxies, the compressor entered a
600s cooldown, leaving context growing unbounded.
Fix: detect permanent model-not-found errors (503, 404,
'model_not_found', 'no available channel') and fall back to
using the main model for compression instead of entering
cooldown. One-time fallback with immediate retry.
Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that
colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly
for context length lookups.
Cherry-picked from PR #9184 by @Julientalbot.
- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
(vision_analyze, browser screenshots) route through 5v
Fixes#9888
Seed qwen-oauth credentials from resolve_qwen_runtime_credentials() in
_seed_from_singletons(). Users who authenticate via 'qwen auth qwen-oauth'
store tokens in ~/.qwen/oauth_creds.json which the runtime resolver reads
but the credential pool couldn't detect — same gap pattern as copilot.
Uses refresh_if_expiring=False to avoid network calls during discovery.
Seed copilot credentials from resolve_copilot_token() in the credential
pool's _seed_from_singletons(), alongside the existing anthropic and
openai-codex seeding logic. This makes copilot appear in the /model
provider picker when the user authenticates solely through gh auth token.
Cherry-picked from PR #9767 by Marvae.
Add ctx.register_skill() API so plugins can ship SKILL.md files under
a 'plugin:skill' namespace, preventing name collisions with built-in
Hermes skills. skill_view() detects the ':' separator and routes to
the plugin registry while bare names continue through the existing
flat-tree scan unchanged.
Key additions:
- agent/skill_utils: parse_qualified_name(), is_valid_namespace()
- hermes_cli/plugins: PluginContext.register_skill(), PluginManager
skill registry (find/list/remove)
- tools/skills_tool: qualified name dispatch in skill_view(),
_serve_plugin_skill() with full guards (disabled, platform,
injection scan), bundle context banner with sibling listing,
stale registry self-heal
- Hoisted _INJECTION_PATTERNS to module level (dedup)
- Updated skill_view schema description
Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim
(P2) for a simpler first merge.
Closes#8422
- Rename platform from 'qq' to 'qqbot' across all integration points
(Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
(prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
functions that were lost during the original merge
* Add hermes debug share instructions to all issue templates
- bug_report.yml: Add required Debug Report section with hermes debug share
and /debug instructions, make OS/Python/Hermes version optional (covered
by debug report), demote old logs field to optional supplementary
- setup_help.yml: Replace hermes doctor reference with hermes debug share,
add Debug Report section with fallback chain (debug share -> --local -> doctor)
- feature_request.yml: Add optional Debug Report section for environment context
All templates now guide users to run hermes debug share (or /debug in chat)
and paste the resulting paste.rs links, giving maintainers system info,
config, and recent logs in one step.
* feat: add openrouter/elephant-alpha to curated model lists
- Add to OPENROUTER_MODELS (free, positioned above GPT models)
- Add to _PROVIDER_MODELS["nous"] mirror list
- Add 256K context window fallback in model_metadata.py
The generic 'gpt-5' fallback was set to 128,000 — which is the max
OUTPUT tokens, not the context window. GPT-5 base and most variants
(codex, mini) have 400,000 context. This caused /model to report
128k for models like gpt-5.3-codex when models.dev was unavailable.
Added specific entries for GPT-5 variants with different context sizes:
- gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M)
- gpt-5.4-mini, gpt-5.4-nano: 400,000
- gpt-5.3-codex-spark: 128,000 (reduced)
- gpt-5.1-chat: 128,000 (chat variant)
- gpt-5 (catch-all): 400,000
Sources: https://developers.openai.com/api/docs/models
Port two improvements inspired by Kilo-Org/kilocode analysis:
1. Error classifier: add context overflow patterns for vLLM, Ollama,
and llama.cpp/llama-server. These local inference servers return
different error formats than cloud providers (e.g., 'exceeds the
max_model_len', 'context length exceeded', 'slot context'). Without
these patterns, context overflow errors from local servers are
misclassified as format errors, causing infinite retries instead
of triggering compression.
2. MCP initial connection retry: previously, if the very first
connection attempt to an MCP server failed (e.g., transient DNS
blip at startup), the server was permanently marked as failed with
no retry. Post-connect reconnection had 5 retries with exponential
backoff, but initial connection had zero. Now initial connections
retry up to 3 times with backoff before giving up, matching the
resilience of post-connect reconnection.
(Inspired by Kilo Code's MCP server disappearing fix in v1.3.3)
Tests: 6 new error classifier tests, 4 new MCP retry tests, 1
updated existing test. All 276 affected tests pass.
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with
Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini.
Standard OpenAI-compatible provider checklist: auth.py, config.py,
models.py, main.py, providers.py, doctor.py, model_normalize.py,
model_metadata.py, setup.py, trajectory_compressor.py.
Based on PR #9274 by arthurbr11, simplified to a standard direct
provider without dual-endpoint OpenRouter routing.
- Use isinstance() with try/except import for CopilotACPClient check
in _to_async_client instead of fragile __class__.__name__ string check
- Restore accurate comment: GPT-5.x models *require* (not 'often require')
the Responses API on OpenAI/OpenRouter; ACP is the exception, not a
softening of the requirement
- Add inline comment explaining the ACP exclusion rationale
Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.
The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.
Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:
- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup
Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.
Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes#8776, salvaged from PR #8814
resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.
Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.
Fixes#8857
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.
What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths
Compression model/provider is now configured exclusively via:
auxiliary.compression.provider / auxiliary.compression.model
Closes#8923
_query_local_context_length was checking model_info.context_length
(the GGUF training max) before num_ctx (the Modelfile runtime override),
inverse to query_ollama_num_ctx. The two helpers therefore disagreed on
the same model:
hermes-brain:qwen3-14b-ctx32k # Modelfile: num_ctx 32768
underlying qwen3:14b GGUF # qwen3.context_length: 40960
query_ollama_num_ctx correctly returned 32768 (the value Ollama will
actually allocate KV cache for). _query_local_context_length returned
40960, which let ContextCompressor grow conversations past 32768 before
triggering compression — at which point Ollama silently truncated the
prefix, corrupting context.
Swap the order so num_ctx is checked first, matching query_ollama_num_ctx.
Adds a parametrized test that seeds both values and asserts num_ctx wins.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:
- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
are relevant'
This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
When running inside WSL (Windows Subsystem for Linux), inject a hint into
the system prompt explaining that the Windows host filesystem is mounted
at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows
paths (Desktop, Documents) to their /mnt/ equivalents without the user
needing to configure anything.
Uses the existing is_wsl() detection from hermes_constants (cached,
checks /proc/version for 'microsoft'). Adds build_environment_hints()
in prompt_builder.py — extensible for Termux, Docker, etc. later.
Closes the UX gap where WSL users had to manually explain path
translation to the agent every session.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When Hermes refreshes a Codex token, it consumed the old refresh_token
but never wrote the new pair back to ~/.codex/auth.json. This caused
Codex CLI and VS Code to fail with 'refresh_token_reused' on their
next refresh attempt.
This mirrors the existing Anthropic write-back pattern where refreshed
tokens are written to ~/.claude/.credentials.json via
_write_claude_code_credentials().
Changes:
- Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to
_write_claude_code_credentials in anthropic_adapter.py)
- Call it from _refresh_codex_auth_tokens() (non-pool refresh path)
- Call it from credential_pool._refresh_entry() (pool happy path + retry)
- Add tests for the new write-back behavior
- Update existing test docstring to clarify _save_codex_tokens vs
_write_codex_cli_tokens separation
Fixes refresh token conflict reported by @ec12edfae2cb221
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:
- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
are relevant'
This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
so context-length lookups use models.dev data instead of 128k fallback.
Fixes#8161.
- Set api_mode from custom_providers entry when switching via hermes model,
and clear stale api_mode when the entry has none. Also extract api_mode
in _named_custom_provider_map(). Fixes#8181.
- Convert OpenAI image_url content blocks to Anthropic image blocks when
the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
containing /anthropic). Fixes#8147.
Users whose credentials exist only in external files — OpenAI Codex
OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials
in ~/.claude/.credentials.json — would not see those providers in the
/model picker, even though hermes auth and hermes model detected them.
Root cause: list_authenticated_providers() only checked the raw Hermes
auth store and env vars. External credential file fallbacks (Codex CLI
import, Claude Code file discovery) were never triggered.
Fix (three parts):
1. _seed_from_singletons() in credential_pool.py: openai-codex now
imports from ~/.codex/auth.json when the Hermes auth store is empty,
mirroring resolve_codex_runtime_credentials().
2. list_authenticated_providers() in model_switch.py: auth store + pool
checks now run for ALL providers (not just OAuth auth_type), catching
providers like anthropic that support both API key and OAuth.
3. list_authenticated_providers(): direct check for anthropic external
credential files (Claude Code, Hermes PKCE). The credential pool
intentionally gates anthropic behind is_provider_explicitly_configured()
to prevent auxiliary tasks from silently consuming tokens. The /model
picker bypasses this gate since it is discovery-oriented.
After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.
Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.
Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):
1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
this summary — respond ONLY to the latest user message AFTER it'
2. Summarizer preamble (shared by both prompts) adds:
- 'Do NOT respond to any questions' (from OpenCode's approach)
- 'Different assistant' framing (from Codex) to create psychological
distance between summary content and active conversation
3. New summary sections:
- '## Resolved Questions' — tracks already-answered questions with
their answers, preventing re-answering (from Claude Code's
'Pending user asks' pattern)
- '## Pending User Asks' — explicitly marks unanswered questions
- '## Remaining Work' replaces '## Next Steps' — passive framing
avoids reading as active instructions
4. merge-summary-into-tail path now inserts a clear separator:
'--- END OF CONTEXT SUMMARY — respond to the message below ---'
5. Iterative update prompt now instructs: 'Move answered questions to
Resolved Questions' to maintain the resolved/pending distinction
across multiple compactions.
Adds an optional focus topic to /compress: `/compress database schema`
guides the summariser to preserve information related to the focus topic
(60-70% of summary budget) while compressing everything else more aggressively.
Inspired by Claude Code's /compact <focus>.
Changes:
- context_compressor.py: focus_topic parameter on _generate_summary() and
compress(); appends FOCUS TOPIC guidance block to the LLM prompt
- run_agent.py: focus_topic parameter on _compress_context(), passed through
to the compressor
- cli.py: _manual_compress() extracts focus topic from command string,
preserves existing manual_compression_feedback integration (no regression)
- gateway/run.py: _handle_compress_command() extracts focus from event args
and passes through — full gateway parity
- commands.py: args_hint="[focus topic]" on /compress CommandDef
Salvaged from PR #7459 (CLI /compress focus only — /context command deferred).
15 new tests across CLI, compressor, and gateway.
Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and
estimate_request_tokens_rough() from floor division (len // 4) to
ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously
estimated as 0 tokens, causing the compressor and pre-flight checks to
systematically undercount when many short tool results are present.
Also replaced the inline duplicate formula in run_conversation()
(total_chars // 4) with a call to the shared
estimate_messages_tokens_rough() function.
Updated 4 tests that hardcoded floor-division expected values.
Related: issue #6217, PR #6629
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes#7915 (root cause was agent-loop termination, not Weixin delivery limits).
* fix(tools): neutralize shell injection in _write_to_sandbox via path quoting
_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.
Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.
Tests added for spaces, $(command) substitution, and semicolon injection.
* fix: is_local_endpoint misses Docker/Podman DNS names
host.docker.internal, host.containers.internal, gateway.docker.internal,
and host.lima.internal are well-known DNS names that container runtimes
use to resolve the host machine. Users running Ollama on the host with
the agent in Docker/Podman hit the default 120s stream timeout instead
of the bumped 1800s because these hostnames weren't recognized as local.
Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in
is_local_endpoint(). Tests cover all three runtime families plus a
negative case for domains that merely contain the suffix as a substring.
The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.
Flipped the resolution order in _resolve_task_provider_model():
1. Explicit args (always win)
2. config.yaml auxiliary.{task}.* (PRIMARY)
3. Env var overrides (backward-compat fallback only)
4. 'auto' (full auto-detection chain)
Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.
Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
Cherry-picked from PR #7702 by kshitijk4poor.
Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)
Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.
Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.
Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.
36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
Cherry-picked from PR #7749 by kshitijk4poor with modifications:
- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision
Closes#7740
Based on PR #7285 by @kshitijk4poor.
Two bugs affecting Qwen OAuth users:
1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M.
Added specific entries before the generic qwen catch-all:
- qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per
official Alibaba Cloud docs and OpenRouter)
- qwen3-coder: 262,144
2. Random stopping — max_tokens was suppressed for Qwen Portal, so the
server applied its own low default. Reasoning models exhaust that on
thinking tokens. Now: honor explicit max_tokens, default to 65536
when unset.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
process_registry.py: _reader_loop() has process.wait() after the try-except
block (line 380). If the reader thread crashes with an unexpected exception
(e.g. MemoryError, KeyboardInterrupt), control exits the except handler but
skips wait() — leaving the child as a zombie process. Move wait() and the
cleanup into a finally block so the child is always reaped.
cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the
SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both
stdout and stderr are returned WITHOUT redaction (lines 407-413). A script
that accidentally prints an API key to stderr during a failure would leak it
into the LLM context. Move redaction before the success/failure branch so
both paths benefit.
skill_commands.py: _build_skill_message() enumerates supporting files using
rglob("*") but only checks is_file() (line 171) without filtering symlinks.
PR #6693 added symlink protection to scan_skill_commands() but missed this
function. A malicious skill can create symlinks in references/ pointing to
arbitrary files, exposing their paths (and potentially content via skill_view)
to the LLM. Add is_symlink() check to match the guard in scan_skill_commands.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').
Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.
Closes#7264
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.
When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.
This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
fails through the same path)
Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
Four fixes to auxiliary_client.py:
1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'),
connection/payment errors no longer silently fallback to cloud
providers. Local-only users (Ollama, vLLM) will no longer get
unexpected OpenRouter billing from auxiliary tasks.
2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends literal 'default' as
model name to APIs. Providers without a known aux model in
_API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
model_not_supported errors.
3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for
payment (402) and connection errors. Previously, async consumers
(session_search, web_tools, vision) got hard failures with no
recovery. Also fixes hardcoded 'openrouter' fallback to use the
full auto-detection chain.
4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses
it in log messages. Connection timeouts are no longer misleadingly
logged as 'payment error'.
Closes#7559Closes#7512
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or
when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
auxiliary:
compression:
model: gpt-5.3-codex
base_url: https://api.openai.com/v1
api_mode: codex_responses
Closes#6800
Refactor hardcoded color constants throughout the CLI to resolve from
the active skin engine, so custom themes fully control the visual
appearance.
cli.py:
- Replace _GOLD constant with _ACCENT (_SkinAwareAnsi class) that
lazily resolves response_border from the active skin
- Rename _GOLD_DEFAULT to _ACCENT_ANSI_DEFAULT
- Make _build_compact_banner() read banner_title/accent/dim from skin
- Make session resume notifications use _accent_hex()
- Make status line use skin colors (accent_color, separator_color,
label_color instead of cryptic _dim_c/_dim_c2/_accent_c/_label_c)
- Reset _ACCENT cache on /skin switch
agent/display.py:
- Replace hardcoded diff ANSI escapes with skin-aware functions:
_diff_dim(), _diff_file(), _diff_hunk(), _diff_minus(), _diff_plus()
(renamed from SCREAMING_CASE _ANSI_* to snake_case)
- Add reset_diff_colors() for cache invalidation on skin switch
Aligns MiniMax provider with official API documentation. Fixes 6 bugs:
transport mismatch (openai_chat -> anthropic_messages), credential leak
in switch_model(), prompt caching sent to non-Anthropic endpoints,
dot-to-hyphen model name corruption, trajectory compressor URL routing,
and stale doctor health check.
Also corrects context window (204,800), thinking support (manual mode),
max output (131,072), and model catalog (M2 family only on /anthropic).
Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
_is_oauth_token() returned True for any key not starting with 'sk-ant-api',
which means MiniMax and Alibaba API keys were falsely treated as Anthropic
OAuth tokens. This triggered the Claude Code compatibility path:
- All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search)
- System prompt injected with 'You are Claude Code' identity
- 'Hermes Agent' replaced with 'Claude Code' throughout
Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by
their key format instead of using a broad catch-all:
- sk-ant-* (but not sk-ant-api-*) -> setup tokens, managed keys
- eyJ* -> JWTs from Anthropic OAuth flow
- Everything else -> False (MiniMax, Alibaba, etc.)
Reported by stefan171.
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:
model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.
Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
- PluginContext.register_context_engine() lets plugins replace the
built-in ContextCompressor with a custom ContextEngine implementation
- PluginManager stores the registered engine; only one allowed
- run_agent.py checks for a plugin engine at init before falling back
to the default ContextCompressor
- reset_session_state() now calls engine.on_session_reset() instead of
poking internal attributes directly
- ContextCompressor.on_session_reset() handles its own internals
(_context_probed, _previous_summary, etc.)
- 19 new tests covering ABC contract, defaults, plugin slot registration,
rejection of duplicates/non-engines, and compressor reset behavior
- All 34 existing compressor tests pass unchanged
Introduces agent/context_engine.py — an abstract base class that defines
the pluggable context engine interface. ContextCompressor now inherits
from ContextEngine as the default implementation.
No behavior change. All 34 existing compressor tests pass.
This is the foundation for a context engine plugin slot, enabling
third-party engines like LCM (Lossless Context Management) to replace
the built-in compressor via the plugin system.
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Adds xAI as a first-class provider: ProviderConfig in auth.py,
HermesOverlay in providers.py, 11 curated Grok models, URL mapping
in model_metadata.py, aliases (x-ai, x.ai), and env var tests.
Uses standard OpenAI-compatible chat completions.
Closes#7050
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main
Salvaged from PR #6747 by Zihan Huang.
Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a
unique throttling message ('Request rate increased too quickly...') that
doesn't match standard rate-limit patterns ('rate limit', 'too many
requests'). This caused Alibaba errors to fall through to the 'unknown'
category rather than being properly classified as rate_limit with
appropriate backoff/rotation.
Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with
the exact error message observed from the Alibaba provider.
_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic(). Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.
This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, removing a claude_code credential from the anthropic pool
only printed a note — the next load_pool() re-seeded it from
~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag
to auth.json that _seed_from_singletons checks before seeding.
Follows the pattern of env: source removal (clears .env var) and
device_code removal (clears auth store state).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_seed_from_singletons('anthropic') now checks
is_provider_explicitly_configured('anthropic') before reading
~/.claude/.credentials.json. Without this, the auxiliary client
fallback chain silently discovers and uses Claude Code tokens when
the user's primary provider key is invalid — consuming their Claude
Max subscription quota without consent.
Follows the same gating pattern as PR #4210 (setup wizard gate)
but applied to the credential pool seeding path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not
match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass
detection by splitting the style attribute across lines:
`<div style="color:red;\ndisplay: none">injected content</div>`
Replace `.*` with `[\s\S]*?` to match across line boundaries.
credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level
(line 171), making YAML parse failures invisible in production logs. Users whose
credential files silently fail to mount into sandboxes have no diagnostic clue.
Promote to WARNING to match the severity pattern used by the path validation
warnings at lines 150 and 158 in the same function.
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line
265) but the impact — stale/corrupted dynamic routes persisting silently — warrants
ERROR level to ensure operator visibility in alerting pipelines.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True. The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback. Align the message-based path to
match: retryable=False, should_fallback=True.
web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII. Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
xAI /v1/models does not return context_length metadata, so Hermes
probes down to the 128k default whenever a user configures a custom
provider pointing at https://api.x.ai/v1. This forces every xAI user
to manually override model.context_length in config.yaml (2M for
Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context
window.
Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the
fallback lookup returns the correct value via substring matching.
Values sourced from models.dev (2026-04) and cross-checked against
the xAI /v1/models listing:
- grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent)
- grok-4-1-fast-* 2,000,000
- grok-4-fast-* 2,000,000
- grok-4 / grok-4-0709 256,000
- grok-code-fast-1 256,000
- grok-3* 131,072
- grok-2 / latest 131,072
- grok-2-vision* 8,192
- grok (catch-all) 131,072
Keys are ordered longest-first so that specific variants match before
the catch-all, consistent with the existing Claude/Gemma/MiniMax entries.
Add TestDefaultContextLengths.test_grok_models_context_lengths and
test_grok_substring_matching to pin the values and verify the full
lookup path. All 77 tests in test_model_metadata.py pass.
Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes#7026. Contributed by @kuishou68.
The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at
v1.30.0. The stale version string causes intermittent 403 errors from
Kimi's coding endpoint ('only available for Coding Agents').
Update all 8 occurrences across run_agent.py, auxiliary_client.py, and
doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.
Extends the /fast command to support Anthropic's Fast Mode beta in addition
to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds
speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for
~2.5x faster output token throughput.
Changes:
- hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry,
model_supports_fast_mode() now recognizes Claude Opus 4.6,
resolve_fast_mode_overrides() returns {speed: fast} for Anthropic
vs {service_tier: priority} for OpenAI
- agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant,
build_anthropic_kwargs() accepts fast_mode=True which injects
speed:fast + beta header via extra_headers (skipped for third-party
Anthropic-compatible endpoints like MiniMax)
- run_agent.py: Pass fast_mode to build_anthropic_kwargs in the
anthropic_messages path of _build_api_kwargs()
- cli.py: Update _handle_fast_command with provider-aware messaging
(shows 'Anthropic Fast Mode' vs 'Priority Processing')
- hermes_cli/commands.py: Update /fast description to mention both
providers
- tests: 13 new tests covering Anthropic model detection, override
resolution, CLI availability, routing, adapter kwargs, and
third-party endpoint safety
When the model mentions <think> as literal text in its response (e.g.
"(/think not producing <think> tags)"), the streaming display treated it
as a reasoning block opener and suppressed everything after it. The
response box would close with truncated content and no error — the API
response was complete but the display ate it.
Root cause: _stream_delta() matched <think> anywhere in the text stream
regardless of position. Real reasoning blocks always start at the
beginning of a line; mentions in prose appear mid-sentence.
Fix: track line position across streaming deltas with a
_stream_last_was_newline flag. Only enter reasoning suppression when
the tag appears at a block boundary (start of stream, after a newline,
or after only whitespace on the current line). Add a _flush_stream()
safety net that recovers buffered content if no closing tag is found
by end-of-stream.
Also fixes three related issues discovered during investigation:
- anthropic_adapter: _get_anthropic_max_output() now normalizes dots to
hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key
(was returning 32K instead of 128K)
- run_agent: send explicit max_tokens for Claude models on Nous Portal,
same as OpenRouter — both proxy to Anthropic's API which requires it.
Without it the backend defaults to a low limit that truncates responses.
- run_agent: reset truncated_tool_call_retries after successful tool
execution so a single truncation doesn't poison the entire conversation.
The Codex retry block and valid-token short-circuit in _refresh_entry()
both return early, bypassing the auth.json sync at the end of the method.
This adds _sync_device_code_entry_to_auth_store() calls on both paths
so refreshed/synced tokens are written back to auth.json regardless of
which code path succeeds.
MiniMax's Anthropic-compatible endpoints reject requests that include
the fine-grained-tool-streaming beta header — every tool-use message
triggers a connection error (~18s timeout). Regular chat works fine.
Add _common_betas_for_base_url() that filters out the tool-streaming
beta for Bearer-auth (MiniMax) endpoints while keeping all other betas.
All four client-construction branches now use the filtered list.
Based on #6528 by @HiddenPuppy.
Original cherry-picked from PR #6688 by kshitijk4poor.
Fixes#6510, fixes#6555.
_classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so
messages like 'usage limit exceeded, try again in 5 minutes' arriving
without an HTTP status code fell through to FailoverReason.unknown
instead of rate_limit.
Apply the same billing/rate-limit disambiguation that _classify_402
already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit,
USAGE_LIMIT_PATTERNS alone → billing.
Add 4 tests covering the no-status-code usage-limit path.
When _generate_summary() failed (no provider, timeout, model error),
the compressor silently dropped all middle turns with just a debug
log. The agent would then see head + tail with no explanation of the
gap, causing total context amnesia (generic greetings instead of
continuing the conversation).
Now generates a static fallback marker that tells the model context
was lost and to continue from the recent tail messages. The fallback
flows through the same role-alternation logic as a real summary so
message structure stays valid.
Step 1 of _resolve_auto() explicitly excluded 'custom' providers,
forcing custom endpoint users through the fragile fallback chain
instead of using their known-working main model credentials.
This caused silent compression failures for users on local OpenAI-
compatible endpoints — the summary generation would fail, middle
turns would be silently dropped, and the agent would lose all
conversation context.
Remove 'custom' from the exclusion list so custom endpoint users
get the same main-model-first treatment as DeepSeek, Anthropic,
Gemini, and other direct providers.
When the API returns "max_tokens too large given prompt" (input tokens
are within the context window, but input + requested output > window),
the old code incorrectly routed through the same handler as "prompt too
long" errors, calling get_next_probe_tier() and permanently halving
context_length. This made things worse: the window was fine, only the
requested output size needed trimming for that one call.
Two distinct error classes now handled separately:
Prompt too long — input itself exceeds context window.
Fix: compress history + halve context_length (existing behaviour,
unchanged).
Output cap too large — input OK, but input + max_tokens > window.
Fix: parse available_tokens from the error message, set a one-shot
_ephemeral_max_output_tokens override for the retry, and leave
context_length completely untouched.
Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error()
that detects Anthropic's "available_tokens: N" error format and returns
the available output budget, or None for all other error types.
- run_agent.py: call the new parser first in the is_context_length_error
block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
safety margin) and break to retry without touching context_length.
_build_api_kwargs consumes the ephemeral value exactly once then clears
it so subsequent calls use self.max_tokens normally.
- agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to
clearly document the max_tokens (output cap) vs context_length (total
window) distinction, which is a persistent source of confusion due to
the OpenAI-inherited "max_tokens" name.
- cli-config.yaml.example: add inline comments explaining both keys side
by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
the parser, build_anthropic_kwargs clamping, ephemeral one-shot
consumption, and the invariant that context_length is never mutated
on output-cap errors.
The error classifier's generic-400 heuristic only extracted err_body_msg from
the nested body structure (body['error']['message']), missing the flat body
format used by OpenAI's Responses API (body['message']). This caused
descriptive 400 errors like 'Invalid input[index].name: string does not match
pattern' to appear generic when the session was large, misclassifying them as
context overflow and triggering an infinite compression loop.
Added flat-body fallback in _classify_400() consistent with the parent
classify_api_error() function's existing handling at line 297-298.
When is explicitly set to ,
the custom-endpoint path in creates a plain
client without provider-specific headers. This means sync vision calls (e.g.
) use the generic User-Agent and get rejected by
Kimi's coding endpoint with a 403:
'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...'
The async converter already injects , and the
auto-detected API-key provider path also injects it, but the explicit custom
endpoint shortcut was missing it entirely.
This patch adds the same injection to the custom endpoint
branch, and updates all existing Kimi header sites to for
consistency.
Fixes <issue number to be filled in>
The credential pool seeder (_seed_from_env) hardcoded the base URL
for API-key providers without running provider-specific auto-detection.
For kimi-coding, this caused sk-kimi- prefixed keys to be seeded with
the legacy api.moonshot.ai/v1 endpoint instead of api.kimi.com/coding/v1,
resulting in HTTP 401 on the first request.
Import and call _resolve_kimi_base_url for kimi-coding so the pool
uses the correct endpoint based on the key prefix, matching the
runtime credential resolver behavior.
Also fix a comment: sk-kimi- keys are issued by kimi.com/code,
not platform.kimi.ai.
Fixes#5561
Two bugs in the model fallback system:
1. Nous login leaves stale model in config (provider=nous, model=opus
from previous OpenRouter setup). Fixed by deferring the config.yaml
provider write until AFTER model selection completes, and passing the
selected model atomically via _update_config_for_provider's
default_model parameter. Previously, _update_config_for_provider was
called before model selection — if selection failed (free tier, no
models, exception), config stayed as nous+opus permanently.
2. Codex/stale providers in auxiliary fallback can't connect but block
the auto-detection chain. Added _is_connection_error() detection
(APIConnectionError, APITimeoutError, DNS failures, connection
refused) alongside the existing _is_payment_error() check in
call_llm(). When a provider endpoint is unreachable, the system now
falls back to the next available provider instead of crashing.
Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.
- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases
Headers captured per response:
x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}
Example CLI display:
Nous Rate Limits (captured just now):
Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s)
Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)
Wrap is_dir() in _is_valid_subdir() and is_file() in
_load_hints_for_directory() with OSError handlers so that
inaccessible directories (e.g. /root from a non-root Daytona
host user) are silently skipped instead of crashing the agent.
The existing PermissionError PRs for prompt_builder.py (#6247,
#6321, #6355) do not cover subdirectory_hints.py, which was
identified as a separate crash path in the #6214 comments.
Ref: #6214
The 24-hour default cooldown for 402-exhausted credentials was far too
aggressive — if a user tops up credits or the 402 was caused by an
oversized max_tokens request rather than true billing exhaustion, they
shouldn't have to wait a full day. Reduce to 1 hour (matching the
existing 429 TTL).
Inspired by PR #6493 (michalkomar).
Two issues resolved:
1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through
models.dev lookup (which has mimo-v2-pro at 1M context) instead of
falling back to probing /models (404) and defaulting to 128K.
2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead
of '1.048576M'. Applies same rounding logic to K values.
Tail protection was effectively message-count based despite having a
token budget, because protect_last_n=20 acted as a hard floor. A single
50K-token tool output would cause all 20 recent messages to be
preserved regardless of budget, leaving little room for summarization.
Changes:
- _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20)
to 3; token budget is now the primary criterion
- Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message
- _prune_old_tool_results: accepts optional protect_tail_tokens so
pruning also respects the token budget instead of a fixed count
- compress() minimum message check relaxed from protect_first_n +
protect_last_n + 1 to protect_first_n + 3 + 1
- Tool group alignment (no splitting tool_call/result) preserved
Three targeted improvements to the compression system:
1. Replace hardcoded truncation limits with named class constants
(_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500,
_TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits
(3000/500) heavily truncated the summarizer's input — a 200-line
edit got cut to 3000 chars before the summarizer ever saw it.
2. Add '## Tools & Patterns' section to both compression prompt
templates (first-pass and iterative). Preserves working tool
invocations, preferred flags, and tool-specific discoveries
across compaction boundaries.
3. Warn users on 2nd+ compression: 'Session compressed N times —
accuracy may degrade. Consider /new to start fresh.'
Ref #499
Two linked fixes for MiniMax Anthropic-compatible fallback:
1. Normalize httpx.URL to str before calling .rstrip() in auth/provider
detection helpers. Some client objects expose base_url as httpx.URL,
not str — crashed with AttributeError in _requires_bearer_auth() and
_is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback()
to use the already-stringified fb_base_url instead of raw httpx.URL.
2. Strip Anthropic-proprietary thinking block signatures when targeting
third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry,
self-hosted proxies). These endpoints cannot validate Anthropic's
signatures and reject them with HTTP 400 'Invalid signature in
thinking block'. Now threads base_url through convert_messages_to_anthropic()
→ build_anthropic_kwargs() so signature management is endpoint-aware.
Based on PR #4945 by kshitijk4poor (rstrip fix).
Fixes#4944.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
- Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing
despite being referenced in code)
- Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS
(non-aggregator providers use their main model for aux tasks automatically)
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
Hermes Agent identified and patched its own prompting blind spots through
automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4
and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches,
and verifying the fix in a closed loop.
Failure modes discovered and fixed:
- Mental arithmetic (wrong answers: 39,152,053 vs correct 39,151,253)
- User profile hallucination ('Windows 11' when running on Linux)
- Time guessing without verification
- Clarification-seeking instead of acting ('open where?' for port checks)
- Hash computation from memory (SHA-256, encodings)
- Confusing system RAM with agent's own persistent memory store
Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE:
- <mandatory_tool_use>: explicit categories that must always use tools
- <act_dont_ask>: default to action on obvious interpretations
Results:
gpt-5.4: 68.8% → 100% tool compliance (+31.2pp)
gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp)
Regression: 0/8 conversational prompts over-tooled
Anthropic signs thinking blocks against the full turn content. Any
upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
'Invalid signature in thinking block' — especially in long-lived
gateway sessions.
Strategy (following clawdbot/OpenClaw pattern):
1. Strip thinking/redacted_thinking from all assistant messages EXCEPT
the last one — preserves reasoning continuity on the current
tool-use chain while avoiding stale signature errors on older turns.
2. Downgrade unsigned thinking blocks to plain text — Anthropic can't
validate them, but the reasoning content is preserved.
3. Strip cache_control from thinking/redacted_thinking blocks to
prevent cache markers from interfering with signature validation.
4. Drop thinking blocks from the second message when merging
consecutive assistant messages (role alternation enforcement).
5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking',
strip all reasoning_details from the conversation and retry once.
This is the safety net for edge cases the proactive stripping
misses.
Addresses the issue reported in PR #6086 by @mingginwan while
preserving reasoning continuity (their PR stripped ALL thinking
blocks unconditionally).
Files changed:
- agent/anthropic_adapter.py: thinking block management in
convert_messages_to_anthropic (strip old turns, downgrade unsigned,
strip cache_control, merge-time strip)
- run_agent.py: one-shot signature error recovery in retry loop
- tests/test_anthropic_adapter.py: 10 new tests covering all cases
Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:
1. OpenRouter (known vision-capable default model)
2. Nous Portal (known vision-capable default model)
3. Active provider + model (whatever the user is running)
4. Stop
This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).
Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.
Based on PR #5376 by Mibay. Closes#5366.
Salvaged fixes from community PRs:
- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
key lookup (was checking top-level dict instead of store['providers']).
OAuth providers now correctly detected in /model picker.
Cherry-picked from PR #5911 by Xule Lin (linxule).
- fix(ollama): pass num_ctx to override 2048 default context window.
Ollama defaults to 2048 context regardless of model capabilities. Now
auto-detects from /api/show metadata and injects num_ctx into every
request. Config override via model.ollama_num_ctx. Fixes#2708.
Cherry-picked from PR #5929 by kshitij (kshitijk4poor).
- fix(aux): normalize provider aliases for vision/auxiliary routing.
Adds _normalize_aux_provider() with 17 aliases (google→gemini,
claude→anthropic, glm→zai, etc). Fixes vision routing failure when
provider is set to 'google' instead of 'gemini'.
Cherry-picked from PR #5793 by e11i (Elizabeth1979).
- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
but auxiliary client uses OpenAI SDK which appends /chat/completions →
404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
Inspired by PR #5786 by Lempkey.
Added debug logging to silent exception blocks across all fixes.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal
model) for all auxiliary tasks including compression, session search,
and web extraction. Now routes non-vision tasks to mimo-v2-pro (a
text model) which is better suited for those workloads.
- Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks
- _try_nous() accepts vision=False param to select the right model
- Vision path (_resolve_strict_vision_backend) passes vision=True
- All other callers default to vision=False → mimo-v2-pro
* fix(telegram): replace substring caption check with exact line-by-line match
Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.
Adds 13 unit tests covering the fixed bug scenarios.
Cherry-picked from PR #2671 by Dilee.
* fix: extend caption substring fix to all platforms
Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)
The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).
* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing
Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:
1. 'provider: main' was hardcoded to 'custom', which only checks legacy
OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').
2. Named custom provider names (e.g., 'beans') fell through to
PROVIDER_REGISTRY which doesn't know about config.yaml entries.
Now checks _get_named_custom_provider() before the registry fallback.
Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).
Adds 13 unit tests. Reported by Laura via Discord.
---------
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
16 callsites across 14 files were re-deriving the hermes home path
via os.environ.get('HERMES_HOME', ...) instead of using the canonical
get_hermes_home() from hermes_constants. This breaks profiles — each
profile has its own HERMES_HOME, and the inline fallback defaults to
~/.hermes regardless.
Fixed by importing and calling get_hermes_home() at each site. For
files already inside the hermes process (agent/, hermes_cli/, tools/,
gateway/, plugins/), this is always safe. Files that run outside the
process context (mcp_serve.py, mcp_oauth.py) already had correct
try/except ImportError fallbacks and were left alone.
Skipped: hermes_constants.py (IS the implementation), env_loader.py
(bootstrap), profiles.py (intentionally manipulates the env var),
standalone scripts (optional-skills/, skills/), and tests.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
- Show pricing during initial Nous Portal login (was missing from
_login_nous, only shown in the already-logged-in hermes model path)
- Filter free models for paid subscribers: non-allowlisted free models
are hidden; allowlisted models (xiaomi/mimo-v2-pro, xiaomi/mimo-v2-omni)
only appear when actually priced as free
- Detect free-tier accounts via portal api/oauth/account endpoint
(monthly_charge == 0); free-tier users see only free models as
selectable, with paid models shown dimmed and unselectable
- Use xiaomi/mimo-v2-omni as the auxiliary vision model for free-tier
Nous users so vision_analyze and browser_vision work without paid
model access (replaces the default google/gemini-3-flash-preview)
- Unavailable models rendered via print() before TerminalMenu to avoid
simple_term_menu line-width padding artifacts; upgrade URL resolved
from auth state portal_base_url (supports staging/custom portals)
- Add 21 tests covering filter_nous_free_models, is_nous_free_tier,
and partition_nous_models_by_tier
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
The credential pool seeder and runtime credential resolver hardcoded
api.z.ai/api/paas/v4 for all Z.AI keys. Keys on the Coding Plan (or CN
endpoint) would hit the wrong endpoint, causing 401/429 errors on the
first request even though a working endpoint exists.
Add _resolve_zai_base_url() that:
- Respects GLM_BASE_URL env var (no probe when explicitly set)
- Probes all candidate endpoints (global, cn, coding-global, coding-cn)
via detect_zai_endpoint() to find one that returns HTTP 200
- Caches the detected endpoint in provider state (auth.json) keyed on
a SHA-256 hash of the API key so subsequent starts skip the probe
- Falls back to the default URL if all probes fail
Wire into both _seed_from_env() in the credential pool and
resolve_api_key_provider_credentials() in the runtime resolver,
matching the pattern from the kimi-coding fix (PR #5566).
Fixes the same class of bug as #5561 but for the zai provider.
Cherry-picked from PR #5580 by MestreY0d4-Uninter.
- Share parent's credential pool with child agents for key rotation
- Leasing layer spreads parallel children across keys (least-loaded)
- Thread-safe acquire_lease/release_lease in CredentialPool
- Reverted sneaked-in tool-name restoration change (kept original
getattr + isinstance guard pattern)
Two remaining gaps from the codex empty-output spec:
1. Normalize dict-shaped streamed items: output_item.done events may
yield dicts (raw/fallback paths) instead of SDK objects. The
extraction loop now uses _item_get() that handles both getattr
and dict .get() access.
2. Avoid plain-text synthesis when function_call events were streamed:
tracks has_function_calls during streaming and skips text-delta
synthesis when tool calls are present — prevents collapsing a
tool-call response into a fake text message.
The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.
Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().
Tested live against chatgpt.com/backend-api/codex with OAuth.
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When the Codex CLI (or another Hermes profile) refreshes its token, the
pool entry's refresh_token becomes stale. Subsequent refresh attempts
fail with invalid_grant, and the entry enters a 24-hour exhaustion
cooldown with no recovery path.
This mirrors the existing _sync_anthropic_entry_from_credentials_file()
pattern: when an openai-codex entry is exhausted, compare its
refresh_token against ~/.codex/auth.json and sync the fresh pair if
they differ.
Fixes the common scenario where users run 'codex login' to refresh
their token externally and Hermes never picks it up.
Co-authored-by: David Andrews (LexGenius.ai) <david@lexgenius.ai>
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
Skills can now declare config.yaml settings via metadata.hermes.config
in their SKILL.md frontmatter. Values are stored under skills.config.*
namespace, prompted during hermes config migrate, shown in hermes config
show, and injected into the skill context at load time.
Also adds the llm-wiki skill (Karpathy's LLM Wiki pattern) as the first
skill to use the new config interface, declaring wiki.path.
Skill config interface (new):
- agent/skill_utils.py: extract_skill_config_vars(), discover_all_skill_config_vars(),
resolve_skill_config_values(), SKILL_CONFIG_PREFIX
- agent/skill_commands.py: _inject_skill_config() injects resolved values
into skill messages as [Skill config: ...] block
- hermes_cli/config.py: get_missing_skill_config_vars(), skill config
prompting in migrate_config(), Skill Settings in show_config()
LLM Wiki skill (skills/research/llm-wiki/SKILL.md):
- Three-layer architecture (raw sources, wiki pages, schema)
- Three operations (ingest, query, lint)
- Session orientation, page thresholds, tag taxonomy, update policy,
scaling guidance, log rotation, archiving workflow
Docs: creating-skills.md, configuration.md, skills.md, skills-catalog.md
Closes#5100
When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402. Two fixes:
1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
or a credit-related error, automatically retry with the next available
provider in the auto-detection chain. Skips the depleted provider and
tries Nous → Custom → Codex → API-key providers.
2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
to OpenRouter when auto/custom resolution returned no client. Now falls
back to the full auto-detection chain, which handles any available
provider — not just OpenRouter.
Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.
Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
Telegram Bot API requires command names to contain only lowercase a-z,
digits 0-9, and underscores. Skill/plugin names containing characters
like +, /, @, or . caused set_my_commands to fail with
Bot_command_invalid.
Two-layer fix:
- scan_skill_commands(): strip non-alphanumeric/non-hyphen chars from
cmd_key at source, collapse consecutive hyphens, trim edges, skip
names that sanitize to empty string
- _sanitize_telegram_name(): centralized helper used by all 3 Telegram
name generation sites (core commands, plugin commands, skill commands)
with empty-name guard at each call site
Closes#5534
Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use
enforcement guidance, steering them to actually call tools instead of
describing intended actions. Matches both OpenRouter (x-ai/grok-*) and
direct xAI API usage.
Enable Hermes tool execution through the copilot-acp adapter by:
- Passing tool schemas and tool_choice into the ACP prompt text
- Instructing ACP backend to emit <tool_call>{...}</tool_call> blocks
- Parsing XML tool-call blocks and bare JSON fallback back into
Hermes-compatible SimpleNamespace tool call objects
- Setting finish_reason='tool_calls' when tool calls are extracted
- Cleaning tool-call markup from response text
Fix duplicate tool call extraction when both XML block and bare JSON
regexes matched the same content (XML blocks now take precedence).
Cherry-picked from PR #4536 by MestreY0d4-Uninter. Stripped heuristic
fallback system (auto-synthesized tool calls from prose) and
Portuguese-language patterns — tool execution should be model-decided,
not heuristic-guessed.
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:
- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating
Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.
Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.
Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).
Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths
Inspired by Block/goose SubdirectoryHintTracker.
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.
Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
Resolve exact label matches before treating digit-only input as a positional index so destructive auth removal does not mis-target credentials named with numeric labels.
Constraint: The CLI remove path must keep supporting existing index-based usage while adding safer label targeting
Rejected: Ban numeric labels | labels are free-form and existing users may already rely on them
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a destructive command accepts multiple identifier forms, prefer exact identity matches before fallback parsing heuristics
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (273 passed); py_compile on changed Python files
Not-tested: Full repository pytest suite
Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone.
Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp
Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata
Rejected: Add a separate credential scheduler subsystem | too large for the Hermes pool architecture and unnecessary for this fix
Rejected: Only change CLI formatting | would leave runtime rotation blind to resets_at and preserve the serial-failure behavior
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME
Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths
Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.
Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.
Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().
Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
env var name chars. Without this, the pattern backtracks exponentially on large
strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
produced literal backslash-n instead of a real newline, breaking file search
result parsing (total_count always 1, paths concatenated).
Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
'Hangs in non-interactive environments' but actually run fine:
- test_413_compression.py (12 tests, 25s)
- test_file_tools_live.py (71 tests, 24s)
- test_code_execution.py (61 tests, 99s)
- test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
of _gateway_queues — fixes race condition where resolve fires before threads
register their approval entries, causing the test to hang indefinitely.
Net effect: +256 tests recovered from skip, 8 real failures fixed.
Address review feedback: replace bare `except: pass` with a debug
log when the post-retry write-back to ~/.claude/.credentials.json
fails. The write-back is best-effort (token is already resolved),
but logging helps troubleshooting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OAuth refresh tokens are single-use. When multiple consumers share the
same Anthropic OAuth session (credential pool entries, Claude Code CLI,
multiple Hermes profiles), whichever refreshes first invalidates the
refresh token for all others. This causes a cascade:
1. Pool entry tries to refresh with a consumed refresh token → 400
2. Pool marks the credential as "exhausted" with a 24-hour cooldown
3. All subsequent heartbeats skip the credential entirely
4. The fallback to resolve_anthropic_token() only works while the
access token in ~/.claude/.credentials.json hasn't expired
5. Once it expires, nothing can auto-recover without manual re-login
Fix:
- Add _sync_anthropic_entry_from_credentials_file() to detect when
~/.claude/.credentials.json has a newer refresh token and sync it
into the pool entry, clearing exhaustion status
- After a successful pool refresh, write the new tokens back to
~/.claude/.credentials.json so other consumers stay in sync
- On refresh failure, check if the credentials file has a different
(newer) refresh token and retry once before marking exhausted
- In _available_entries(), sync exhausted claude_code entries from
the credentials file before applying the 24-hour cooldown, so a
manual re-login or external refresh immediately unblocks agents
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two pre-existing issues causing test_file_read_guards timeouts on CI:
1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
IGNORECASE, matching any letter/underscore to end-of-string at
each position → O(n²) backtracking on 100K+ char inputs.
Bounded to {0,50} since env var names are never that long.
2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
character-count guard, so oversized content (that would be rejected
anyway) went through the expensive regex first. Reordered to check
size limit before redaction.
The Anthropic SDK appends /v1/messages to the base_url, so OpenCode's
base URL https://opencode.ai/zen/go/v1 produced a double /v1 path
(https://opencode.ai/zen/go/v1/v1/messages), causing 404s for MiniMax
models. Strip trailing /v1 when api_mode is anthropic_messages.
Also adds MiMo-V2-Pro, MiMo-V2-Omni, and MiniMax-M2.5 to the OpenCode
Go model lists per their updated docs.
Fixes#4890
Three interconnected bugs caused `hermes skills config` per-platform
settings to be silently ignored:
1. telegram_menu_commands() never filtered disabled skills — all skills
consumed menu slots regardless of platform config, hitting Telegram's
100 command cap. Now loads disabled skills for 'telegram' and excludes
them from the menu.
2. Gateway skill dispatch executed disabled skills because
get_skill_commands() (process-global cache) only filters by the global
disabled list at scan time. Added per-platform check before execution,
returning an actionable 'skill is disabled' message.
3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but
the gateway sets HERMES_SESSION_PLATFORM instead. Added
HERMES_SESSION_PLATFORM as fallback, plus an explicit platform=
parameter for callers that know their platform (menu builder, gateway
dispatch). Also added platform to prompt_builder's skills cache key
so multi-platform gateways get correct per-platform skill prompts.
Reported by SteveSkedasticity (CLAW community).
* feat(memory): add pluggable memory provider interface with profile isolation
Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.
Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md
Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env
MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.
Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.
Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).
* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider
Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.
Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:
- sync_turn: records user/assistant messages to OpenViking session
(threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
into 6 categories (profile, preferences, entities, events, cases,
patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)
Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base
Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.
* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker
- Remove redundant mem0_context tool (identical to mem0_search with
rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
(prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
for 120s instead of hammering a down server every turn. Auto-resets
after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads
* fix(memory): enforce single external memory provider limit
MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.
The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.
Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.
* feat(memory): add ByteRover memory provider plugin
Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.
ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.
Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression
Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state
Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).
* fix(memory): thread remaining sync_turns, fix holographic, add config key
Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads
Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
used in scoring). Now reuses pre-computed entity_residuals directly,
moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
500 facts, only checks the most recently updated ones to avoid O(n^2)
explosion (~125K comparisons at 500 is acceptable).
Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
No version bump needed (deep_merge handles new keys automatically).
* feat(memory): extract Honcho as a MemoryProvider plugin
Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.
The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.
Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages
This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.
* feat(memory): wire MemoryManager into run_agent.py
Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):
1. Init (~L1130): Create MemoryManager, find matching plugin provider
from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
completed turn, queue_prefetch_all() for next turn, on_session_end()
+ shutdown_all() at conversation end
All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.
Full suite: 7222 passed, 4 pre-existing failures.
* refactor(memory): remove legacy Honcho integration from core
Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).
Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
_activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
_honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding
Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls
Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py
Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.
The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.
Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice
Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
(byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()
CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup
Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references
Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py
Migration:
- Honcho migration notice on startup: detects existing honcho.json or
~/.honcho/config.json, prints guidance to run hermes memory setup.
Only fires when memory.provider is not set and not in quiet mode.
Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.
* feat(memory): standardize plugin config + add per-plugin documentation
Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)
Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml
Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers
The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory
* docs: add memory providers user guide + developer guide
New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
Holographic, RetainDB, ByteRover). Each with setup, config, tools,
cost, and unique features. Includes comparison table and profile
isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
provider plugin. Covers ABC, required methods, config schema,
save_config, threading contract, profile isolation, testing.
Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
the new Memory Providers page
- sidebars.ts — added both new pages to navigation
* fix(memory): auto-migrate Honcho users to memory provider plugin
When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.
Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.
* fix(memory): only auto-migrate Honcho when enabled + credentialed
Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.
* feat(memory): auto-install pip dependencies during hermes memory setup
Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).
Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)
* fix: remove remaining Honcho crash risks from cli.py and gateway
cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.
gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.
tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).
* fix: include plugins/ in pyproject.toml package list
Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.
* fix(memory): correct pip-to-import name mapping for dep checks
The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.
* chore: remove dead code from old plugin memory registration path
- hermes_cli/plugins.py: removed register_memory_provider(),
_memory_providers list, get_plugin_memory_providers() — memory
providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
subparsers (setup, status, sessions, map, peer, mode, tokens,
identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
registration path
- tests: replaced TestPluginMemoryProviderRegistration with
TestPluginMemoryDiscovery that tests the actual plugins/memory/
discovery system. Added 3 new tests (discover, load, nonexistent).
* chore: delete dead honcho_integration/cli.py and its tests
cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.
Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)
Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush
* refactor: move honcho_integration/ into the honcho plugin
Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.
- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/
* docs: update architecture + gateway-internals for memory provider system
- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
flush lifecycle docs with generic memory provider interface docs
* fix: update stale mock path for resolve_active_host after honcho plugin migration
* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore
Review feedback from Honcho devs (erosika):
P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
(was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry
Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)
ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple
Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup
* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type
- Wire on_delegation() in delegate_tool.py — parent's memory provider
is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
from corrupting user representations — closes#4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str
* fix(honcho): port profile isolation fixes from PR #4632
Ports 5 bug fixes found during profile testing (erosika's PR #4632):
1. 3-tier config resolution — resolve_config_path() now checks
$HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
(non-default profiles couldn't find shared host blocks)
2. Thread host=_host_key() through from_global_config() in cmd_setup,
cmd_status, cmd_identity (--target-profile was being ignored)
3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid
4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
all message uploads for the session
5. Gate Honcho clone behind --clone/--clone-all on profile create
(bare create should be blank-slate)
Also: sanitize assistant_peer_id via _sanitize_id()
* fix(tests): add module cleanup fixture to test_cli_provider_resolution
test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.
Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
Anthropic extended thinking blocks include an opaque 'signature' field
required for thinking chain continuity across multi-turn tool-use
conversations. Previously, normalize_anthropic_response() extracted
only the thinking text and set reasoning_details=None, discarding the
signature. On subsequent turns the API could not verify the chain.
Changes:
- _to_plain_data(): new recursive SDK-to-dict converter with depth cap
(20 levels) and path-based cycle detection for safety
- _extract_preserved_thinking_blocks(): rehydrates preserved thinking
blocks (including signature) from reasoning_details on assistant
messages, placing them before tool_use blocks as Anthropic requires
- normalize_anthropic_response(): stores full thinking blocks in
reasoning_details via _to_plain_data()
- _extract_reasoning(): adds 'thinking' key to the detail lookup chain
so Anthropic-format details are found alongside OpenRouter format
Salvaged from PR #4503 by @priveperfumes — focused on the thinking
block continuity fix only (cache strategy and other changes excluded).
Setup wizard now shows existing allowed_users when reconfiguring a
platform and preserves them if the user presses Enter. Previously the
wizard would display a misleading "No allowlist set" warning even when
the .env still held the original IDs.
Also downgrades the "provider X has no API key configured" log from
WARNING to DEBUG in resolve_provider_client — callers already handle
the None return with their own contextual messages. This eliminates
noisy startup warnings for providers in the fallback chain that the
user never configured (e.g. minimax).
- Add missing `from agent.credential_pool import load_pool` import to
auxiliary_client.py (introduced by the credential pool feature in main)
- Thread `args` through `select_provider_and_model(args=None)` so TLS
options from `cmd_model` reach `_model_flow_nous`
- Mock `_require_tty` in test_cmd_model_forwards_nous_login_tls_options
so it can run in non-interactive test environments
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenAI's newer models (GPT-5, Codex) give stronger instruction-following
weight to the 'developer' role vs 'system'. Swap the role at the API
boundary in _build_api_kwargs() for the chat_completions path so internal
message representation stays consistent ('system' everywhere).
Applies regardless of provider — OpenRouter, Nous portal, direct, etc.
The codex_responses path (direct OpenAI) uses 'instructions' instead of
message roles, so it's unaffected.
DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching
model name substrings: ('gpt-5', 'codex').
When PyYAML is unavailable or YAML frontmatter is malformed, the fallback
parser may return metadata as a string instead of a dict. This causes
AttributeError when calling .get("hermes") on the string.
Added explicit type checks to handle cases where metadata or hermes fields
are not dicts, preventing the crash.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The total_tokens field includes cache_read + cache_write tokens, but
the display only showed input + output — making the math look wrong
(e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache
line when cache tokens are present so all visible numbers sum to the
displayed total.
Affects both terminal (hermes insights) and gateway (/insights)
formats.
Show inline diffs in the CLI transcript when write_file, patch, or
skill_manage modifies files. Captures a filesystem snapshot before the
tool runs, computes a unified diff after, and renders it with ANSI
coloring in the activity feed.
Adds tool_start_callback and tool_complete_callback hooks to AIAgent
for pre/post tool execution notifications.
Also fixes _extract_parallel_scope_path to normalize relative paths
to absolute, preventing the parallel overlap detection from missing
conflicts when the same file is referenced with different path styles.
Gated by display.inline_diffs config option (default: true).
Based on PR #3774 by @kshitijk4poor.
Three bugs prevented credential pool rotation from working when multiple
Codex OAuth tokens were configured:
1. credential_pool was dropped during smart model turn routing.
resolve_turn_route() constructed runtime dicts without it, so the
AIAgent was created without pool access. Fixed in smart_model_routing.py
(no-route and fallback paths), cli.py, and gateway/run.py.
2. Eager fallback fired before pool rotation on 429. The rate-limit
handler at line ~7180 switched to a fallback provider immediately,
before _recover_with_credential_pool got a chance to rotate to the
next credential. Now deferred when the pool still has credentials.
3. (Non-issue) Retry budget was reported as too small, but successful
pool rotations already skip retry_count increment — no change needed.
Reported by community member Schinsly who identified all three root
causes and verified the fix locally with multiple Codex accounts.
Follow-up to PR #4305 — .config/gh was added to the write-deny list
but missed from _SENSITIVE_HOME_DIRS, leaving GitHub CLI OAuth tokens
exposed via @file:~/.config/gh/hosts.yml context injection.
- Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server,
server-to-server, and refresh tokens) — all four types used by
GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS
- Snapshot HERMES_REDACT_SECRETS at module import time instead of
re-reading os.getenv() on every call, preventing runtime env mutations
(e.g. LLM-generated export commands) from disabling redaction
* feat(auth): add same-provider credential pools and rotation UX
Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.
- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647
* fix(tests): prevent pool auto-seeding from host env in credential pool tests
Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.
- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test
* feat(auth): add thread safety, least_used strategy, and request counting
- Add threading.Lock to CredentialPool for gateway thread safety
(concurrent requests from multiple gateway sessions could race on
pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
thread safety (4 threads × 20 selects with no corruption)
* feat(auth): add interactive mode for bare 'hermes auth' command
When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:
1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
add flow explicitly asks 'API key or OAuth login?' — making it
clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection
The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.
* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)
Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.
* feat(auth): support custom endpoint credential pools keyed by provider name
Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).
- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing
* docs: add Excalidraw diagram of full credential pool flow
Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration
Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g
* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow
The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached
* docs: add comprehensive credential pool documentation
- New page: website/docs/user-guide/features/credential-pools.md
Full guide covering quick start, CLI commands, rotation strategies,
error recovery, custom endpoint pools, auto-discovery, thread safety,
architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide
* chore: remove excalidraw diagram from repo (external link only)
* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns
- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
(token_type, scope, client_id, portal_base_url, obtained_at,
expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
agent_key_obtained_at, tls) into a single extra dict with
__getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider
Net -17 lines. All 383 targeted tests pass.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.
Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
(both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars
Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
- Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection
- Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the
correct models.dev provider key — original PR used 'fireworks' which
would silently fail the lookup)
Cherry-picked from PR #3989 with models.dev key fix.
Co-authored-by: sroecker <sroecker@users.noreply.github.com>
Claude Code >=2.1.81 checks for a 'scopes' array containing 'user:inference'
in ~/.claude/.credentials.json before accepting stored OAuth tokens as valid.
When Hermes refreshes the token, it writes only accessToken, refreshToken, and
expiresAt — omitting the scopes field. This causes Claude Code to report
'loggedIn: false' and refuse to start, even though the token is valid.
This commit:
- Parses the 'scope' field from the OAuth refresh response
- Passes it to _write_claude_code_credentials() as a keyword argument
- Persists the scopes array in the claudeAiOauth credential store
- Preserves existing scopes when the refresh response omits the field
Tested against Claude Code v2.1.87 on Linux — auth status correctly reports
loggedIn: true and claude --print works after this fix.
Co-authored-by: Nick <git@flybynight.io>
* fix: treat non-sk-ant- prefixed keys (Azure AI Foundry) as regular API keys, not OAuth tokens
* fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens
_is_oauth_token() returned True for any key not starting with
sk-ant-api, misclassifying Azure AI Foundry keys as OAuth tokens
and sending Bearer auth instead of x-api-key → 401 rejection.
Real Anthropic OAuth tokens all start with sk-ant-oat (confirmed
from live .credentials.json). Non-sk-ant- keys are third-party
provider keys that should use x-api-key.
Test fixtures updated to use realistic sk-ant-oat01- prefixed
tokens instead of fake strings.
Salvaged from PR #4075 by @HangGlidersRule.
---------
Co-authored-by: Clawdbot <clawdbot@openclaw.ai>
MiniMax's /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer instead of x-api-key. Without this fix,
MiniMax users get 401 errors in gateway sessions.
Adds _requires_bearer_auth() to detect MiniMax endpoints and route
through auth_token in the Anthropic SDK. Check runs before OAuth
token detection so MiniMax keys aren't misclassified as setup tokens.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered
by _PREFIX_PATTERNS, leaking in plain text via printenv or log output.
Salvaged from PR #3790 by @memosr. Tests rewritten with correct
assertions (original tests had vacuously true checks).
Co-authored-by: memosr <memosr@users.noreply.github.com>
* add .aac audio file format support to transcription tool
* fix(agent): support full context length resolution for direct Gemini API endpoints
Add generativelanguage.googleapis.com to _URL_TO_PROVIDER so direct
Gemini API users get correct 1M+ context length instead of the 128K
unknown-proxy fallback.
Co-authored-by: bb873 <bb873@users.noreply.github.com>
---------
Co-authored-by: Adrian Scott <adrian@adrianscott.com>
Co-authored-by: bb873 <bb873@users.noreply.github.com>
The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.
Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely. This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.
The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.
Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
for explicit_base_url without explicit_api_key
Updates 2 tests that expected the old (broken) behavior.
Tool call previews (paths, commands, queries) were hardcoded to truncate
at 35-40 chars across CLI spinners, completion lines, and gateway progress
messages. Users could not see full file paths in tool output.
New config option: display.tool_preview_length (default 0 = no limit).
Set a positive number to truncate at that length.
Changes:
- display.py: module-level _tool_preview_max_len with getter/setter;
build_tool_preview() and get_cute_tool_message() _trunc/_path respect it
- cli.py: reads config at startup, spinner widget respects config
- gateway/run.py: reads config per-message, progress callback respects config
- run_agent.py: removed redundant 30-char quiet-mode spinner truncation
- config.py: added display.tool_preview_length to DEFAULT_CONFIG
Reported by kriskaminski
Add skills.external_dirs config option — a list of additional directories
to scan for skills alongside ~/.hermes/skills/. External dirs are read-only:
skill creation/editing always writes to the local dir. Local skills take
precedence when names collide.
This lets users share skills across tools/agents without copying them into
Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills).
Changes:
- agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs()
- agent/prompt_builder.py: scan external dirs in build_skills_system_prompt()
- tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs;
security check recognizes configured external dirs as trusted
- agent/skill_commands.py: /skill slash commands discover external skills
- hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG
- cli-config.yaml.example: document the option
- tests/agent/test_external_skills.py: 11 tests covering discovery, precedence,
deduplication, and skill_view for external skills
Requested by community member primco.
Background agent's KawaiiSpinner wrote \r-based animation and stop()
messages through StdoutProxy, colliding with prompt_toolkit's status bar.
Two fixes:
- display.py: use isinstance(out, StdoutProxy) instead of fragile
hasattr+name check for detecting prompt_toolkit's stdout wrapper
- cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route
thinking updates through the TUI widget only when no foreground
agent is active; clear spinner text in finally block with same guard
Closes#2718
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>