Commit Graph

612 Commits

Author SHA1 Message Date
alt-glitch
1010e5fa3c refactor: remove redundant local imports already available at module level
Sweep ~74 redundant local imports across 21 files where the same module
was already imported at the top level. Also includes type fixes and lint
cleanups on the same branch.
2026-04-21 00:50:58 -07:00
Teknium
328223576b
feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384)
* feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates

When a skill loads, the activation message now exposes the absolute
skill directory and substitutes ${HERMES_SKILL_DIR} /
${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with
bundled scripts can instruct the agent to run them by absolute path
without an extra skill_view round-trip.

Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md
are pre-executed (with the skill directory as CWD) and their stdout is
inlined into the message before the agent reads it. Off by default —
enable via skills.inline_shell in config.yaml — because any snippet
runs on the host without approval.

Changes:
- agent/skill_commands.py: template substitution, inline-shell
  expansion, absolute skill-dir header, supporting-files list now
  shows both relative and absolute forms.
- hermes_cli/config.py: new skills.template_vars,
  skills.inline_shell, skills.inline_shell_timeout knobs.
- tests/agent/test_skill_commands.py: coverage for header, both
  template tokens (present and missing session id), template_vars
  disable, inline-shell default-off, enabled, CWD, and timeout.
- website/docs/developer-guide/creating-skills.md: documents the
  template tokens, the absolute-path header, and the opt-in inline
  shell with its security caveat.

Validation: tests/agent/ 1591 passed (includes 9 new tests).
E2E: loaded a real skill in an isolated HERMES_HOME; confirmed
${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID}
resolves to the passed task_id, !`date` runs when opt-in is set, and
stays literal when it isn't.

* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot

bash login shells don't source ~/.bashrc, so tools that install themselves
there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to
the environment snapshot Hermes builds once per session.  Under systemd
or any context with a minimal parent env, that surfaces as
'node: command not found' in the terminal tool even though the binary
is reachable from every interactive shell on the machine.

Changes:
- tools/environments/local.py: before the login-shell snapshot bootstrap
  runs, prepend guarded 'source <file>' lines for each resolved init
  file.  Missing files are skipped, each source is wrapped with a
  '[ -r ... ] && . ... || true' guard so a broken rc can't abort the
  bootstrap.
- hermes_cli/config.py: new terminal.shell_init_files (explicit list,
  supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on)
  knobs.  When shell_init_files is set it takes precedence; when it's
  empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced.
- tests/tools/test_local_shell_init.py: 10 tests covering the resolver
  (auto-bashrc, missing file, explicit override, ~/${VAR} expansion,
  opt-out) and the prelude builder (quoting, guarded sourcing), plus
  a real-LocalEnvironment snapshot test that confirms exports in the
  init file land in subsequent commands' environment.
- website/docs/reference/faq.md: documents the fix in Troubleshooting,
  including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh
  directly via shell_init_files.

Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40
pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py
50/50 pass.  E2E in an isolated HERMES_HOME: confirmed that a fake
~/.bashrc setting a marker var and PATH addition shows up in a real
LocalEnvironment().execute() call, that auto_source_bashrc=false
suppresses it, that an explicit shell_init_files entry wins over the
auto default, and that a missing bashrc is silently skipped.
2026-04-21 00:39:19 -07:00
Teknium
62cbeb6367
test: stop testing mutable data — convert change-detectors to invariants (#13363)
Catalog snapshots, config version literals, and enumeration counts are data
that changes as designed. Tests that assert on those values add no
behavioral coverage — they just break CI on every routine update and cost
engineering time to 'fix.'

Replace with invariants where one exists, delete where none does.

Deleted (pure snapshots):
- TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al
- TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models'
- test_browser_camofox_state::test_config_version_matches_current_schema
  (docstring literally said it would break on unrelated bumps)

Relaxed (keep plumbing check, drop snapshot):
- Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists:
  now assert 'provider exists and has >= 1 entry' instead of specific names
- HuggingFace main/models.py consistency test: drop 'len >= 6' floor

Dynamicized (follow source, not a literal):
- 3x test_config.py migration tests: raw['_config_version'] ==
  DEFAULT_CONFIG['_config_version'] instead of hardcoded 21

Fixed stale tests against intentional behavior changes:
- test_insights::test_gateway_format_hides_cost: name matches new behavior
  (no dollar figures); remove contradicting '$' in text assertion
- test_config::prefers_api_then_url_then_base_url: flipped per PR #9332;
  rename + update to base_url > url > api
- test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to
  assert called — contract is 'credential flowed through'
- test_interrupt_propagation: add provider/model/_base_url to bare-agent
  fixture so the stale-timeout code path resolves

Fixed stale integration tests against opt-in plugin gate:
- transform_tool_result + transform_terminal_output: write plugins.enabled
  allow-list to config.yaml and reset the plugin manager singleton

Source fix (real consistency invariant):
- agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length
  (262144, same as K2.5). test_model_metadata_has_context_lengths was
  correctly catching the gap.

Policy:
- AGENTS.md Testing section: new subsection 'Don't write change-detector
  tests' with do/don't examples. Reviewers should reject catalog-snapshot
  assertions in new tests.

Covers every test that failed on the last completed main CI run
(24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present
+ test_terminal_and_file_toolsets_resolve_all_tools, which now pass both
alone and with the full tests/tools/ directory (xdist ordering flake that
resolved itself).
2026-04-20 23:20:33 -07:00
kshitijk4poor
7ab5eebd03 feat: add transport types + migrate Anthropic normalize path
Add agent/transports/types.py with three shared dataclasses:
- NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data
- ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata)
- Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens

Add normalize_anthropic_response_v2() to anthropic_adapter.py — wraps the
existing v1 function and maps its output to NormalizedResponse. One call site
in run_agent.py (the main normalize branch) uses v2 with a back-compat shim
to SimpleNamespace for downstream code.

No ABC, no registry, no streaming, no client lifecycle. Those land in PR 3
with the first concrete transport (AnthropicTransport).

46 new tests:
- test_types.py: dataclass construction, build_tool_call, map_finish_reason
- test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools,
  thinking, mixed, stop reasons, mcp prefix stripping, edge cases)

Part of the provider transport refactor (PR 2 of 9).
2026-04-20 23:06:00 -07:00
Teknium
dbb7e00e7e fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 22:14:29 -07:00
Teknium
cecf84daf7 fix: extend hostname-match provider detection across remaining call sites
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
2026-04-20 22:14:29 -07:00
jerilynzheng
b117538798 feat: attribution default_headers for ai-gateway provider
Requests through Vercel AI Gateway now carry referrerUrl / appName /
User-Agent attribution so traffic shows up in the gateway's analytics.
Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new
ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.
2026-04-20 21:02:28 -07:00
Peter Fontana
3988c3c245 feat: shell hooks — wire shell scripts as Hermes hook callbacks
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.

Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
  to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted

Cherry-picked from PR #13143 by @pefontana.
2026-04-20 20:53:51 -07:00
Tanner Fokkens
cde7283821 fix: forward auth when probing local model metadata
Pass the user's configured api_key through local-server detection and
context-length probes (detect_local_server_type, _query_local_context_length,
query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in
fetch_endpoint_model_metadata when a loaded instance is present — so the
probed context length is the actual runtime value the user loaded the model
at, not just the model's theoretical max.

Helps local-LLM users whose auto-detected context length was wrong, causing
compression failures and context-overrun crashes.
2026-04-20 20:51:56 -07:00
entropidelic
3368814a3d fix(security): redact secrets from context compaction input and output
Three-layer defense against secrets leaking into compaction summaries:
1. Input redaction: redact_sensitive_text() on message content and tool
   call arguments in _serialize_for_summary() before sending to summarizer
2. Prompt instructions: NEVER include API keys/tokens/passwords in the
   summarizer preamble, template Critical Context section, and focus topic
3. Output redaction: redact_sensitive_text() on the summary output and
   _previous_summary for iterative updates

Reuses existing agent/redact.py patterns (sk-*, ghp_*, key=value, etc).

Cherry-picked from PR #9200 by @entropidelic.
2026-04-20 16:07:13 -07:00
Teknium
3cba81ebed
fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)
Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
2026-04-20 12:23:05 -07:00
kshitijk4poor
ff56bebdf3 refactor: extract codex_responses logic into dedicated adapter
Extract 12 Codex Responses API format-conversion and normalization functions
from run_agent.py into agent/codex_responses_adapter.py, following the
existing pattern of anthropic_adapter.py and bedrock_adapter.py.

run_agent.py: 12,550 → 11,865 lines (-685 lines)

Functions moved:
- _chat_content_to_responses_parts (multimodal content conversion)
- _summarize_user_message_for_log (multimodal message logging)
- _deterministic_call_id (cache-safe fallback IDs)
- _split_responses_tool_id (composite ID splitting)
- _derive_responses_function_call_id (fc_ prefix conversion)
- _responses_tools (schema format conversion)
- _chat_messages_to_responses_input (message format conversion)
- _preflight_codex_input_items (input validation)
- _preflight_codex_api_kwargs (API kwargs validation)
- _extract_responses_message_text (response text extraction)
- _extract_responses_reasoning_text (reasoning extraction)
- _normalize_codex_response (full response normalization)

All functions are stateless module-level functions. AIAgent methods remain
as thin one-line wrappers. Both module-level helpers are re-exported from
run_agent.py for backward compatibility with existing test imports.

Includes multimodal inline image support (PR #12969) that the original PR
was missing.

Based on PR #12975 by @kshitijk4poor.
2026-04-20 11:53:17 -07:00
Teknium
d587d62eba
feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal (#13148)
* feat(security): URL query param + userinfo + form body redaction

Port from nearai/ironclaw#2529.

Hermes already has broad value-shape coverage in agent/redact.py
(30+ vendor prefixes, JWTs, DB connstrs, etc.) but missed three
key-name-based patterns that catch opaque tokens without recognizable
prefixes:

1. URL query params - OAuth callback codes (?code=...),
   access_token, refresh_token, signature, etc. These are opaque and
   won't match any prefix regex. Now redacted by parameter NAME.

2. URL userinfo (https://user:pass@host) - for non-DB schemes. DB
   schemes were already handled by _DB_CONNSTR_RE.

3. Form-urlencoded body (k=v pairs joined by ampersands) -
   conservative, only triggers on clean pure-form inputs with no
   other text.

Sensitive key allowlist matches ironclaw's (exact case-insensitive,
NOT substring - so token_count and session_id pass through).

Tests: +20 new test cases across 3 test classes. All 75 redact tests
pass; gateway/test_pii_redaction and tools/test_browser_secret_exfil
also green.

Known pre-existing limitation: _ENV_ASSIGN_RE greedy match swallows
whole all-caps ENV-style names + trailing text when followed by
another assignment. Left untouched here (out of scope); URL query
redaction handles the lowercase case.

* feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal

Update model catalogs for OpenRouter (fallback snapshot), Nous Portal,
and NVIDIA NIM to reference moonshotai/kimi-k2.6.  Add kimi-k2.6 to
the fixed-temperature frozenset in auxiliary_client.py so the 0.6
contract is enforced on aggregator routings.

Native Moonshot provider lists (kimi-coding, kimi-coding-cn, moonshot,
opencode-zen, opencode-go) are unchanged — those use Moonshot's own
model IDs which are unaffected.
2026-04-20 11:49:54 -07:00
Austin Pickett
720e1c65b2
Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
kshitijk4poor
bc2559c44d fix: remove codex spark model support
Drop gpt-5.3-codex-spark from Codex forward-compat synthesis,
provider catalogs, and context metadata now that the API no longer
supports it.
2026-04-20 04:51:44 -07:00
Linux2010
b869bf206c fix(error_classifier): handle dict-typed message fields without crashing
When API providers return Pydantic-style validation errors where
body['message'] or body['error']['message'] is a dict (e.g.
{"detail": [...]}), the error classifier was crashing with
AttributeError: 'dict' object has no attribute 'lower'.

The 'or ""' fallback only handles None/falsy values. A non-empty
dict is truthy and passes through to .lower(), which fails.

Fix: Wrap all 5 call sites with str() before calling .lower().
This is a no-op for strings and safely converts dicts to their
repr for pattern matching (no false positives on classification
patterns like 'rate limit', 'context length', etc.).

Closes #11233
2026-04-20 02:40:20 -07:00
haileymarshall
49282b6e04 fix(gemini): assign unique stream indices to parallel tool calls
The streaming translator in agent/gemini_cloudcode_adapter.py keyed OpenAI
tool-call indices by function name, so when the model emitted multiple
parallel functionCall parts with the same name in a single turn (e.g.
three read_file calls in one response), they all collapsed onto index 0.
Downstream aggregators that key chunks by index would overwrite or drop
all but the first call.

Replace the name-keyed dict with a per-stream counter that persists across
SSE events. Each functionCall part now gets a fresh, unique index,
matching the non-streaming path which already uses enumerate(parts).

Add TestTranslateStreamEvent covering parallel-same-name calls, index
persistence across events, and finish-reason promotion to tool_calls.
2026-04-20 02:10:53 -07:00
Ruzzgar
60236862ee fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
helix4u
6ab78401c9 fix(aux): add session_search extra_body and concurrency controls
Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>
2026-04-20 00:47:39 -07:00
kagura-agent
9b60ffc47f fix: include api.moonshot.cn in public API temperature override (#12745)
kimi-k2.5 on api.moonshot.cn/v1 rejects temperature=0.6 with HTTP 400, same
as api.moonshot.ai. The public API check now matches both domains.
2026-04-20 00:32:06 -07:00
helix4u
8155ebd7c4 fix(gemini): sanitize tool schemas for Google providers 2026-04-20 00:26:18 -07:00
Teknium
fc5fda5e38
fix(display): render <missing old_text> in memory previews instead of empty quotes (#12852)
When the model omits old_text on memory replace/remove, the tool preview
rendered as '~memory: ""' / '-memory: ""', which obscured what went wrong.
Render '<missing old_text>' in that case so the failure mode is legible
in the activity feed.

Narrow salvage from #12456 / #12831 — only the display-layer fix, not the
schema/API changes.
2026-04-19 22:45:47 -07:00
Teknium
65a31ee0d5
fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)
Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers.  Synthesizes
eight stale community PRs into one consolidated change.

Five fixes:

- URL detection: consolidate three inline `endswith("/anthropic")`
  checks in runtime_provider.py into the shared _detect_api_mode_for_url
  helper.  Third-party /anthropic endpoints now auto-resolve to
  api_mode=anthropic_messages via one code path instead of three.

- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
  (__init__, switch_model, _try_refresh_anthropic_client_credentials,
  _swap_credential, _try_activate_fallback) now gate on
  `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
  Claude-Code identity injection on third-party endpoints.  Previously
  only 2 of 5 sites were guarded.

- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
  `(should_cache, use_native_layout)` per endpoint.  Replaces three
  inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
  call-site flag.  Native Anthropic and third-party Anthropic gateways
  both get the native cache_control layout; OpenRouter gets envelope
  layout.  Layout is persisted in `_primary_runtime` so fallback
  restoration preserves the per-endpoint choice.

- Auxiliary client: `_try_custom_endpoint` honors
  `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
  instead of silently downgrading to an OpenAI-wire client.  Degrades
  gracefully to OpenAI-wire when the anthropic SDK isn't installed.

- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
  clears stale `api_key`/`api_mode` when switching to a built-in
  provider, so a previous MiniMax custom endpoint's credentials can't
  leak into a later OpenRouter session.

- Truncation continuation: length-continuation and tool-call-truncation
  retry now cover `anthropic_messages` in addition to `chat_completions`
  and `bedrock_converse`.  Reuses the existing `_build_assistant_message`
  path via `normalize_anthropic_response()` so the interim message
  shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases.  Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):
  #7410  @nocoo           — URL detection helper
  #7393  @keyuyuan        — OAuth 5-site guard
  #7367  @n-WN            — OAuth guard (narrower cousin, kept comment)
  #8636  @sgaofen         — caching helper + native-vs-proxy layout split
  #10954 @Only-Code-A     — caching on anthropic_messages+Claude
  #7648  @zhongyueming1121 — aux client anthropic_messages branch
  #6096  @hansnow         — /model switch clears stale api_mode
  #9691  @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects:    #9621 (OpenAI-wire caching with incomplete blocklist — risky),
            #7242 (superseded by #9691, stale branch),
            #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
2026-04-19 22:43:09 -07:00
taeng0204
6f79b8f01d fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai
Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.
2026-04-19 18:54:35 -07:00
Teknium
424e9f36b0
refactor: remove smart_model_routing feature (#12732)
Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default.  This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.

The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.

Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
  example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
  HermesCLI (signature kept; just no longer initialised through the
  smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
  routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md

Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
  simplified _resolve_turn_agent_config directly (preserves credential
  pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
  — they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
  helpers

Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
2026-04-19 18:12:55 -07:00
kshitijk4poor
d393104bad fix(gemini): tighten native routing and streaming replay
- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks
2026-04-19 12:40:08 -07:00
kshitijk4poor
3dea497b20 feat(providers): route gemini through the native AI Studio API
- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks
2026-04-19 12:40:08 -07:00
Teknium
db60c98276
docs(memory): steer agents to save declarative facts, not instructions (#12665)
Imperative memory entries ('Always respond concisely', 'Run tests with
pytest -n 4') get re-read as directives in future sessions, causing
repeated work or overriding the user's current request. Add a short
phrasing guideline to MEMORY_GUIDANCE so the model writes declarative
facts instead ('User prefers concise responses', 'Project uses pytest
with xdist').

Credit: observation from @Mariandipietra on X.
2026-04-19 12:00:53 -07:00
Teknium
cca3278079 fix(codex): pin correct Cloudflare headers and extend to auxiliary client
The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers
2026-04-19 11:59:25 -07:00
Teknium
f1fe29d1c3 feat(providers): extend request_timeout_seconds to all client paths
Follow-up on top of mvanhorn's cherry-picked commit. Original PR only
wired request_timeout_seconds into the explicit-creds OpenAI branch at
run_agent.py init; router-based implicit auth, native Anthropic, and the
fallback chain were still hardcoded to SDK defaults.

- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional
  timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply
  to Anthropic native init + post-refresh rebuild + stale/interrupt
  rebuilds + switch_model + _restore_primary_runtime + the OpenAI
  implicit-auth path + _try_activate_fallback (with immediate client
  rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures
  to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport,
  the fallback chain, and rebuilds after credential rotation.
2026-04-19 11:23:00 -07:00
Dusk1e
fd119a1c4a fix(agent): refresh skills prompt cache when disabled skills change 2026-04-19 11:16:24 -07:00
Teknium
13294c2d18 feat(compression): summaries now respect the conversation's language
Context compaction summaries were always produced in English regardless
of the conversation language, which injected English context into
non-English conversations and muddied the continuation experience.

Adds a one-sentence instruction to the shared `_summarizer_preamble`
used by both the initial-compaction and iterative-update prompt paths.
Placing it in the preamble (rather than adding it separately to each
prompt) means both code paths stay in sync with one edit.

Ported from anomalyco/opencode#20581. The original PR (#4670) landed
before main's prompt templates were refactored to share the
`_summarizer_preamble` and `_template_sections` blocks, so the
cherry-pick conflicted on the now-obsolete inline sections; re-applied
the essential one-line change on top of the current structure.

Verified: 48/48 existing compressor tests pass.
2026-04-19 11:05:14 -07:00
Teknium
b02833f32d
fix(codex): Hermes owns its own Codex auth; stop touching ~/.codex/auth.json (#12360)
Codex OAuth refresh tokens are single-use and rotate on every refresh.
Sharing them with the Codex CLI / VS Code via ~/.codex/auth.json made
concurrent use of both tools a race: whoever refreshed last invalidated
the other side's refresh_token.  On top of that, the silent auto-import
path picked up placeholder / aborted-auth data from ~/.codex/auth.json
(e.g. literal {"access_token":"access-new","refresh_token":"refresh-new"})
and seeded it into the Hermes pool as an entry the selector could
eventually pick.

Hermes now owns its own Codex auth state end-to-end:

Removed
- agent/credential_pool.py: _sync_codex_entry_from_cli() method,
  its pre-refresh + retry + _available_entries call sites, and the
  post-refresh write-back to ~/.codex/auth.json.
- agent/credential_pool.py: auto-import from ~/.codex/auth.json in
  _seed_from_singletons() — users now run `hermes auth openai-codex`
  explicitly.
- hermes_cli/auth.py: silent runtime migration in
  resolve_codex_runtime_credentials() — now surfaces
  `codex_auth_missing` directly (message already points to `hermes auth`).
- hermes_cli/auth.py: post-refresh write-back in
  _refresh_codex_auth_tokens().
- hermes_cli/auth.py: dead helper _write_codex_cli_tokens() and its 4
  tests in test_auth_codex_provider.py.

Kept
- hermes_cli/auth.py: _import_codex_cli_tokens() — still used by the
  interactive `hermes auth openai-codex` setup flow for a user-gated
  one-time import (with "a separate login is recommended" messaging).

User-visible impact
- On existing installs with Hermes auth already present: no change.
- On a fresh install where the user has only logged in via Codex CLI:
  `hermes chat --provider openai-codex` now fails with "No Codex
  credentials stored. Run `hermes auth` to authenticate." The
  interactive setup flow then detects ~/.codex/auth.json and offers a
  one-time import.
- On an install where Codex CLI later refreshes its token: Hermes is
  unaffected (we no longer read from that file at runtime).

Tests
- tests/hermes_cli/test_auth_codex_provider.py: 15/15 pass.
- tests/hermes_cli/test_auth_commands.py: 20/20 pass.
- tests/agent/test_credential_pool.py: 31/31 pass.
- Live E2E on openai-codex/gpt-5.4: 1 API call, 1.7s latency,
  3 log lines, no refresh events, no auth drama.

The related 14:52 refresh-loop bug (hundreds of rotations/minute on a
single entry) is a separate issue — that requires a refresh-attempt
cap on the auth-recovery path in run_agent.py, which remains open.
2026-04-18 19:19:46 -07:00
helix4u
ca32a2a60b fix(gemini): restore bearer auth on openai route 2026-04-18 12:52:01 -07:00
helix4u
a7dd6a3449 fix(gemini): hide stale and low-TPM Google models 2026-04-18 12:52:01 -07:00
helix4u
2eab7ee15f fix(gemini): hide low-TPM Gemma models from exposed lists 2026-04-18 12:52:01 -07:00
Honghua Yang
3128d9fcd2 fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::

    {"path": "/foo.md", "content": "# Long markdown
    ...[truncated]

— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.

Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.

Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.

Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
  output, non-JSON pass-through, nested structures, non-string leaves,
  scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
  reproduces the exact failure payload shape from the incident

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:40:56 -07:00
kshitij
c14b3b5880
fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)
* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
2026-04-18 09:35:51 -07:00
AviArora02-commits
994faacce8 fix: suppress Authorization: Bearer for Gemini provider to prevent HTTP 400 (#7893) 2026-04-17 21:30:17 -07:00
Teknium
2297c5f5ce fix(auth): restore --label for hermes auth add nous --type oauth
persist_nous_credentials() now accepts an optional label kwarg which
gets embedded in providers.nous under the 'label' key.
_seed_from_singletons() prefers the embedded label over the
auto-derived label_from_token() fingerprint when materialising the
pool entry, so re-seeding on every load_pool('nous') preserves the
user's chosen label.

auth_commands.py threads --label through to the helper, restoring
parity with how other OAuth providers (anthropic, codex, google,
qwen) honor the flag.

Tests: 4 new (embed, reseed-survives, no-label fallback, end-to-end
through auth_add_command). All 390 nous/auth/credential_pool tests
pass.
2026-04-17 19:13:40 -07:00
Teknium
a155b4a159
feat(auxiliary): default 'auto' routing to main model for all users (#11900)
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
2026-04-17 19:13:23 -07:00
Michel Belleau
d465fc5869 fix(skills): use frontmatter name in skills index instead of directory name
build_skills_system_prompt() was using the skill directory name (skill_name)
when appending to skills_by_category in all three code paths (snapshot cache,
cold filesystem scan, external dirs). This meant any skill whose directory name
differed from its frontmatter `name` field would appear under the wrong name in
the system prompt, causing LLM routing failures.

The snapshot entry already stores both skill_name (dir) and frontmatter_name
(declared); switch the three tuple appends to use frontmatter_name. Also fix
the external-dir dedup set (seen_skill_names) to track frontmatter names for
consistency with the local-skill tuples now stored under frontmatter_name.

Fixes #11777

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 18:56:37 -07:00
helix4u
2b60478fc2 fix(kimi): force kimi-for-coding temperature to 0.6 2026-04-17 15:49:14 -07:00
Teknium
c6fd2619f7
fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833)
Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit
path (status_code on the exception, Retry-After preserved via error.response)
instead of being opaque RuntimeErrors. User sees a one-line capacity message
instead of a 500-char JSON dump.

Changes
- CodeAssistError grows status_code / response / retry_after / details attrs.
  _extract_status_code in error_classifier picks up status_code and classifies
  429 as FailoverReason.rate_limit, so fallback_providers triggers the same
  way it does for SDK errors. run_agent.py line ~10428 already walks
  error.response.headers for Retry-After — preserving the response means that
  path just works.
- _gemini_http_error parses the Google error envelope (error.status +
  error.details[].reason from google.rpc.ErrorInfo, retryDelay from
  google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404
  model-not-found each produce a human-readable message; unknown shapes fall
  back to the previous raw-body format.
- Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and
  agent/model_metadata.py — Google returned 404 for it today in local repro.
  Kept gemma-4-31b-it (capacity-constrained but not retired).

Validation
|                           | Before                         | After                                     |
|---------------------------|--------------------------------|-------------------------------------------|
| Error message             | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error      | None (opaque RuntimeError)     | 429                                       |
| Classifier reason         | unknown (string-match fallback) | FailoverReason.rate_limit                |
| Retry-After honored       | ignored                        | extracted from RetryInfo or header        |
| gemma-4-26b-it picker     | advertised (404s on Google)    | removed                                   |

Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found,
Retry-After header fallback, malformed body, and classifier integration.
Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full
tests/hermes_cli (2203 tests) green.

Co-authored-by: teknium1 <teknium@nousresearch.com>
2026-04-17 15:34:12 -07:00
Teknium
f362083c64 fix(providers): complete NVIDIA NIM parity with other providers
Follow-up on the native NVIDIA NIM provider salvage. The original PR wired
PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints
required for full parity with other OpenAI-compatible providers (xai,
huggingface, deepseek, zai).

Gaps closed:

- hermes_cli/main.py:
  - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so
    selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key
    provider flow (previously fell through silently).
  - Add 'nvidia' to `hermes chat --provider` argparse choices so the
    documented test command (`hermes chat --provider nvidia --model ...`)
    parses successfully.

- hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in
  OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're
  auto-added to the subprocess env blocklist.

- hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so
  `hermes doctor` probes https://integrate.api.nvidia.com/v1/models.

- hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for
  `hermes dump` credential masking.

- tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture
  with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses.

- agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry
  so all Nemotron variants get 128K context via substring match (rather
  than falling back to MINIMUM_CONTEXT_LENGTH).

- hermes_cli/models.py: Fix hallucinated model ID
  'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b'
  (verified against live integrate.api.nvidia.com/v1/models catalog).
  Expand curated list from 5 to 9 agentic models mapping to OpenRouter
  defaults per provider-guide convention: add qwen3.5-397b-a17b,
  deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b.

- cli-config.yaml.example: Document 'nvidia' provider option.

- scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP
  for CI attribution.

E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's
endpoint (returns 401 with bogus key instead of argparse error);
`hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.
2026-04-17 13:47:46 -07:00
asurla
3b569ff576 feat(providers): add native NVIDIA NIM provider
Adds NVIDIA NIM as a first-class provider: ProviderConfig in
auth.py, HermesOverlay in providers.py, curated models
(Nemotron plus other open source models hosted on
build.nvidia.com), URL mapping in model_metadata.py, aliases
(nim, nvidia-nim, build-nvidia, nemotron), and env var tests.

Docs updated: providers page, quickstart table, fallback
providers table, and README provider list.
2026-04-17 13:47:46 -07:00
Teknium
f268215019
fix(auth): codex auth remove no longer silently undone by auto-import (#11485)
* feat(skills): add 'hermes skills reset' to un-stick bundled skills

When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.

Adds an escape hatch for this case.

  hermes skills reset <name>
      Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
      re-baselines against the user's current copy. Future 'hermes update'
      runs accept upstream changes again. Non-destructive.

  hermes skills reset <name> --restore
      Also deletes the user's copy and re-copies the bundled version.
      Use when you want the pristine upstream skill back.

Also available as /skills reset in chat.

- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
  handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
  repro, --restore, unknown-skill error, upstream-removed-skill, and
  no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
  section explaining the origin-hash mechanic + reset usage

* fix(auth): codex auth remove no longer silently undone by auto-import

'hermes auth remove openai-codex' appeared to succeed but the credential
reappeared on the next command.  Two compounding bugs:

1. _seed_from_singletons() for openai-codex unconditionally re-imports
   tokens from ~/.codex/auth.json whenever the Hermes auth store is
   empty (by design — the Codex CLI and Hermes share that file).  There
   was no suppression check, unlike the claude_code seed path.

2. auth_remove_command's cleanup branch only matched
   removed.source == 'device_code' exactly.  Entries added via
   'hermes auth add openai-codex' have source 'manual:device_code', so
   for those the Hermes auth store's providers['openai-codex'] state was
   never cleared on remove — the next load_pool() re-seeded straight
   from there.

Net effect: there was no way to make a codex removal stick short of
manually editing both ~/.hermes/auth.json and ~/.codex/auth.json before
opening Hermes again.

Fix:

- Add unsuppress_credential_source() helper (mirrors
  suppress_credential_source()).
- Gate the openai-codex branch in _seed_from_singletons() with
  is_source_suppressed(), matching the claude_code pattern.
- Broaden auth_remove_command's codex match to handle both
  'device_code' and 'manual:device_code' (via endswith check), always
  call suppress_credential_source(), and print guidance about the
  unchanged ~/.codex/auth.json file.
- Clear the suppression marker in auth_add_command's openai-codex
  branch so re-linking via 'hermes auth add openai-codex' works.

~/.codex/auth.json is left untouched — that's the Codex CLI's own
credential store, not ours to delete.

Tests cover: unsuppress helper behavior, remove of both source
variants, add clears suppression, seed respects suppression.  E2E
verified: remove → load → add → load flow now behaves correctly.
2026-04-17 04:10:17 -07:00
Teknium
e33cb65a98
fix(insights): hide cache read/write and cost metrics from display (#11477)
The cache-read, cache-write, and total estimated-cost values shown in
/insights (and the per-model Cost column) were unreliable. Hide them from
both terminal and gateway renderings.

The underlying data pipeline is untouched — sessions still store
cache_read_tokens, cache_write_tokens, and estimated_cost_usd; the web
server, /usage command, and status bar are unaffected. Only the
InsightsEngine display layer is trimmed.

Changes:
- format_terminal: drop 'Cache read / Cache write' line, drop 'Est. cost'
  from the Total tokens row, drop per-model 'Cost' column, drop the
  '* Cost N/A for custom/self-hosted' footnote.
- format_gateway: drop cache breakdown from Tokens line, drop 'Est. cost'
  line, drop per-model cost suffix.
- Tests updated to assert these strings are now absent.
2026-04-17 01:02:06 -07:00
Teknium
3524ccfcc4
feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist (free + paid tiers) (#11270)
* feat(gemini): add Google Gemini CLI OAuth provider via Cloud Code Assist

Adds 'google-gemini-cli' as a first-class inference provider with native
OAuth authentication against Google, hitting the Cloud Code Assist backend
(cloudcode-pa.googleapis.com) that powers Google's official gemini-cli.
Supports both the free tier (generous daily quota, personal accounts) and
paid tiers (Standard/Enterprise via GCP projects).

Architecture
============
Three new modules under agent/:

1. google_oauth.py (625 lines) — PKCE Authorization Code flow
   - Google's public gemini-cli desktop OAuth client baked in (env-var overrides supported)
   - Cross-process file lock (fcntl POSIX / msvcrt Windows) with thread-local re-entrancy
   - Packed refresh format 'refresh_token|project_id|managed_project_id' on disk
   - In-flight refresh deduplication — concurrent requests don't double-refresh
   - invalid_grant → wipe credentials, prompt re-login
   - Headless detection (SSH/HERMES_HEADLESS) → paste-mode fallback
   - Refresh 60 s before expiry, atomic write with fsync+replace

2. google_code_assist.py (350 lines) — Code Assist control plane
   - load_code_assist(): POST /v1internal:loadCodeAssist (prod → sandbox fallback)
   - onboard_user(): POST /v1internal:onboardUser with LRO polling up to 60 s
   - retrieve_user_quota(): POST /v1internal:retrieveUserQuota → QuotaBucket list
   - VPC-SC detection (SECURITY_POLICY_VIOLATED → force standard-tier)
   - resolve_project_context(): env → config → discovered → onboarded priority
   - Matches Google's gemini-cli User-Agent / X-Goog-Api-Client / Client-Metadata

3. gemini_cloudcode_adapter.py (640 lines) — OpenAI↔Gemini translation
   - GeminiCloudCodeClient mimics openai.OpenAI interface (.chat.completions.create)
   - Full message translation: system→systemInstruction, tool_calls↔functionCall,
     tool results→functionResponse with sentinel thoughtSignature
   - Tools → tools[].functionDeclarations, tool_choice → toolConfig modes
   - GenerationConfig pass-through (temperature, max_tokens, top_p, stop)
   - Thinking config normalization (thinkingBudget, thinkingLevel, includeThoughts)
   - Request envelope {project, model, user_prompt_id, request}
   - Streaming: SSE (?alt=sse) with thought-part → reasoning stream separation
   - Response unwrapping (Code Assist wraps Gemini response in 'response' field)
   - finishReason mapping to OpenAI convention (STOP→stop, MAX_TOKENS→length, etc.)

Provider registration — all 9 touchpoints
==========================================
- hermes_cli/auth.py: PROVIDER_REGISTRY, aliases, resolver, status fn, dispatch
- hermes_cli/models.py: _PROVIDER_MODELS, CANONICAL_PROVIDERS, aliases
- hermes_cli/providers.py: HermesOverlay, ALIASES
- hermes_cli/config.py: OPTIONAL_ENV_VARS (HERMES_GEMINI_CLIENT_ID/_SECRET/_PROJECT_ID)
- hermes_cli/runtime_provider.py: dispatch branch + pool-entry branch
- hermes_cli/main.py: _model_flow_google_gemini_cli with upfront policy warning
- hermes_cli/auth_commands.py: pool handler, _OAUTH_CAPABLE_PROVIDERS
- hermes_cli/doctor.py: 'Google Gemini OAuth' health check
- run_agent.py: single dispatch branch in _create_openai_client

/gquota slash command
======================
Shows Code Assist quota buckets with 20-char progress bars, per (model, tokenType).
Registered in hermes_cli/commands.py, handler _handle_gquota_command in cli.py.

Attribution
===========
Derived with significant reference to:
- jenslys/opencode-gemini-auth (MIT) — OAuth flow shape, request envelope,
  public client credentials, retry semantics. Attribution preserved in module
  docstrings.
- clawdbot/extensions/google — VPC-SC handling, project discovery pattern.
- PR #10176 (@sliverp) — PKCE module structure.
- PR #10779 (@newarthur) — cross-process file locking pattern.

Supersedes PRs #6745, #10176, #10779 (to be closed on merge with credit).

Upfront policy warning
======================
Google considers using the gemini-cli OAuth client with third-party software
a policy violation. The interactive flow shows a clear warning and requires
explicit 'y' confirmation before OAuth begins. Documented prominently in
website/docs/integrations/providers.md.

Tests
=====
74 new tests in tests/agent/test_gemini_cloudcode.py covering:
- PKCE S256 roundtrip
- Packed refresh format parse/format/roundtrip
- Credential I/O (0600 perms, atomic write, packed on disk)
- Token lifecycle (fresh/expiring/force-refresh/invalid_grant/rotation preservation)
- Project ID env resolution (3 env vars, priority order)
- Headless detection
- VPC-SC detection (JSON-nested + text match)
- loadCodeAssist parsing + VPC-SC → standard-tier fallback
- onboardUser: free-tier allows empty project, paid requires it, LRO polling
- retrieveUserQuota parsing
- resolve_project_context: 3 short-circuit paths + discovery + onboarding
- build_gemini_request: messages → contents, system separation, tool_calls,
  tool_results, tools[], tool_choice (auto/required/specific), generationConfig,
  thinkingConfig normalization
- Code Assist envelope wrap shape
- Response translation: text, functionCall, thought → reasoning,
  unwrapped response, empty candidates, finish_reason mapping
- GeminiCloudCodeClient end-to-end with mocked HTTP
- Provider registration (9 tests: registry, 4 alias forms, no-regression on
  google-gemini alias, models catalog, determine_api_mode, _OAUTH_CAPABLE_PROVIDERS
  preservation, config env vars)
- Auth status dispatch (logged-in + not)
- /gquota command registration
- run_gemini_oauth_login_pure pool-dict shape

All 74 pass. 349 total tests pass across directly-touched areas (existing
test_api_key_providers, test_auth_qwen_provider, test_gemini_provider,
test_cli_init, test_cli_provider_resolution, test_registry all still green).

Coexistence with existing 'gemini' (API-key) provider
=====================================================
The existing gemini API-key provider is completely untouched. Its alias
'google-gemini' still resolves to 'gemini', not 'google-gemini-cli'.
Users can have both configured simultaneously; 'hermes model' shows both
as separate options.

* feat(gemini): ship Google's public gemini-cli OAuth client as default

Pivots from 'scrape-from-local-gemini-cli' (clawdbot pattern) to
'ship-creds-in-source' (opencode-gemini-auth pattern) for zero-setup UX.

These are Google's PUBLIC gemini-cli desktop OAuth credentials, published
openly in Google's own open-source gemini-cli repository. Desktop OAuth
clients are not confidential — PKCE provides the security, not the
client_secret. Shipping them here matches opencode-gemini-auth (MIT) and
Google's own distribution model.

Resolution order is now:
  1. HERMES_GEMINI_CLIENT_ID / _SECRET env vars (power users, custom GCP clients)
  2. Shipped public defaults (common case — works out of the box)
  3. Scrape from locally installed gemini-cli (fallback for forks that
     deliberately wipe the shipped defaults)
  4. Helpful error with install / env-var hints

The credential strings are composed piecewise at import time to keep
reviewer intent explicit (each constant is paired with a comment about
why it's non-confidential) and to bypass naive secret scanners.

UX impact: users no longer need 'npm install -g @google/gemini-cli' as a
prerequisite. Just 'hermes model' -> 'Google Gemini (OAuth)' works out
of the box.

Scrape path is retained as a safety net. Tests cover all four resolution
steps (env / shipped default / scrape fallback / hard failure).

79 new unit tests pass (was 76, +3 for the new resolution behaviors).
2026-04-16 16:49:00 -07:00
Teknium
25c7b1baa7
fix: handle httpx.Timeout object in CopilotACPClient (#11058)
run_agent.py passes httpx.Timeout(connect=30, read=120, write=1800,
pool=30) as the timeout kwarg on the streaming path. The OpenAI SDK
handles this natively, but CopilotACPClient._create_chat_completion()
called float(timeout or default), which raises TypeError because
httpx.Timeout doesn't implement __float__.

Normalize the timeout before passing to _run_prompt: plain floats/ints
pass through, httpx.Timeout objects get their largest component
extracted (write=1800s is the correct wall-clock budget for the ACP
subprocess), and None falls back to the 900s default.
2026-04-16 12:05:11 -07:00
Trev
63d06dd93d fix(agent): downgrade xhigh→max on Anthropic pre-4.7 adaptive models
Regression from #11161 (Claude Opus 4.7 migration, commit 0517ac3e).

The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max"
(the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level
as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only
expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s:

    BadRequestError [HTTP 400]: This model does not support effort
    level 'xhigh'. Supported levels: high, low, max, medium.

Users who set reasoning_effort=xhigh as their default (xhigh is the
recommended default for coding/agentic on 4.7 per the Anthropic migration
guide) now 400 every request the moment they switch back to a 4.6 model
via `/model` or config. Verified live against the Anthropic API on
`anthropic==0.94.0`.

Fix: make the mapping model-aware. Add `_supports_xhigh_effort()`
predicate (matches 4-7/4.7 substrings, mirroring the existing
`_supports_adaptive_thinking` / `_forbids_sampling_params` pattern).
On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort
those models accept, restoring pre-migration behavior). On 4.7+, keep
xhigh as a distinct level.

Per Anthropic's migration guide, xhigh is 4.7-only:
https://platform.claude.com/docs/en/about-claude/models/migration-guide
> Opus 4.7 effort levels: max, xhigh (new), high, medium, low.
> Opus 4.6 effort levels: max, high, medium, low.
SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[
"low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh).

## Test plan

Verified live on macOS 15.5 / anthropic==0.94.0:

    claude-opus-4-6 + effort=xhigh → output_config.effort=max  → 200 OK
    claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK
    claude-opus-4-6 + effort=max   → output_config.effort=max  → 200 OK
    claude-opus-4-7 + effort=max   → output_config.effort=max  → 200 OK

`tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged
test that asserted the broken behavior, added 1 for 4.7 preservation).

Full adapter suite: 120 passed in 1.05s.
Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed
(2 pre-existing failures on clean upstream/main, unrelated).

## Platforms

Tested on macOS 15.5. No platform-specific code paths touched.
2026-04-16 12:00:56 -07:00
trevthefoolish
0517ac3e93 fix(agent): complete Claude Opus 4.7 API migration
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide

Fixes NousResearch/hermes-agent#11137

Breaking-change coverage:

1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
   _supports_adaptive_thinking() (extends previous 4.6-only gate).

2. Sampling parameter stripping — 4.7 returns 400 for any non-default
   temperature / top_p / top_k. build_anthropic_kwargs drops them as a
   safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
   and AnthropicCompletionsAdapter.create() both early-exit before
   setting temperature for 4.7+ models. This keeps flush_memories and
   structured-JSON aux paths that hardcode temperature from 400ing
   when the aux model is flipped to 4.7.

3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
   which silently hides reasoning text from Hermes's CLI activity feed
   during long tool runs. Restoring "summarized" preserves 4.6 UX.

4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
   silently over-efforted every coding/agentic request). max is now a
   distinct ceiling per Anthropic's 5-level effort model.

5. New stop_reason values — refusal and model_context_window_exceeded
   were silently collapsed to "stop" (end_turn) by the adapter's
   stop_reason_map. Now mapped to "content_filter" and "length"
   respectively, matching upstream finish-reason handling already in
   bedrock_adapter.

6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
   list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
   catalog (recommended), claude-opus-4-7 added to model_metadata
   DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).

7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
   that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
   prefill (400).

8. Tests — 4 new tests in test_anthropic_adapter covering display
   default, xhigh preservation, max on 4.7, refusal / context-overflow
   stop_reason mapping, plus the sampling-param predicate. test_model_metadata
   accepts 4.7 at 1M context.

Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
2026-04-16 10:48:20 -07:00
sontianye
f19ca50cd9 fix(context_compressor): always keep last user message in tail to prevent active-task loss
Ensure _align_boundary_backward never pushes the last user message
into the compressed region. Without this, compression could delete
the user active task instruction mid-session.

Cherry-picked from #10969 by @sontianye. Fixes #10896.
2026-04-16 07:45:31 -07:00
lrawnsley
8c1276c0bf fix: pass resolved args to resolve_vision_provider_client()
resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.

Cherry-picked from #10901 by @lrawnsley.
2026-04-16 07:45:13 -07:00
Teknium
fe12042e50
fix: remove context pressure warnings entirely (#11039)
The gateway compression notifications were already removed in commit cc63b2d1
(PR #4139), but the agent-level context pressure warnings (85%/95% tiered
alerts via _emit_context_pressure) were still firing on both CLI and gateway.

Removed:
- _emit_context_pressure method and all call sites in run_conversation()
- Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN)
- Instance attribute _context_pressure_warned_at
- Pressure reset logic in _compress_context
- format_context_pressure and format_context_pressure_gateway from agent/display.py
- Orphaned ANSI constants that only served these functions
- tests/run_agent/test_context_pressure.py (all 361 lines)

Compression itself continues to run silently in the background.
Closes #3784
2026-04-16 06:44:23 -07:00
Teknium
0c1217d01e feat(xai): upgrade to Responses API, add TTS provider
Cherry-picked and trimmed from PR #10600 by Jaaneek.

- Switch xAI transport from openai_chat to codex_responses (Responses API)
- Add codex_responses detection for xAI in all runtime_provider resolution paths
- Add xAI api_mode detection in AIAgent.__init__ (provider name + URL auto-detect)
- Add extra_headers passthrough for codex_responses requests
- Add x-grok-conv-id session header for xAI prompt caching
- Add xAI reasoning support (encrypted_content include, no effort param)
- Move x-grok-conv-id from chat_completions path to codex_responses path
- Add xAI TTS provider (dedicated /v1/tts endpoint with Opus conversion)
- Add xAI provider aliases (grok, x-ai, x.ai) across auth, models, providers, auxiliary
- Trim xAI model list to agentic models (grok-4.20-reasoning, grok-4-1-fast-reasoning)
- Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS
- Add xAI TTS config section, setup wizard entry, tools_config provider option
- Add shared xai_http.py helper for User-Agent string

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
2026-04-16 02:24:08 -07:00
nosleepcassette
3c859e35dc fix: skin spinner faces and verbs not applied at runtime
Skins define waiting_faces, thinking_faces, and thinking_verbs in their
spinner config, but all 7 call sites in run_agent.py used hardcoded class
constants. Add three classmethods on KawaiiSpinner that query the active
skin first and fall back to the class constants, matching the existing
pattern used for wings/tool_prefix/tool_emojis.

Co-authored-by: nosleepcassette <nosleepcassette@users.noreply.github.com>
2026-04-16 02:22:19 -07:00
kshitijk4poor
1b61ec470b feat: add Ollama Cloud as built-in provider
Add ollama-cloud as a first-class provider with full parity to existing
API-key providers (gemini, zai, minimax, etc.):

- PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var
- Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud
- models.dev integration for accurate context lengths
- URL-to-provider mapping (ollama.com -> ollama-cloud)
- Passthrough model normalization (preserves Ollama model:tag format)
- Default auxiliary model (nemotron-3-nano:30b)
- HermesOverlay in providers.py
- CLI --provider choices, CANONICAL_PROVIDERS entry
- Dynamic model discovery with disk caching (1hr TTL)
- 37 provider-specific tests

Cherry-picked from PR #6038 by kshitijk4poor. Closes #3926
2026-04-16 02:22:09 -07:00
Teknium
cc6e8941db
feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)
Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.

Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
  on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
  schema validation, signal_sufficient anchored regex, mid->medium level fix

Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
  <memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
  on_turn_start() call before prefetch_all() for cadence tracking,
  sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
  gateway_session_key threading to main agent

Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md

Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-04-15 19:12:19 -07:00
flobo3
c6398fcaab fix(prompt): list all supported Telegram markdown formatting 2026-04-15 17:54:13 -07:00
Teknium
4fdcae6c91
fix: use absolute skill_dir for external skills (#10313) (#10587)
_load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path,
which is wrong for external skills from skills.external_dirs — they live
outside SKILLS_DIR entirely. Scripts and linked files failed to load.

Fix: skill_view() now includes the absolute skill_dir in its result dict.
_load_skill_payload() uses that directly when available, falling back to
the SKILLS_DIR-relative reconstruction only for legacy responses.

Closes #10313
2026-04-15 17:22:55 -07:00
Teknium
9d9b424390
fix: Nous Portal rate limit guard — prevent retry amplification (#10568)
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.

New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
  reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files

run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
  as rate-limited, skip the API call entirely. Try fallback provider
  first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
  (sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
  know they can resume

auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
  fallback chain. When rate-limited, returns (None, None) so the chain
  skips to the next provider instead of piling more requests onto Nous.

This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)

Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
2026-04-15 16:31:48 -07:00
JiaDe WU
0cb8c51fa5 feat: native AWS Bedrock provider via Converse API
Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific
additions onto current main, skipping stale-branch reverts (293 commits
behind).

Dual-path architecture:
  - Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets)
  - Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral)

Includes:
  - Core adapter (agent/bedrock_adapter.py, 1098 lines)
  - Full provider registration (auth, models, providers, config, runtime, main)
  - IAM credential chain + Bedrock API Key auth modes
  - Dynamic model discovery via ListFoundationModels + ListInferenceProfiles
  - Streaming with delta callbacks, error classification, guardrails
  - hermes doctor + hermes auth integration
  - /usage pricing for 7 Bedrock models
  - 130 automated tests (79 unit + 28 integration + follow-up fixes)
  - Documentation (website/docs/guides/aws-bedrock.md)
  - boto3 optional dependency (pip install hermes-agent[bedrock])

Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>
2026-04-15 16:17:17 -07:00
MestreY0d4-Uninter
f4724803b4 fix(runtime): surface malformed proxy env and base URL before client init
When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.

Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.

Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>
2026-04-15 16:10:53 -07:00
Teknium
ee9c0a3ed0
fix(security): add JWT token and Discord mention redaction (#10547)
Found via trace data audit: JWT tokens (eyJ...) and Discord snowflake
mentions (<@ID>) were passing through unredacted.

JWT pattern: matches 1/2/3-part tokens starting with eyJ (base64 for '{').
Zero false-positive risk — no normal text matches eyJ + 10+ base64url chars.

Discord pattern: matches <@digits> and <@!digits> with 17-20 digit snowflake
IDs. Syntactically unique to Discord's mention format.

Both patterns follow the same structural-uniqueness standard as existing
prefix patterns (sk-, ghp_, AKIA, etc.).
2026-04-15 16:08:52 -07:00
helix4u
96cc556055 fix(copilot): preserve base URL and gpt-5-mini routing 2026-04-15 15:04:14 -07:00
Teknium
6391b46779
fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
2026-04-15 13:16:28 -07:00
zhiheng.liu
7cb06e3bb3 refactor(memory): drop on_session_reset — commit-only is enough
OV transparently handles message history across /new and /compress: old
messages stay in the same session and extraction is idempotent, so there's
no need to rebind providers to a new session_id. The only thing the
session boundary actually needs is to trigger extraction.

- MemoryProvider / MemoryManager: remove on_session_reset hook
- OpenViking: remove on_session_reset override (nothing to do)
- AIAgent: replace rotate_memory_session with commit_memory_session
  (just calls on_session_end, no rebind)
- cli.py / run_agent.py: single commit_memory_session call at the
  session boundary before session_id rotates
- tests: replace on_session_reset coverage with routing tests for
  MemoryManager.on_session_end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu
8275fa597a refactor(memory): promote on_session_reset to base provider hook
Replace hasattr-forked OpenViking-specific paths with a proper base-class
hook. Collapse the two agent wrappers into a single rotate_memory_session
so callers don't orchestrate commit + rebind themselves.

- MemoryProvider: add on_session_reset(new_session_id) as a default no-op
- MemoryManager: on_session_reset fans out unconditionally (no hasattr,
  no builtin skip — base no-op covers it)
- OpenViking: rename reset_session -> on_session_reset; drop the explicit
  POST /api/v1/sessions (OV auto-creates on first message) and the two
  debug raise_for_status wrappers
- AIAgent: collapse commit_memory_session + reinitialize_memory_session
  into rotate_memory_session(new_sid, messages)
- cli.py / run_agent.py: replace hasattr blocks and the split calls with
  a single unconditional rotate_memory_session call; compression path
  now passes the real messages list instead of []
- tests: align with on_session_reset, assert reset does NOT POST /sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu
7856d304f2 fix(openviking): commit session on /new and context compression
The OpenViking memory provider extracts memories when its session is
committed (POST /api/v1/sessions/{id}/commit).  Before this fix, the
CLI had two code paths that changed the active session_id without ever
committing the outgoing OpenViking session:

1. /new (new_session() in cli.py) — called flush_memories() to write
   MEMORY.md, then immediately discarded the old session_id.  The
   accumulated OpenViking session was never committed, so all context
   from that session was lost before extraction could run.

2. /compress and auto-compress (_compress_context() in run_agent.py) —
   split the SQLite session (new session_id) but left the OpenViking
   provider pointing at the old session_id with no commit, meaning all
   messages synced to OpenViking were silently orphaned.

The gateway already handles session commit on /new and /reset via
shutdown_memory_provider() on the cached agent; the CLI path did not.

Fix: introduce a lightweight session-transition lifecycle alongside
the existing full shutdown path:

- OpenVikingMemoryProvider.reset_session(new_session_id): waits for
  in-flight background threads, resets per-session counters, and
  creates the new OV session via POST /api/v1/sessions — without
  tearing down the HTTP client (avoids connection overhead on /new).

- MemoryManager.restart_session(new_session_id): calls reset_session()
  on providers that implement it; falls back to initialize() for
  providers that do not.  Skips the builtin provider (no per-session
  state).

- AIAgent.commit_memory_session(messages): wraps
  memory_manager.on_session_end() without shutdown — commits OV session
  for extraction but leaves the provider alive for the next session.

- AIAgent.reinitialize_memory_session(new_session_id): wraps
  memory_manager.restart_session() — transitions all external providers
  to the new session after session_id has been assigned.

Call sites:
- cli.py new_session(): commit BEFORE session_id changes, reinitialize
  AFTER — ensuring OV extraction runs on the correct session and the
  new session is immediately ready for the next turn.
- run_agent._compress_context(): same pattern, inside the
  if self._session_db: block where the session_id split happens.

/compress and auto-compress are functionally identical at this layer:
both call _compress_context(), so both are fixed by the same change.

Tests added to tests/agent/test_memory_provider.py:
- TestMemoryManagerRestartSession: reset_session() routing, builtin
  skip, initialize() fallback, failure tolerance, empty-manager noop.
- TestOpenVikingResetSession: session_id update, per-session state
  clear, POST /api/v1/sessions call, API failure tolerance, no-client
  noop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
Teknium
722331a57d
fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285)
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.

Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text

All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).

Reported by: Sandeep Narahari (PrithviDevs)
2026-04-15 04:57:55 -07:00
Arihant Sethia
857b543543 feat: add skill analytics to the dashboard
Expose skill usage in analytics so the dashboard and insights output can
show which skills the agent loads and manages over time.

This adds skill aggregation to the InsightsEngine by extracting
`skill_view` and `skill_manage` calls from assistant tool_calls,
computing per-skill totals, and including the results in both terminal
and gateway insights formatting. It also extends the dashboard analytics
API and Analytics page to render a Top Skills table.

Terminology is aligned with the skills docs:
  - Agent Loaded = `skill_view` events
  - Agent Managed = `skill_manage` actions

Architecture:
  - agent/insights.py collects and aggregates per-skill usage
  - hermes_cli/web_server.py exposes `skills` on `/api/analytics/usage`
  - web/src/lib/api.ts adds analytics skill response types
  - web/src/pages/AnalyticsPage.tsx renders the Top Skills table
  - web/src/i18n/{en,zh}.ts updates user-facing labels

Tests:
  - tests/agent/test_insights.py covers skill aggregation and formatting
  - tests/hermes_cli/test_web_server.py covers analytics API contract
    including the `skills` payload
  - verified with `cd web && npm run build`

Files changed:
  - agent/insights.py
  - hermes_cli/web_server.py
  - tests/agent/test_insights.py
  - tests/hermes_cli/test_web_server.py
  - web/src/i18n/en.ts
  - web/src/i18n/types.ts
  - web/src/i18n/zh.ts
  - web/src/lib/api.ts
  - web/src/pages/AnalyticsPage.tsx
2026-04-15 06:44:43 +00:00
Teknium
772cfb6c4e
fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix. The generated ExecStart pointed
   to a Python binary without site-packages.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models sometimes return empty after tool calls. The agent
   silently abandoned the remaining work.

   Fix: append assistant('(empty)') + user nudge message and retry
   once. Resets after each successful tool round.

4. Compression model fallback on permanent errors (#8620 sub-issue 4)
   When the default summary model (gemini-3-flash) returns 503
   'model_not_found' on custom proxies, the compressor entered a
   600s cooldown, leaving context growing unbounded.

   Fix: detect permanent model-not-found errors (503, 404,
   'model_not_found', 'no available channel') and fall back to
   using the main model for compression instead of entering
   cooldown. One-time fallback with immediate retry.

Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
kshitijk4poor
9855190f23 feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.

- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown

Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)

Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:25 -07:00
Julien Talbot
3b50821555 feat(xai): add xAI/Grok to provider prefix stripping
Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that
colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly
for context length lookups.

Cherry-picked from PR #9184 by @Julientalbot.
2026-04-14 16:43:42 -07:00
Teknium
6448e1da23
feat(zai): add GLM-5V-Turbo support for coding plan (#9907)
- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
  endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
  newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
  (vision_analyze, browser screenshots) route through 5v

Fixes #9888
2026-04-14 16:26:01 -07:00
Teknium
a37a095980 fix: detect qwen-oauth provider via CLI tokens in /model picker
Seed qwen-oauth credentials from resolve_qwen_runtime_credentials() in
_seed_from_singletons(). Users who authenticate via 'qwen auth qwen-oauth'
store tokens in ~/.qwen/oauth_creds.json which the runtime resolver reads
but the credential pool couldn't detect — same gap pattern as copilot.

Uses refresh_if_expiring=False to avoid network calls during discovery.
2026-04-14 11:16:26 -07:00
Marvae
0bd3f521ae fix: detect copilot provider via gh auth token in /model picker
Seed copilot credentials from resolve_copilot_token() in the credential
pool's _seed_from_singletons(), alongside the existing anthropic and
openai-codex seeding logic. This makes copilot appear in the /model
provider picker when the user authenticates solely through gh auth token.

Cherry-picked from PR #9767 by Marvae.
2026-04-14 11:16:26 -07:00
N0nb0at
b21b3bfd68 feat(plugins): namespaced skill registration for plugin skill bundles
Add ctx.register_skill() API so plugins can ship SKILL.md files under
a 'plugin:skill' namespace, preventing name collisions with built-in
Hermes skills. skill_view() detects the ':' separator and routes to
the plugin registry while bare names continue through the existing
flat-tree scan unchanged.

Key additions:
- agent/skill_utils: parse_qualified_name(), is_valid_namespace()
- hermes_cli/plugins: PluginContext.register_skill(), PluginManager
  skill registry (find/list/remove)
- tools/skills_tool: qualified name dispatch in skill_view(),
  _serve_plugin_skill() with full guards (disabled, platform,
  injection scan), bundle context banner with sibling listing,
  stale registry self-heal
- Hoisted _INJECTION_PATTERNS to module level (dedup)
- Updated skill_view schema description

Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim
(P2) for a simpler first merge.

Closes #8422
2026-04-14 10:42:58 -07:00
walli
884cd920d4 feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions
- Rename platform from 'qq' to 'qqbot' across all integration points
  (Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
  (prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
  functions that were lost during the original merge
2026-04-14 00:11:49 -07:00
Kenny Xie
cdd44817f2 fix(anthropic): send fast mode speed via extra_body 2026-04-13 22:32:39 -07:00
Teknium
943c01536f
feat: add openrouter/elephant-alpha to curated model lists (#9378)
* Add hermes debug share instructions to all issue templates

- bug_report.yml: Add required Debug Report section with hermes debug share
  and /debug instructions, make OS/Python/Hermes version optional (covered
  by debug report), demote old logs field to optional supplementary
- setup_help.yml: Replace hermes doctor reference with hermes debug share,
  add Debug Report section with fallback chain (debug share -> --local -> doctor)
- feature_request.yml: Add optional Debug Report section for environment context

All templates now guide users to run hermes debug share (or /debug in chat)
and paste the resulting paste.rs links, giving maintainers system info,
config, and recent logs in one step.

* feat: add openrouter/elephant-alpha to curated model lists

- Add to OPENROUTER_MODELS (free, positioned above GPT models)
- Add to _PROVIDER_MODELS["nous"] mirror list
- Add 256K context window fallback in model_metadata.py
2026-04-13 21:16:14 -07:00
Teknium
d15efc9c1b
fix: correct GPT-5 family context lengths in fallback defaults (#9309)
The generic 'gpt-5' fallback was set to 128,000 — which is the max
OUTPUT tokens, not the context window. GPT-5 base and most variants
(codex, mini) have 400,000 context. This caused /model to report
128k for models like gpt-5.3-codex when models.dev was unavailable.

Added specific entries for GPT-5 variants with different context sizes:
- gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M)
- gpt-5.4-mini, gpt-5.4-nano: 400,000
- gpt-5.3-codex-spark: 128,000 (reduced)
- gpt-5.1-chat: 128,000 (chat variant)
- gpt-5 (catch-all): 400,000

Sources: https://developers.openai.com/api/docs/models
2026-04-13 19:22:23 -07:00
Teknium
f324222b79
fix: add vLLM/local server error patterns + MCP initial connection retry (#9281)
Port two improvements inspired by Kilo-Org/kilocode analysis:

1. Error classifier: add context overflow patterns for vLLM, Ollama,
   and llama.cpp/llama-server. These local inference servers return
   different error formats than cloud providers (e.g., 'exceeds the
   max_model_len', 'context length exceeded', 'slot context'). Without
   these patterns, context overflow errors from local servers are
   misclassified as format errors, causing infinite retries instead
   of triggering compression.

2. MCP initial connection retry: previously, if the very first
   connection attempt to an MCP server failed (e.g., transient DNS
   blip at startup), the server was permanently marked as failed with
   no retry. Post-connect reconnection had 5 retries with exponential
   backoff, but initial connection had zero. Now initial connections
   retry up to 3 times with backoff before giving up, matching the
   resilience of post-connect reconnection.
   (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3)

Tests: 6 new error classifier tests, 4 new MCP retry tests, 1
updated existing test. All 276 affected tests pass.
2026-04-13 18:46:14 -07:00
arthurbr11
0a4cf5b3e1 feat(providers): add Arcee AI as direct API provider
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with
Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini.

Standard OpenAI-compatible provider checklist: auth.py, config.py,
models.py, main.py, providers.py, doctor.py, model_normalize.py,
model_metadata.py, setup.py, trajectory_compressor.py.

Based on PR #9274 by arthurbr11, simplified to a standard direct
provider without dual-endpoint OpenRouter routing.
2026-04-13 18:40:06 -07:00
Teknium
8d023e43ed
refactor: remove dead code — 1,784 lines across 77 files (#9180)
Deep scan with vulture, pyflakes, and manual cross-referencing identified:
- 41 dead functions/methods (zero callers in production)
- 7 production-dead functions (only test callers, tests deleted)
- 5 dead constants/variables
- ~35 unused imports across agent/, hermes_cli/, tools/, gateway/

Categories of dead code removed:
- Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection,
  rebuild_lookups, clear_session_context, get_logs_dir, clear_session
- Unused API surface: search_models_dev, get_pricing, skills_categories,
  get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list
- Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob
- Stale debug helpers: get_debug_session_info copies in 4 tool files
  (centralized version in debug_helpers.py already exists)
- Dead gateway methods: send_emote, send_notice (matrix), send_reaction
  (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history
  (matrix), _start_typing_indicator (signal), parse_feishu_post_content
- Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION,
  FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR
- Unused UI code: _interactive_provider_selection,
  _interactive_model_selection (superseded by prompt_toolkit picker)

Test suite verified: 609 tests covering affected files all pass.
Tests for removed functions deleted. Tests using removed utilities
(clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.
2026-04-13 16:32:04 -07:00
Teknium
b27eaaa4db fix: improve ACP type check and restore comment accuracy
- Use isinstance() with try/except import for CopilotACPClient check
  in _to_async_client instead of fragile __class__.__name__ string check
- Restore accurate comment: GPT-5.x models *require* (not 'often require')
  the Responses API on OpenAI/OpenRouter; ACP is the exception, not a
  softening of the requirement
- Add inline comment explaining the ACP exclusion rationale
2026-04-13 16:17:43 -07:00
helix4u
8680f61f8b fix(copilot-acp): keep acp runtime off responses path 2026-04-13 16:17:43 -07:00
Teknium
0e60a9dc25 fix: add kimi-coding-cn to remaining provider touchpoints
Follow-up for salvaged PR #7637. Adds kimi-coding-cn to:
- model_normalize.py (prefix strip)
- providers.py (models.dev mapping)
- runtime_provider.py (credential resolution)
- setup.py (model list + setup label)
- doctor.py (health check)
- trajectory_compressor.py (URL detection)
- models_dev.py (registry mapping)
- integrations/providers.md (docs)
2026-04-13 11:20:37 -07:00
hcshen0111
2b3aa36242 feat(providers): add kimi-coding-cn provider for mainland China users
Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.
2026-04-13 11:20:37 -07:00
墨綠BG
c449cd1af5 fix(config): restore custom providers after v11→v12 migration
The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.

Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:

- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup

Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.

Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes #8776, salvaged from PR #8814
2026-04-13 10:50:52 -07:00
luyao618
8ec1608642 fix(agent): propagate api_mode to vision provider resolution
resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 05:02:54 -07:00
Teknium
e3ffe5b75f
fix: remove legacy compression.summary_* config and env var fallbacks (#8992)
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.

What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
  DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
  from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
  functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
  auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths

Compression model/provider is now configured exclusively via:
  auxiliary.compression.provider / auxiliary.compression.model

Closes #8923
2026-04-13 04:59:26 -07:00
Richard Li
82901695ff feat(wecom): add platform hint for native media sending 2026-04-13 04:46:04 -07:00
ismell0992-afk
3e99964789 fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max
_query_local_context_length was checking model_info.context_length
(the GGUF training max) before num_ctx (the Modelfile runtime override),
inverse to query_ollama_num_ctx. The two helpers therefore disagreed on
the same model:

  hermes-brain:qwen3-14b-ctx32k     # Modelfile: num_ctx 32768
  underlying qwen3:14b GGUF         # qwen3.context_length: 40960

query_ollama_num_ctx correctly returned 32768 (the value Ollama will
actually allocate KV cache for). _query_local_context_length returned
40960, which let ContextCompressor grow conversations past 32768 before
triggering compression — at which point Ollama silently truncated the
prefix, corrupting context.

Swap the order so num_ctx is checked first, matching query_ollama_num_ctx.
Adds a parametrized test that seeds both values and asserts num_ctx wins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 04:24:07 -07:00
Teknium
400fe9b2a1 fix: add <thought> stripping to auxiliary_client + tests
auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.
2026-04-12 12:44:49 -07:00
Teknium
7a67b13506
fix: title_generator no longer logs as 'compression' task
Changed task='compression' to task='title_generation' so auto-title
calls don't pollute logs with false compression alarms.
2026-04-12 04:17:18 -07:00
Teknium
17c72f176d
fix: make skill loading instructions more aggressive in system prompt (#8286)
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:

- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
  are relevant'

This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
2026-04-12 03:03:16 -07:00
Teknium
b321330362
feat: add WSL environment hint to system prompt (#8285)
When running inside WSL (Windows Subsystem for Linux), inject a hint into
the system prompt explaining that the Windows host filesystem is mounted
at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows
paths (Desktop, Documents) to their /mnt/ equivalents without the user
needing to configure anything.

Uses the existing is_wsl() detection from hermes_constants (cached,
checks /proc/version for 'microsoft'). Adds build_environment_hints()
in prompt_builder.py — extensible for Termux, Docker, etc. later.

Closes the UX gap where WSL users had to manually explain path
translation to the agent every session.
2026-04-12 02:26:28 -07:00
Teknium
95fa78eb6c
fix: write refreshed Codex tokens back to ~/.codex/auth.json (#8277)
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When Hermes refreshes a Codex token, it consumed the old refresh_token
but never wrote the new pair back to ~/.codex/auth.json. This caused
Codex CLI and VS Code to fail with 'refresh_token_reused' on their
next refresh attempt.

This mirrors the existing Anthropic write-back pattern where refreshed
tokens are written to ~/.claude/.credentials.json via
_write_claude_code_credentials().

Changes:
- Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to
  _write_claude_code_credentials in anthropic_adapter.py)
- Call it from _refresh_codex_auth_tokens() (non-pool refresh path)
- Call it from credential_pool._refresh_entry() (pool happy path + retry)
- Add tests for the new write-back behavior
- Update existing test docstring to clarify _save_codex_tokens vs
  _write_codex_cli_tokens separation

Fixes refresh token conflict reported by @ec12edfae2cb221
2026-04-12 02:05:20 -07:00
Teknium
a1220977d3
fix: make skill loading instructions more aggressive in system prompt (#8209)
The previous wording ('If one clearly matches') set too high a threshold,
and 'If none match, proceed normally' was an easy escape hatch for lazy
models. Now:

- Lowered threshold: 'matches or is even partially relevant'
- Added MUST directive and 'err on the side of loading' guidance
- Replaced permissive closer with 'only proceed without if genuinely none
  are relevant'

This should reduce cases where the agent skips loading relevant skills
unless explicitly forced.
2026-04-12 01:46:34 -07:00
Teknium
078dba015d
fix: three provider-related bugs (#8161, #8181, #8147) (#8243)
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
  so context-length lookups use models.dev data instead of 128k fallback.
  Fixes #8161.

- Set api_mode from custom_providers entry when switching via hermes model,
  and clear stale api_mode when the entry has none. Also extract api_mode
  in _named_custom_provider_map(). Fixes #8181.

- Convert OpenAI image_url content blocks to Anthropic image blocks when
  the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
  containing /anthropic). Fixes #8147.
2026-04-12 01:44:18 -07:00
Harish Kukreja
b1f13a8c5f fix(agent): route compression aux through live session runtime 2026-04-12 01:34:52 -07:00
Teknium
eb2a49f95a
fix: openai-codex and anthropic not appearing in /model picker for external credentials (#8224)
Users whose credentials exist only in external files — OpenAI Codex
OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials
in ~/.claude/.credentials.json — would not see those providers in the
/model picker, even though hermes auth and hermes model detected them.

Root cause: list_authenticated_providers() only checked the raw Hermes
auth store and env vars. External credential file fallbacks (Codex CLI
import, Claude Code file discovery) were never triggered.

Fix (three parts):
1. _seed_from_singletons() in credential_pool.py: openai-codex now
   imports from ~/.codex/auth.json when the Hermes auth store is empty,
   mirroring resolve_codex_runtime_credentials().
2. list_authenticated_providers() in model_switch.py: auth store + pool
   checks now run for ALL providers (not just OAuth auth_type), catching
   providers like anthropic that support both API key and OAuth.
3. list_authenticated_providers(): direct check for anthropic external
   credential files (Claude Code, Hermes PKCE). The credential pool
   intentionally gates anthropic behind is_provider_explicitly_configured()
   to prevent auxiliary tasks from silently consuming tokens. The /model
   picker bypasses this gate since it is discovery-oriented.
2026-04-12 00:33:42 -07:00
Teknium
1cec910b6a
fix: improve context compaction to prevent model answering stale questions (#8107)
After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
2026-04-11 19:43:58 -07:00
Teknium
a0a02c1bc0
feat: /compress <focus> — guided compression with focus topic (#8017)
Adds an optional focus topic to /compress: `/compress database schema`
guides the summariser to preserve information related to the focus topic
(60-70% of summary budget) while compressing everything else more aggressively.
Inspired by Claude Code's /compact <focus>.

Changes:
- context_compressor.py: focus_topic parameter on _generate_summary() and
  compress(); appends FOCUS TOPIC guidance block to the LLM prompt
- run_agent.py: focus_topic parameter on _compress_context(), passed through
  to the compressor
- cli.py: _manual_compress() extracts focus topic from command string,
  preserves existing manual_compression_feedback integration (no regression)
- gateway/run.py: _handle_compress_command() extracts focus from event args
  and passes through — full gateway parity
- commands.py: args_hint="[focus topic]" on /compress CommandDef

Salvaged from PR #7459 (CLI /compress focus only — /context command deferred).
15 new tests across CLI, compressor, and gateway.
2026-04-11 19:23:29 -07:00
Teknium
5c2ecdec49
fix: use ceiling division for token estimation, deduplicate inline formula
Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and
estimate_request_tokens_rough() from floor division (len // 4) to
ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously
estimated as 0 tokens, causing the compressor and pre-flight checks to
systematically undercount when many short tool results are present.

Also replaced the inline duplicate formula in run_conversation()
(total_chars // 4) with a call to the shared
estimate_messages_tokens_rough() function.

Updated 4 tests that hardcoded floor-division expected values.

Related: issue #6217, PR #6629
2026-04-11 16:33:40 -07:00
Teknium
c8aff74632
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:

1. Compression threshold floor (64K tokens minimum)
   - The 50% threshold on a 100K-context model fired at 50K tokens,
     causing premature compression that made models lose track of
     multi-step plans.  Now threshold_tokens = max(50% * context, 64K).
   - Models with <64K context are rejected at startup with a clear error.

2. Budget warning removal — grace call instead
   - Removed the 70%/90% iteration budget warnings entirely.  These
     injected '[BUDGET WARNING: Provide your final response NOW]' into
     tool results, causing models to abandon complex tasks prematurely.
   - Now: no warnings during normal execution.  When the budget is
     actually exhausted (90/90), inject a user message asking the model
     to summarise, allow one grace API call, and only then fall back
     to _handle_max_iterations.

3. Activity touches during long terminal execution
   - _wait_for_process polls every 0.2s but never reported activity.
     The gateway's inactivity timeout (default 1800s) would fire during
     long-running commands that appeared 'idle.'
   - Now: thread-local activity callback fires every 10s during the
     poll loop, keeping the gateway's activity tracker alive.
   - Agent wires _touch_activity into the callback before each tool call.

Also: docs update noting 64K minimum context requirement.

Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
Teknium
8c3935ebe8
fix: is_local_endpoint misses Docker/Podman DNS names (#7950)
* fix(tools): neutralize shell injection in _write_to_sandbox via path quoting

_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.

Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.

Tests added for spaces, $(command) substitution, and semicolon injection.

* fix: is_local_endpoint misses Docker/Podman DNS names

host.docker.internal, host.containers.internal, gateway.docker.internal,
and host.lima.internal are well-known DNS names that container runtimes
use to resolve the host machine. Users running Ollama on the host with
the agent in Docker/Podman hit the default 120s stream timeout instead
of the bumped 1800s because these hostnames weren't recognized as local.

Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in
is_local_endpoint(). Tests cover all three runtime families plus a
negative case for domains that merely contain the suffix as a substring.
2026-04-11 14:46:18 -07:00
Teknium
04c1c5d53f
refactor: extract shared helpers to deduplicate repeated code patterns (#7917)
* refactor: add shared helper modules for code deduplication

New modules:
- gateway/platforms/helpers.py: MessageDeduplicator, TextBatchAggregator,
  strip_markdown, ThreadParticipationTracker, redact_phone
- hermes_cli/cli_output.py: print_info/success/warning/error, prompt helpers
- tools/path_security.py: validate_within_dir, has_traversal_component
- utils.py additions: safe_json_loads, read_json_file, read_jsonl,
  append_jsonl, env_str/lower/int/bool helpers
- hermes_constants.py additions: get_config_path, get_skills_dir,
  get_logs_dir, get_env_path

* refactor: migrate gateway adapters to shared helpers

- MessageDeduplicator: discord, slack, dingtalk, wecom, weixin, mattermost
- strip_markdown: bluebubbles, feishu, sms
- redact_phone: sms, signal
- ThreadParticipationTracker: discord, matrix
- _acquire/_release_platform_lock: telegram, discord, slack, whatsapp,
  signal, weixin

Net -316 lines across 19 files.

* refactor: migrate CLI modules to shared helpers

- tools_config.py: use cli_output print/prompt + curses_radiolist (-117 lines)
- setup.py: use cli_output print helpers + curses_radiolist (-101 lines)
- mcp_config.py: use cli_output prompt (-15 lines)
- memory_setup.py: use curses_radiolist (-86 lines)

Net -263 lines across 5 files.

* refactor: migrate to shared utility helpers

- safe_json_loads: agent/display.py (4 sites)
- get_config_path: skill_utils.py, hermes_logging.py, hermes_time.py
- get_skills_dir: skill_utils.py, prompt_builder.py
- Token estimation dedup: skills_tool.py imports from model_metadata
- Path security: skills_tool, cronjob_tools, skill_manager_tool, credential_files
- Non-atomic YAML writes: doctor.py, config.py now use atomic_yaml_write
- Platform dict: new platforms.py, skills_config + tools_config derive from it
- Anthropic key: new get_anthropic_key() in auth.py, used by doctor/status/config/main

* test: update tests for shared helper migrations

- test_dingtalk: use _dedup.is_duplicate() instead of _is_duplicate()
- test_mattermost: use _dedup instead of _seen_posts/_prune_seen
- test_signal: import redact_phone from helpers instead of signal
- test_discord_connect: _platform_lock_identity instead of _token_lock_identity
- test_telegram_conflict: updated lock error message format
- test_skill_manager_tool: 'escapes' instead of 'boundary' in error msgs
2026-04-11 13:59:52 -07:00
Teknium
976bad5bde
refactor(auxiliary): config.yaml takes priority over env vars for aux task settings (#7889)
The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.

Flipped the resolution order in _resolve_task_provider_model():
  1. Explicit args (always win)
  2. config.yaml auxiliary.{task}.* (PRIMARY)
  3. Env var overrides (backward-compat fallback only)
  4. 'auto' (full auto-detection chain)

Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.

Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
2026-04-11 11:21:59 -07:00
Teknium
d4bb44d4b9 docs: add Xiaomi MiMo to all provider docs + fix MiMo-V2-Flash ctx len
- environment-variables.md: XIAOMI_API_KEY, XIAOMI_BASE_URL, provider list
- cli-commands.md: --provider choices
- integrations/providers.md: provider table, Chinese providers section,
  config example, base URL list, choosing table, fallback providers list
- fallback-providers.md: supported providers table, auto-detection chain
- Fix XiaomiMiMo/MiMo-V2-Flash context length 32768 → 256000 (OpenRouter entry)
2026-04-11 11:17:52 -07:00
kshitijk4poor
6693e2a497 feat(xiaomi): add Xiaomi MiMo as first-class provider
Cherry-picked from PR #7702 by kshitijk4poor.

Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)

Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.

Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.

Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.

36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
2026-04-11 11:17:52 -07:00
kshitijk4poor
50bb4fe010 fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection
Cherry-picked from PR #7749 by kshitijk4poor with modifications:

- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision

Closes #7740
2026-04-11 11:12:50 -07:00
kshitijk4poor
af9caec44f fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal
Based on PR #7285 by @kshitijk4poor.

Two bugs affecting Qwen OAuth users:

1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M.
   Added specific entries before the generic qwen catch-all:
   - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per
     official Alibaba Cloud docs and OpenRouter)
   - qwen3-coder: 262,144

2. Random stopping — max_tokens was suppressed for Qwen Portal, so the
   server applied its own low default. Reasoning models exhaust that on
   thinking tokens. Now: honor explicit max_tokens, default to 65536
   when unset.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-11 03:29:31 -07:00
aaronagent
307697688e fix: prevent zombie processes, redact cron stderr, skip symlinks in skill enumeration
process_registry.py: _reader_loop() has process.wait() after the try-except
block (line 380).  If the reader thread crashes with an unexpected exception
(e.g. MemoryError, KeyboardInterrupt), control exits the except handler but
skips wait() — leaving the child as a zombie process.  Move wait() and the
cleanup into a finally block so the child is always reaped.

cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the
SUCCESS path (line 417-421).  When a cron script fails (non-zero exit), both
stdout and stderr are returned WITHOUT redaction (lines 407-413).  A script
that accidentally prints an API key to stderr during a failure would leak it
into the LLM context.  Move redaction before the success/failure branch so
both paths benefit.

skill_commands.py: _build_skill_message() enumerates supporting files using
rglob("*") but only checks is_file() (line 171) without filtering symlinks.
PR #6693 added symlink protection to scan_skill_commands() but missed this
function.  A malicious skill can create symlinks in references/ pointing to
arbitrary files, exposing their paths (and potentially content via skill_view)
to the LLM.  Add is_symlink() check to match the guard in scan_skill_commands.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 02:03:20 -07:00
kshitijk4poor
c89719ad9c fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161) 2026-04-11 01:52:58 -07:00
kshitijk4poor
d3c5d65563 fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)
async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264
2026-04-11 01:52:58 -07:00
ran
4f5e8b22a7 fix: drop incompatible model slugs on auxiliary client cache hit
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
2026-04-11 01:52:58 -07:00
kshitijk4poor
eeb8b4b00f fix(auxiliary): harden fallback behavior for non-OpenRouter users
Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512
2026-04-11 01:52:58 -07:00
kshitijk4poor
ffbd80f5fc fix(auxiliary): honor api_mode in auxiliary client (#6800)
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800
2026-04-11 01:52:58 -07:00
Long Hao
58b62e3e43 feat(skin): make all CLI colors skin-aware
Refactor hardcoded color constants throughout the CLI to resolve from
the active skin engine, so custom themes fully control the visual
appearance.

cli.py:
- Replace _GOLD constant with _ACCENT (_SkinAwareAnsi class) that
  lazily resolves response_border from the active skin
- Rename _GOLD_DEFAULT to _ACCENT_ANSI_DEFAULT
- Make _build_compact_banner() read banner_title/accent/dim from skin
- Make session resume notifications use _accent_hex()
- Make status line use skin colors (accent_color, separator_color,
  label_color instead of cryptic _dim_c/_dim_c2/_accent_c/_label_c)
- Reset _ACCENT cache on /skin switch

agent/display.py:
- Replace hardcoded diff ANSI escapes with skin-aware functions:
  _diff_dim(), _diff_file(), _diff_hunk(), _diff_minus(), _diff_plus()
  (renamed from SCREAMING_CASE _ANSI_* to snake_case)
- Add reset_diff_colors() for cache invalidation on skin switch
2026-04-11 01:47:48 -07:00
kshitijk4poor
d442f25a2f fix: align MiniMax provider with official API docs
Aligns MiniMax provider with official API documentation. Fixes 6 bugs:
transport mismatch (openai_chat -> anthropic_messages), credential leak
in switch_model(), prompt caching sent to non-Anthropic endpoints,
dot-to-hyphen model name corruption, trajectory compressor URL routing,
and stale doctor health check.

Also corrects context window (204,800), thinking support (manual mode),
max output (131,072), and model catalog (M2 family only on /anthropic).

Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-04-11 01:04:41 -07:00
Teknium
caf371da18
fix: MiniMax/Alibaba incorrectly detected as Anthropic OAuth, causing mcp_ tool prefix (#7509)
_is_oauth_token() returned True for any key not starting with 'sk-ant-api',
which means MiniMax and Alibaba API keys were falsely treated as Anthropic
OAuth tokens. This triggered the Claude Code compatibility path:
- All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search)
- System prompt injected with 'You are Claude Code' identity
- 'Hermes Agent' replaced with 'Claude Code' throughout

Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by
their key format instead of using a broad catch-all:
- sk-ant-* (but not sk-ant-api-*) -> setup tokens, managed keys
- eyJ* -> JWTs from Anthropic OAuth flow
- Everything else -> False (MiniMax, Alibaba, etc.)

Reported by stefan171.
2026-04-11 00:43:01 -07:00
Kenny Xie
1ffd92cc94 fix(gateway): make manual compression feedback truthful 2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi
c1af614289 fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:

    model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint

resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.

Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
2026-04-10 21:16:53 -07:00
Teknium
3fe6938176 fix: robust context engine interface — config selection, plugin discovery, ABC completeness
Follow-up fixes for the context engine plugin slot (PR #5700):

- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
  protect_last_n as class attributes; complete update_model() default
  with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
  base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
  ABC interface: switch_model(), fallback restore, context probing
  all use update_model() now; _context_probed guarded with getattr/
  hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
  (mirrors plugins/memory/ pattern) — discover_context_engines(),
  load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
  then plugins/context_engine/<name>/, then general plugin system,
  falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
  boundaries (CLI exit, /reset, gateway expiry)
2026-04-10 19:15:50 -07:00
Stephen Schoettler
5d8dd622bc feat: wire context engine tools, session lifecycle, and tool dispatch
- Inject engine tool schemas into agent tool surface after compressor init
- Call on_session_start() with session_id, hermes_home, platform, model
- Dispatch engine tool calls (lcm_grep, etc.) before regular tool handler
- 55/55 tests pass
2026-04-10 19:15:50 -07:00
Stephen Schoettler
92382fb00e feat: wire context engine plugin slot into agent and plugin system
- PluginContext.register_context_engine() lets plugins replace the
  built-in ContextCompressor with a custom ContextEngine implementation
- PluginManager stores the registered engine; only one allowed
- run_agent.py checks for a plugin engine at init before falling back
  to the default ContextCompressor
- reset_session_state() now calls engine.on_session_reset() instead of
  poking internal attributes directly
- ContextCompressor.on_session_reset() handles its own internals
  (_context_probed, _previous_summary, etc.)
- 19 new tests covering ABC contract, defaults, plugin slot registration,
  rejection of duplicates/non-engines, and compressor reset behavior
- All 34 existing compressor tests pass unchanged
2026-04-10 19:15:50 -07:00
Stephen Schoettler
fe7e6c156c feat: add ContextEngine ABC, refactor ContextCompressor to inherit from it
Introduces agent/context_engine.py — an abstract base class that defines
the pluggable context engine interface. ContextCompressor now inherits
from ContextEngine as the default implementation.

No behavior change. All 34 existing compressor tests pass.

This is the foundation for a context engine plugin slot, enabling
third-party engines like LCM (Lossless Context Management) to replace
the built-in compressor via the plugin system.
2026-04-10 19:15:50 -07:00
0xFrank-eth
e8034e2f6a fix(gateway): replace os.environ session state with contextvars for concurrency safety
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.

Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.

Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
  helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
  accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
  terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
  agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API

Fixes #7358

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-04-10 17:04:38 -07:00
Billard
475cbce775 fix(aux): honor api_mode for custom auxiliary endpoints 2026-04-10 16:47:44 -07:00
Julien Talbot
8bcb8b8e87 feat(providers): add native xAI provider
Adds xAI as a first-class provider: ProviderConfig in auth.py,
HermesOverlay in providers.py, 11 curated Grok models, URL mapping
in model_metadata.py, aliases (x-ai, x.ai), and env var tests.
Uses standard OpenAI-compatible chat completions.

Closes #7050
2026-04-10 13:40:38 -07:00
WAXLYY
c6e1add6f1 fix(agent): preserve quoted @file references with spaces 2026-04-10 13:05:01 -07:00
Teknium
be4f049f46 fix: salvage follow-ups for Weixin adapter (#6747)
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
  connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
  token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main

Salvaged from PR #6747 by Zihan Huang.
2026-04-10 05:54:37 -07:00
Kenny Xie
b730c2955a fix(model): normalize direct provider ids in auxiliary routing 2026-04-10 05:52:45 -07:00
Teknium
5fc5ced972 fix: add Alibaba/DashScope rate-limit pattern to error classifier
Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a
unique throttling message ('Request rate increased too quickly...') that
doesn't match standard rate-limit patterns ('rate limit', 'too many
requests'). This caused Alibaba errors to fall through to the 'unknown'
category rather than being properly classified as rate_limit with
appropriate backoff/rotation.

Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with
the exact error message observed from the Alibaba provider.
2026-04-10 05:52:45 -07:00
Teknium
6d2fa03837
fix: UTF-8 config encoding, pairing hint, credential_pool key, header normalization (#7174)
Four small fixes: (1) UTF-8 encoding for config open (@zhangchn #7063), (2) pairing hint placeholders (@konsisumer #7057), (3) missing credential_pool in cheap route (@kuishou68 #7025), (4) case-insensitive rate limit headers (@kuishou68 #7019).
2026-04-10 05:33:48 -07:00
xwp
5a1cce53e4 fix(auxiliary): skip anthropic in fallback chain when not explicitly configured
_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic().  Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.

This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 05:19:21 -07:00
xwp
419b719c2b fix(auth): make 'auth remove' for claude_code prevent re-seeding
Previously, removing a claude_code credential from the anthropic pool
only printed a note — the next load_pool() re-seeded it from
~/.claude/.credentials.json.  Now writes a 'suppressed_sources' flag
to auth.json that _seed_from_singletons checks before seeding.

Follows the pattern of env: source removal (clears .env var) and
device_code removal (clears auth store state).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 05:19:21 -07:00
xwp
f3fb3eded4 fix(auth): gate Claude Code credential seeding behind explicit provider config
_seed_from_singletons('anthropic') now checks
is_provider_explicitly_configured('anthropic') before reading
~/.claude/.credentials.json.  Without this, the auxiliary client
fallback chain silently discovers and uses Claude Code tokens when
the user's primary provider key is invalid — consuming their Claude
Max subscription quota without consent.

Follows the same gating pattern as PR #4210 (setup wizard gate)
but applied to the credential pool seeding path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 05:19:21 -07:00
alt-glitch
96c060018a fix: remove 115 verified dead code symbols across 46 production files
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).

Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-04-10 03:44:43 -07:00
aaronagent
9afe1784bd fix: hidden_div regex bypass with newlines, credential config silent failure, webhook route error severity
prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not
match newlines in Python regex (re.DOTALL is not passed).  An attacker can bypass
detection by splitting the style attribute across lines:
  `<div style="color:red;\ndisplay: none">injected content</div>`
Replace `.*` with `[\s\S]*?` to match across line boundaries.

credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level
(line 171), making YAML parse failures invisible in production logs.  Users whose
credential files silently fail to mount into sandboxes have no diagnostic clue.
Promote to WARNING to match the severity pattern used by the path validation
warnings at lines 150 and 158 in the same function.

webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line
265) but the impact — stale/corrupted dynamic routes persisting silently — warrants
ERROR level to ensure operator visibility in alerting pipelines.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 03:05:04 -07:00
aaronagent
738f0bac13 fix: align auth-by-message classification with status-code path, decode URLs before secret check
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True.  The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback.  Align the message-based path to
match: retryable=False, should_fallback=True.

web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196).  URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII.  Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 03:05:04 -07:00
Julien Talbot
b577697189 fix(model_metadata): add xAI Grok context length fallbacks
xAI /v1/models does not return context_length metadata, so Hermes
probes down to the 128k default whenever a user configures a custom
provider pointing at https://api.x.ai/v1. This forces every xAI user
to manually override model.context_length in config.yaml (2M for
Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context
window.

Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the
fallback lookup returns the correct value via substring matching.
Values sourced from models.dev (2026-04) and cross-checked against
the xAI /v1/models listing:

  - grok-4.20-*          2,000,000  (reasoning, non-reasoning, multi-agent)
  - grok-4-1-fast-*      2,000,000
  - grok-4-fast-*        2,000,000
  - grok-4 / grok-4-0709   256,000
  - grok-code-fast-1       256,000
  - grok-3*                131,072
  - grok-2 / latest        131,072
  - grok-2-vision*           8,192
  - grok (catch-all)       131,072

Keys are ordered longest-first so that specific variants match before
the catch-all, consistent with the existing Claude/Gemma/MiniMax entries.

Add TestDefaultContextLengths.test_grok_models_context_lengths and
test_grok_substring_matching to pin the values and verify the full
lookup path. All 77 tests in test_model_metadata.py pass.
2026-04-10 03:04:19 -07:00
Cocoon-Break
45034b746f
fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027)
Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68.
2026-04-10 02:48:45 -07:00
kshitijk4poor
9431f82aff fix: update Kimi Coding User-Agent to KimiCLI/1.30.0
The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at
v1.30.0. The stale version string causes intermittent 403 errors from
Kimi's coding endpoint ('only available for Coding Agents').

Update all 8 occurrences across run_agent.py, auxiliary_client.py, and
doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.
2026-04-10 02:37:28 -07:00
Teknium
8779a268a7
feat: add Anthropic Fast Mode support to /fast command (#7037)
Extends the /fast command to support Anthropic's Fast Mode beta in addition
to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds
speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for
~2.5x faster output token throughput.

Changes:
- hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry,
  model_supports_fast_mode() now recognizes Claude Opus 4.6,
  resolve_fast_mode_overrides() returns {speed: fast} for Anthropic
  vs {service_tier: priority} for OpenAI
- agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant,
  build_anthropic_kwargs() accepts fast_mode=True which injects
  speed:fast + beta header via extra_headers (skipped for third-party
  Anthropic-compatible endpoints like MiniMax)
- run_agent.py: Pass fast_mode to build_anthropic_kwargs in the
  anthropic_messages path of _build_api_kwargs()
- cli.py: Update _handle_fast_command with provider-aware messaging
  (shows 'Anthropic Fast Mode' vs 'Priority Processing')
- hermes_cli/commands.py: Update /fast description to mention both
  providers
- tests: 13 new tests covering Anthropic model detection, override
  resolution, CLI availability, routing, adapter kwargs, and
  third-party endpoint safety
2026-04-10 02:32:15 -07:00
emozilla
bda9aa17cb fix(streaming): prevent <think> in prose from suppressing response output
When the model mentions <think> as literal text in its response (e.g.
"(/think not producing <think> tags)"), the streaming display treated it
as a reasoning block opener and suppressed everything after it. The
response box would close with truncated content and no error — the API
response was complete but the display ate it.

Root cause: _stream_delta() matched <think> anywhere in the text stream
regardless of position. Real reasoning blocks always start at the
beginning of a line; mentions in prose appear mid-sentence.

Fix: track line position across streaming deltas with a
_stream_last_was_newline flag. Only enter reasoning suppression when
the tag appears at a block boundary (start of stream, after a newline,
or after only whitespace on the current line). Add a _flush_stream()
safety net that recovers buffered content if no closing tag is found
by end-of-stream.

Also fixes three related issues discovered during investigation:

- anthropic_adapter: _get_anthropic_max_output() now normalizes dots to
  hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key
  (was returning 32K instead of 128K)

- run_agent: send explicit max_tokens for Claude models on Nous Portal,
  same as OpenRouter — both proxy to Anthropic's API which requires it.
  Without it the backend defaults to a low limit that truncates responses.

- run_agent: reset truncated_tool_call_retries after successful tool
  execution so a single truncation doesn't poison the entire conversation.
2026-04-09 22:16:36 -07:00
Teknium
4caa635803 fix: add auth.json write-back for Codex retry and valid-token early-return paths
The Codex retry block and valid-token short-circuit in _refresh_entry()
both return early, bypassing the auth.json sync at the end of the method.
This adds _sync_device_code_entry_to_auth_store() calls on both paths
so refreshed/synced tokens are written back to auth.json regardless of
which code path succeeds.
2026-04-09 21:48:50 -07:00