hermes-agent

Author	SHA1	Message	Date
Trev	63d06dd93d	fix(agent): downgrade xhigh→max on Anthropic pre-4.7 adaptive models Regression from #11161 (Claude Opus 4.7 migration, commit `0517ac3e`). The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max" (the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s: BadRequestError [HTTP 400]: This model does not support effort level 'xhigh'. Supported levels: high, low, max, medium. Users who set reasoning_effort=xhigh as their default (xhigh is the recommended default for coding/agentic on 4.7 per the Anthropic migration guide) now 400 every request the moment they switch back to a 4.6 model via `/model` or config. Verified live against the Anthropic API on `anthropic==0.94.0`. Fix: make the mapping model-aware. Add `_supports_xhigh_effort()` predicate (matches 4-7/4.7 substrings, mirroring the existing `_supports_adaptive_thinking` / `_forbids_sampling_params` pattern). On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort those models accept, restoring pre-migration behavior). On 4.7+, keep xhigh as a distinct level. Per Anthropic's migration guide, xhigh is 4.7-only: https://platform.claude.com/docs/en/about-claude/models/migration-guide > Opus 4.7 effort levels: max, xhigh (new), high, medium, low. > Opus 4.6 effort levels: max, high, medium, low. SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[ "low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh). ## Test plan Verified live on macOS 15.5 / anthropic==0.94.0: claude-opus-4-6 + effort=xhigh → output_config.effort=max → 200 OK claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK claude-opus-4-6 + effort=max → output_config.effort=max → 200 OK claude-opus-4-7 + effort=max → output_config.effort=max → 200 OK `tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged test that asserted the broken behavior, added 1 for 4.7 preservation). Full adapter suite: 120 passed in 1.05s. Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed (2 pre-existing failures on clean upstream/main, unrelated). ## Platforms Tested on macOS 15.5. No platform-specific code paths touched.	2026-04-16 12:00:56 -07:00
trevthefoolish	0517ac3e93	fix(agent): complete Claude Opus 4.7 API migration Claude Opus 4.7 introduced several breaking API changes that the current codebase partially handled but not completely. This patch finishes the migration per the official migration guide at https://platform.claude.com/docs/en/about-claude/models/migration-guide Fixes NousResearch/hermes-agent#11137 Breaking-change coverage: 1. Adaptive thinking + output_config.effort — 4.7 is now recognized by _supports_adaptive_thinking() (extends previous 4.6-only gate). 2. Sampling parameter stripping — 4.7 returns 400 for any non-default temperature / top_p / top_k. build_anthropic_kwargs drops them as a safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs) and AnthropicCompletionsAdapter.create() both early-exit before setting temperature for 4.7+ models. This keeps flush_memories and structured-JSON aux paths that hardcode temperature from 400ing when the aux model is flipped to 4.7. 3. thinking.display = "summarized" — 4.7 defaults display to "omitted", which silently hides reasoning text from Hermes's CLI activity feed during long tool runs. Restoring "summarized" preserves 4.6 UX. 4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which silently over-efforted every coding/agentic request). max is now a distinct ceiling per Anthropic's 5-level effort model. 5. New stop_reason values — refusal and model_context_window_exceeded were silently collapsed to "stop" (end_turn) by the adapter's stop_reason_map. Now mapped to "content_filter" and "length" respectively, matching upstream finish-reason handling already in bedrock_adapter. 6. Model catalogs — claude-opus-4-7 added to the Anthropic provider list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback catalog (recommended), claude-opus-4-7 added to model_metadata DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide). 7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role prefill (400). 8. Tests — 4 new tests in test_anthropic_adapter covering display default, xhigh preservation, max on 4.7, refusal / context-overflow stop_reason mapping, plus the sampling-param predicate. test_model_metadata accepts 4.7 at 1M context. Tested on macOS 15.5 (darwin). 119 tests pass in tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.	2026-04-16 10:48:20 -07:00
sontianye	f19ca50cd9	fix(context_compressor): always keep last user message in tail to prevent active-task loss Ensure _align_boundary_backward never pushes the last user message into the compressed region. Without this, compression could delete the user active task instruction mid-session. Cherry-picked from #10969 by @sontianye. Fixes #10896.	2026-04-16 07:45:31 -07:00
lrawnsley	8c1276c0bf	fix: pass resolved args to resolve_vision_provider_client() resolve_vision_provider_client() was receiving the raw call_llm parameters instead of the resolved provider/model/key/url from _resolve_task_provider_model(). This caused config overrides (auxiliary.vision.provider, etc.) to be silently discarded. Cherry-picked from #10901 by @lrawnsley.	2026-04-16 07:45:13 -07:00
Teknium	fe12042e50	fix: remove context pressure warnings entirely (#11039 ) The gateway compression notifications were already removed in commit `cc63b2d1` (PR #4139), but the agent-level context pressure warnings (85%/95% tiered alerts via _emit_context_pressure) were still firing on both CLI and gateway. Removed: - _emit_context_pressure method and all call sites in run_conversation() - Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN) - Instance attribute _context_pressure_warned_at - Pressure reset logic in _compress_context - format_context_pressure and format_context_pressure_gateway from agent/display.py - Orphaned ANSI constants that only served these functions - tests/run_agent/test_context_pressure.py (all 361 lines) Compression itself continues to run silently in the background. Closes #3784	2026-04-16 06:44:23 -07:00
Teknium	0c1217d01e	feat(xai): upgrade to Responses API, add TTS provider Cherry-picked and trimmed from PR #10600 by Jaaneek. - Switch xAI transport from openai_chat to codex_responses (Responses API) - Add codex_responses detection for xAI in all runtime_provider resolution paths - Add xAI api_mode detection in AIAgent.__init__ (provider name + URL auto-detect) - Add extra_headers passthrough for codex_responses requests - Add x-grok-conv-id session header for xAI prompt caching - Add xAI reasoning support (encrypted_content include, no effort param) - Move x-grok-conv-id from chat_completions path to codex_responses path - Add xAI TTS provider (dedicated /v1/tts endpoint with Opus conversion) - Add xAI provider aliases (grok, x-ai, x.ai) across auth, models, providers, auxiliary - Trim xAI model list to agentic models (grok-4.20-reasoning, grok-4-1-fast-reasoning) - Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS - Add xAI TTS config section, setup wizard entry, tools_config provider option - Add shared xai_http.py helper for User-Agent string Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-04-16 02:24:08 -07:00
nosleepcassette	3c859e35dc	fix: skin spinner faces and verbs not applied at runtime Skins define waiting_faces, thinking_faces, and thinking_verbs in their spinner config, but all 7 call sites in run_agent.py used hardcoded class constants. Add three classmethods on KawaiiSpinner that query the active skin first and fall back to the class constants, matching the existing pattern used for wings/tool_prefix/tool_emojis. Co-authored-by: nosleepcassette <nosleepcassette@users.noreply.github.com>	2026-04-16 02:22:19 -07:00
kshitijk4poor	1b61ec470b	feat: add Ollama Cloud as built-in provider Add ollama-cloud as a first-class provider with full parity to existing API-key providers (gemini, zai, minimax, etc.): - PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var - Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud - models.dev integration for accurate context lengths - URL-to-provider mapping (ollama.com -> ollama-cloud) - Passthrough model normalization (preserves Ollama model:tag format) - Default auxiliary model (nemotron-3-nano:30b) - HermesOverlay in providers.py - CLI --provider choices, CANONICAL_PROVIDERS entry - Dynamic model discovery with disk caching (1hr TTL) - 37 provider-specific tests Cherry-picked from PR #6038 by kshitijk4poor. Closes #3926	2026-04-16 02:22:09 -07:00
Teknium	cc6e8941db	feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619 ) Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto current main with minimal core modifications. Plugin changes (plugins/memory/honcho/): - New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context) - Two-layer context injection: base context (summary + representation + card) on contextCadence, dialectic supplement on dialecticCadence - Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal - Cold/warm prompt selection based on session state - dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls - Session summary injection for conversational continuity - Bidirectional peer targeting on all 5 tools - Correctness fixes: peer param fallback, None guard on set_peer_card, schema validation, signal_sufficient anchored regex, mid->medium level fix Core changes (~20 lines across 3 files): - agent/memory_manager.py: Enhanced sanitize_context() to strip full <memory-context> blocks and system notes (prevents leak from saveMessages) - run_agent.py: gateway_session_key param for stable per-chat Honcho sessions, on_turn_start() call before prefetch_all() for cadence tracking, sanitize_context() on user messages to strip leaked memory blocks - gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions), gateway_session_key threading to main agent Tests: 509 passed (3 skipped — honcho SDK not installed locally) Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md Co-authored-by: erosika <erosika@users.noreply.github.com>	2026-04-15 19:12:19 -07:00
flobo3	c6398fcaab	fix(prompt): list all supported Telegram markdown formatting	2026-04-15 17:54:13 -07:00
Teknium	4fdcae6c91	fix: use absolute skill_dir for external skills (#10313 ) (#10587 ) _load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path, which is wrong for external skills from skills.external_dirs — they live outside SKILLS_DIR entirely. Scripts and linked files failed to load. Fix: skill_view() now includes the absolute skill_dir in its result dict. _load_skill_payload() uses that directly when available, falling back to the SKILLS_DIR-relative reconstruction only for legacy responses. Closes #10313	2026-04-15 17:22:55 -07:00
Teknium	9d9b424390	fix: Nous Portal rate limit guard — prevent retry amplification (#10568 ) When Nous returns a 429, the retry amplification chain burns up to 9 API requests per conversation turn (3 SDK retries × 3 Hermes retries), each counting against RPH and deepening the rate limit. With multiple concurrent sessions (cron + gateway + auxiliary), this creates a spiral where retries keep the limit tapped indefinitely. New module: agent/nous_rate_guard.py - Shared file-based rate limit state (~/.hermes/rate_limits/nous.json) - Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit- reset-requests, retry-after headers, or error context - Falls back to 5-minute default cooldown if no header data - Atomic writes (tempfile + rename) for cross-process safety - Auto-cleanup of expired state files run_agent.py changes: - Top-of-retry-loop guard: when another session already recorded Nous as rate-limited, skip the API call entirely. Try fallback provider first, then return a clear message with the reset time. - On 429 from Nous: record rate limit state and skip further retries (sets retry_count = max_retries to trigger fallback path) - On success from Nous: clear the rate limit state so other sessions know they can resume auxiliary_client.py changes: - _try_nous() checks rate guard before attempting Nous in the auxiliary fallback chain. When rate-limited, returns (None, None) so the chain skips to the next provider instead of piling more requests onto Nous. This eliminates three sources of amplification: 1. Hermes-level retries (saves 6 of 9 calls per turn) 2. Cross-session retries (cron + gateway all skip Nous) 3. Auxiliary fallback to Nous (compression/session_search skip too) Includes 24 tests covering the rate guard module, header parsing, state lifecycle, and auxiliary client integration.	2026-04-15 16:31:48 -07:00
JiaDe WU	0cb8c51fa5	feat: native AWS Bedrock provider via Converse API Salvaged from PR #7920 by JiaDe-Wu — cherry-picked Bedrock-specific additions onto current main, skipping stale-branch reverts (293 commits behind). Dual-path architecture: - Claude models → AnthropicBedrock SDK (prompt caching, thinking budgets) - Non-Claude models → Converse API via boto3 (Nova, DeepSeek, Llama, Mistral) Includes: - Core adapter (agent/bedrock_adapter.py, 1098 lines) - Full provider registration (auth, models, providers, config, runtime, main) - IAM credential chain + Bedrock API Key auth modes - Dynamic model discovery via ListFoundationModels + ListInferenceProfiles - Streaming with delta callbacks, error classification, guardrails - hermes doctor + hermes auth integration - /usage pricing for 7 Bedrock models - 130 automated tests (79 unit + 28 integration + follow-up fixes) - Documentation (website/docs/guides/aws-bedrock.md) - boto3 optional dependency (pip install hermes-agent[bedrock]) Co-authored-by: JiaDe WU <40445668+JiaDe-Wu@users.noreply.github.com>	2026-04-15 16:17:17 -07:00
MestreY0d4-Uninter	f4724803b4	fix(runtime): surface malformed proxy env and base URL before client init When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken shell config — the OpenAI/httpx client throws a cryptic 'Invalid port' error that doesn't identify the offending variable. Add _validate_proxy_env_urls() and _validate_base_url() in auxiliary_client.py, called from resolve_provider_client() and _create_openai_client() to fail fast with a clear, actionable error message naming the broken env var or URL. Closes #6360 Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>	2026-04-15 16:10:53 -07:00
Teknium	ee9c0a3ed0	fix(security): add JWT token and Discord mention redaction (#10547 ) Found via trace data audit: JWT tokens (eyJ...) and Discord snowflake mentions (<@ID>) were passing through unredacted. JWT pattern: matches 1/2/3-part tokens starting with eyJ (base64 for '{'). Zero false-positive risk — no normal text matches eyJ + 10+ base64url chars. Discord pattern: matches <@digits> and <@!digits> with 17-20 digit snowflake IDs. Syntactically unique to Discord's mention format. Both patterns follow the same structural-uniqueness standard as existing prefix patterns (sk-, ghp_, AKIA, etc.).	2026-04-15 16:08:52 -07:00
helix4u	96cc556055	fix(copilot): preserve base URL and gpt-5-mini routing	2026-04-15 15:04:14 -07:00
Teknium	6391b46779	fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200 ) (#10470 ) The _client_cache used event loop id() as part of the cache key, so every new worker-thread event loop created a new entry for the same provider config. In long-running gateways where threads are recycled frequently, this caused unbounded cache growth — each stale entry held an unclosed AsyncOpenAI client with its httpx connection pool, eventually exhausting file descriptors. Fix: remove loop_id from the cache key and instead validate on each async cache hit that the cached loop is the current, open loop. If the loop changed or was closed, the stale entry is replaced in-place rather than creating an additional entry. This bounds cache growth to at most one entry per unique provider config. Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO eviction as defense-in-depth against any remaining unbounded growth. Cross-loop safety is preserved: different event loops still get different client instances (validated by existing test suite). Closes #10200	2026-04-15 13:16:28 -07:00
zhiheng.liu	7cb06e3bb3	refactor(memory): drop on_session_reset — commit-only is enough OV transparently handles message history across /new and /compress: old messages stay in the same session and extraction is idempotent, so there's no need to rebind providers to a new session_id. The only thing the session boundary actually needs is to trigger extraction. - MemoryProvider / MemoryManager: remove on_session_reset hook - OpenViking: remove on_session_reset override (nothing to do) - AIAgent: replace rotate_memory_session with commit_memory_session (just calls on_session_end, no rebind) - cli.py / run_agent.py: single commit_memory_session call at the session boundary before session_id rotates - tests: replace on_session_reset coverage with routing tests for MemoryManager.on_session_end Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-15 11:28:45 -07:00
zhiheng.liu	8275fa597a	refactor(memory): promote on_session_reset to base provider hook Replace hasattr-forked OpenViking-specific paths with a proper base-class hook. Collapse the two agent wrappers into a single rotate_memory_session so callers don't orchestrate commit + rebind themselves. - MemoryProvider: add on_session_reset(new_session_id) as a default no-op - MemoryManager: on_session_reset fans out unconditionally (no hasattr, no builtin skip — base no-op covers it) - OpenViking: rename reset_session -> on_session_reset; drop the explicit POST /api/v1/sessions (OV auto-creates on first message) and the two debug raise_for_status wrappers - AIAgent: collapse commit_memory_session + reinitialize_memory_session into rotate_memory_session(new_sid, messages) - cli.py / run_agent.py: replace hasattr blocks and the split calls with a single unconditional rotate_memory_session call; compression path now passes the real messages list instead of [] - tests: align with on_session_reset, assert reset does NOT POST /sessions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-15 11:28:45 -07:00
zhiheng.liu	7856d304f2	fix(openviking): commit session on /new and context compression The OpenViking memory provider extracts memories when its session is committed (POST /api/v1/sessions/{id}/commit). Before this fix, the CLI had two code paths that changed the active session_id without ever committing the outgoing OpenViking session: 1. /new (new_session() in cli.py) — called flush_memories() to write MEMORY.md, then immediately discarded the old session_id. The accumulated OpenViking session was never committed, so all context from that session was lost before extraction could run. 2. /compress and auto-compress (_compress_context() in run_agent.py) — split the SQLite session (new session_id) but left the OpenViking provider pointing at the old session_id with no commit, meaning all messages synced to OpenViking were silently orphaned. The gateway already handles session commit on /new and /reset via shutdown_memory_provider() on the cached agent; the CLI path did not. Fix: introduce a lightweight session-transition lifecycle alongside the existing full shutdown path: - OpenVikingMemoryProvider.reset_session(new_session_id): waits for in-flight background threads, resets per-session counters, and creates the new OV session via POST /api/v1/sessions — without tearing down the HTTP client (avoids connection overhead on /new). - MemoryManager.restart_session(new_session_id): calls reset_session() on providers that implement it; falls back to initialize() for providers that do not. Skips the builtin provider (no per-session state). - AIAgent.commit_memory_session(messages): wraps memory_manager.on_session_end() without shutdown — commits OV session for extraction but leaves the provider alive for the next session. - AIAgent.reinitialize_memory_session(new_session_id): wraps memory_manager.restart_session() — transitions all external providers to the new session after session_id has been assigned. Call sites: - cli.py new_session(): commit BEFORE session_id changes, reinitialize AFTER — ensuring OV extraction runs on the correct session and the new session is immediately ready for the next turn. - run_agent._compress_context(): same pattern, inside the if self._session_db: block where the session_id split happens. /compress and auto-compress are functionally identical at this layer: both call _compress_context(), so both are fixed by the same change. Tests added to tests/agent/test_memory_provider.py: - TestMemoryManagerRestartSession: reset_session() routing, builtin skip, initialize() fallback, failure tolerance, empty-manager noop. - TestOpenVikingResetSession: session_id update, per-session state clear, POST /api/v1/sessions call, API failure tolerance, no-client noop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:28:45 -07:00
Teknium	722331a57d	fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285 ) Tool schema descriptions and tool return values contained hardcoded ~/.hermes paths that the model sees and uses. When HERMES_HOME is set to a custom path (Docker containers, profiles), the agent would still reference ~/.hermes — looking at the wrong directory. Fixes 6 locations across 5 files: - tools/tts_tool.py: output_path schema description - tools/cronjob_tools.py: script path schema description - tools/skill_manager_tool.py: skill_manage schema description - tools/skills_tool.py: two tool return messages - agent/skill_commands.py: skill config injection text All now use display_hermes_home() which resolves to the actual HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X for profiles, ~/.hermes for default). Reported by: Sandeep Narahari (PrithviDevs)	2026-04-15 04:57:55 -07:00
Arihant Sethia	857b543543	feat: add skill analytics to the dashboard Expose skill usage in analytics so the dashboard and insights output can show which skills the agent loads and manages over time. This adds skill aggregation to the InsightsEngine by extracting `skill_view` and `skill_manage` calls from assistant tool_calls, computing per-skill totals, and including the results in both terminal and gateway insights formatting. It also extends the dashboard analytics API and Analytics page to render a Top Skills table. Terminology is aligned with the skills docs: - Agent Loaded = `skill_view` events - Agent Managed = `skill_manage` actions Architecture: - agent/insights.py collects and aggregates per-skill usage - hermes_cli/web_server.py exposes `skills` on `/api/analytics/usage` - web/src/lib/api.ts adds analytics skill response types - web/src/pages/AnalyticsPage.tsx renders the Top Skills table - web/src/i18n/{en,zh}.ts updates user-facing labels Tests: - tests/agent/test_insights.py covers skill aggregation and formatting - tests/hermes_cli/test_web_server.py covers analytics API contract including the `skills` payload - verified with `cd web && npm run build` Files changed: - agent/insights.py - hermes_cli/web_server.py - tests/agent/test_insights.py - tests/hermes_cli/test_web_server.py - web/src/i18n/en.ts - web/src/i18n/types.ts - web/src/i18n/zh.ts - web/src/lib/api.ts - web/src/pages/AnalyticsPage.tsx	2026-04-15 06:44:43 +00:00
Teknium	772cfb6c4e	fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051 , #8620 , #9400 ) (#10093 ) Four independent fixes: 1. Reset activity timestamp on cached agent reuse (#9051) When the gateway reuses a cached AIAgent for a new turn, the _last_activity_ts from the previous turn (possibly hours ago) carried over. The inactivity timeout handler immediately saw the agent as idle for hours and killed it. Fix: reset _last_activity_ts, _last_activity_desc, and _api_call_count when retrieving an agent from the cache. 2. Detect uv-managed virtual environments (#8620 sub-issue 1) The systemd unit generator fell back to sys.executable (uv's standalone Python) when running under 'uv run', because sys.prefix == sys.base_prefix. The generated ExecStart pointed to a Python binary without site-packages. Fix: check VIRTUAL_ENV env var before falling back to sys.executable. uv sets VIRTUAL_ENV even when sys.prefix doesn't reflect the venv. 3. Nudge model to continue after empty post-tool response (#9400) Weaker models sometimes return empty after tool calls. The agent silently abandoned the remaining work. Fix: append assistant('(empty)') + user nudge message and retry once. Resets after each successful tool round. 4. Compression model fallback on permanent errors (#8620 sub-issue 4) When the default summary model (gemini-3-flash) returns 503 'model_not_found' on custom proxies, the compressor entered a 600s cooldown, leaving context growing unbounded. Fix: detect permanent model-not-found errors (503, 404, 'model_not_found', 'no available channel') and fall back to using the main model for compression instead of entering cooldown. One-time fallback with immediate retry. Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass	2026-04-14 22:38:17 -07:00
kshitijk4poor	9855190f23	feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).	2026-04-14 22:21:25 -07:00
Julien Talbot	3b50821555	feat(xai): add xAI/Grok to provider prefix stripping Add 'xai', 'x-ai', 'x.ai', 'grok' to _PROVIDER_PREFIXES so that colon-prefixed model names (e.g. xai:grok-4.20) are stripped correctly for context length lookups. Cherry-picked from PR #9184 by @Julientalbot.	2026-04-14 16:43:42 -07:00
Teknium	6448e1da23	feat(zai): add GLM-5V-Turbo support for coding plan (#9907 ) - Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists - Add glm-5v context length entry (200K tokens) to model metadata - Update Z.AI endpoint probe to try multiple candidate models per endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for newer coding plan accounts that lack older models - Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks (vision_analyze, browser screenshots) route through 5v Fixes #9888	2026-04-14 16:26:01 -07:00
Teknium	a37a095980	fix: detect qwen-oauth provider via CLI tokens in /model picker Seed qwen-oauth credentials from resolve_qwen_runtime_credentials() in _seed_from_singletons(). Users who authenticate via 'qwen auth qwen-oauth' store tokens in ~/.qwen/oauth_creds.json which the runtime resolver reads but the credential pool couldn't detect — same gap pattern as copilot. Uses refresh_if_expiring=False to avoid network calls during discovery.	2026-04-14 11:16:26 -07:00
Marvae	0bd3f521ae	fix: detect copilot provider via gh auth token in /model picker Seed copilot credentials from resolve_copilot_token() in the credential pool's _seed_from_singletons(), alongside the existing anthropic and openai-codex seeding logic. This makes copilot appear in the /model provider picker when the user authenticates solely through gh auth token. Cherry-picked from PR #9767 by Marvae.	2026-04-14 11:16:26 -07:00
N0nb0at	b21b3bfd68	feat(plugins): namespaced skill registration for plugin skill bundles Add ctx.register_skill() API so plugins can ship SKILL.md files under a 'plugin:skill' namespace, preventing name collisions with built-in Hermes skills. skill_view() detects the ':' separator and routes to the plugin registry while bare names continue through the existing flat-tree scan unchanged. Key additions: - agent/skill_utils: parse_qualified_name(), is_valid_namespace() - hermes_cli/plugins: PluginContext.register_skill(), PluginManager skill registry (find/list/remove) - tools/skills_tool: qualified name dispatch in skill_view(), _serve_plugin_skill() with full guards (disabled, platform, injection scan), bundle context banner with sibling listing, stale registry self-heal - Hoisted _INJECTION_PATTERNS to module level (dedup) - Updated skill_view schema description Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim (P2) for a simpler first merge. Closes #8422	2026-04-14 10:42:58 -07:00
walli	884cd920d4	feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions - Rename platform from 'qq' to 'qqbot' across all integration points (Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py) - Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown) - Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ (prevents duplicate messages from non-editable partial + final sends) - Add _send_qqbot() standalone send function for cron/send_message tool - Add interactive _setup_qq() wizard in hermes_cli/setup.py - Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback functions that were lost during the original merge	2026-04-14 00:11:49 -07:00
Kenny Xie	cdd44817f2	fix(anthropic): send fast mode speed via extra_body	2026-04-13 22:32:39 -07:00
Teknium	943c01536f	feat: add openrouter/elephant-alpha to curated model lists (#9378 ) * Add hermes debug share instructions to all issue templates - bug_report.yml: Add required Debug Report section with hermes debug share and /debug instructions, make OS/Python/Hermes version optional (covered by debug report), demote old logs field to optional supplementary - setup_help.yml: Replace hermes doctor reference with hermes debug share, add Debug Report section with fallback chain (debug share -> --local -> doctor) - feature_request.yml: Add optional Debug Report section for environment context All templates now guide users to run hermes debug share (or /debug in chat) and paste the resulting paste.rs links, giving maintainers system info, config, and recent logs in one step. * feat: add openrouter/elephant-alpha to curated model lists - Add to OPENROUTER_MODELS (free, positioned above GPT models) - Add to _PROVIDER_MODELS["nous"] mirror list - Add 256K context window fallback in model_metadata.py	2026-04-13 21:16:14 -07:00
Teknium	d15efc9c1b	fix: correct GPT-5 family context lengths in fallback defaults (#9309 ) The generic 'gpt-5' fallback was set to 128,000 — which is the max OUTPUT tokens, not the context window. GPT-5 base and most variants (codex, mini) have 400,000 context. This caused /model to report 128k for models like gpt-5.3-codex when models.dev was unavailable. Added specific entries for GPT-5 variants with different context sizes: - gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M) - gpt-5.4-mini, gpt-5.4-nano: 400,000 - gpt-5.3-codex-spark: 128,000 (reduced) - gpt-5.1-chat: 128,000 (chat variant) - gpt-5 (catch-all): 400,000 Sources: https://developers.openai.com/api/docs/models	2026-04-13 19:22:23 -07:00
Teknium	f324222b79	fix: add vLLM/local server error patterns + MCP initial connection retry (#9281 ) Port two improvements inspired by Kilo-Org/kilocode analysis: 1. Error classifier: add context overflow patterns for vLLM, Ollama, and llama.cpp/llama-server. These local inference servers return different error formats than cloud providers (e.g., 'exceeds the max_model_len', 'context length exceeded', 'slot context'). Without these patterns, context overflow errors from local servers are misclassified as format errors, causing infinite retries instead of triggering compression. 2. MCP initial connection retry: previously, if the very first connection attempt to an MCP server failed (e.g., transient DNS blip at startup), the server was permanently marked as failed with no retry. Post-connect reconnection had 5 retries with exponential backoff, but initial connection had zero. Now initial connections retry up to 3 times with backoff before giving up, matching the resilience of post-connect reconnection. (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3) Tests: 6 new error classifier tests, 4 new MCP retry tests, 1 updated existing test. All 276 affected tests pass.	2026-04-13 18:46:14 -07:00
arthurbr11	0a4cf5b3e1	feat(providers): add Arcee AI as direct API provider Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini. Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, setup.py, trajectory_compressor.py. Based on PR #9274 by arthurbr11, simplified to a standard direct provider without dual-endpoint OpenRouter routing.	2026-04-13 18:40:06 -07:00
Teknium	8d023e43ed	refactor: remove dead code — 1,784 lines across 77 files (#9180 ) Deep scan with vulture, pyflakes, and manual cross-referencing identified: - 41 dead functions/methods (zero callers in production) - 7 production-dead functions (only test callers, tests deleted) - 5 dead constants/variables - ~35 unused imports across agent/, hermes_cli/, tools/, gateway/ Categories of dead code removed: - Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection, rebuild_lookups, clear_session_context, get_logs_dir, clear_session - Unused API surface: search_models_dev, get_pricing, skills_categories, get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list - Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob - Stale debug helpers: get_debug_session_info copies in 4 tool files (centralized version in debug_helpers.py already exists) - Dead gateway methods: send_emote, send_notice (matrix), send_reaction (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history (matrix), _start_typing_indicator (signal), parse_feishu_post_content - Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION, FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR - Unused UI code: _interactive_provider_selection, _interactive_model_selection (superseded by prompt_toolkit picker) Test suite verified: 609 tests covering affected files all pass. Tests for removed functions deleted. Tests using removed utilities (clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.	2026-04-13 16:32:04 -07:00
Teknium	b27eaaa4db	fix: improve ACP type check and restore comment accuracy - Use isinstance() with try/except import for CopilotACPClient check in _to_async_client instead of fragile __class__.__name__ string check - Restore accurate comment: GPT-5.x models require (not 'often require') the Responses API on OpenAI/OpenRouter; ACP is the exception, not a softening of the requirement - Add inline comment explaining the ACP exclusion rationale	2026-04-13 16:17:43 -07:00
helix4u	8680f61f8b	fix(copilot-acp): keep acp runtime off responses path	2026-04-13 16:17:43 -07:00
Teknium	0e60a9dc25	fix: add kimi-coding-cn to remaining provider touchpoints Follow-up for salvaged PR #7637. Adds kimi-coding-cn to: - model_normalize.py (prefix strip) - providers.py (models.dev mapping) - runtime_provider.py (credential resolution) - setup.py (model list + setup label) - doctor.py (health check) - trajectory_compressor.py (URL detection) - models_dev.py (registry mapping) - integrations/providers.md (docs)	2026-04-13 11:20:37 -07:00
hcshen0111	2b3aa36242	feat(providers): add kimi-coding-cn provider for mainland China users Cherry-picked from PR #7637 by hcshen0111. Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var and api.moonshot.cn/v1 endpoint for China-region Moonshot users.	2026-04-13 11:20:37 -07:00
墨綠BG	c449cd1af5	fix(config): restore custom providers after v11→v12 migration The v11→v12 migration converts custom_providers (list) into providers (dict), then deletes the list. But all runtime resolvers read from custom_providers — after migration, named custom endpoints silently stop resolving and fallback chains fail with AuthError. Add get_compatible_custom_providers() that reads from both config schemas (legacy custom_providers list + v12+ providers dict), normalizes entries, deduplicates, and returns a unified list. Update ALL consumers: - hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env - hermes_cli/auth_commands.py: credential pool provider names - hermes_cli/main.py: model picker + _model_flow_named_custom() - agent/auxiliary_client.py: key_env + custom_entry model fallback - agent/credential_pool.py: _iter_custom_providers() - cli.py + gateway/run.py: /model switch custom_providers passthrough - run_agent.py + gateway/run.py: per-model context_length lookup Also: use config.pop() instead of del for safer migration, fix stale _config_version assertions in tests, add pool mock to codex test. Co-authored-by: 墨綠BG <s5460703@gmail.com> Closes #8776, salvaged from PR #8814	2026-04-13 10:50:52 -07:00
luyao618	8ec1608642	fix(agent): propagate api_mode to vision provider resolution resolve_vision_provider_client() computed resolved_api_mode from config but never passed it to downstream resolve_provider_client() or _get_cached_client() calls, causing custom providers with api_mode: anthropic_messages to crash when used for vision tasks. Also remove the for_vision special case in _normalize_aux_provider() that incorrectly discarded named custom provider identifiers. Fixes #8857 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 05:02:54 -07:00
Teknium	e3ffe5b75f	fix: remove legacy compression.summary_* config and env var fallbacks (#8992 ) Remove the backward-compat code paths that read compression provider/model settings from legacy config keys and env vars, which caused silent failures when auto-detection resolved to incompatible backends. What changed: - Remove compression.summary_model, summary_provider, summary_base_url from DEFAULT_CONFIG and cli.py defaults - Remove backward-compat block in _resolve_task_provider_model() that read from the legacy compression section - Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper functions (AUXILIARY_/CONTEXT_ env var readers) - Remove env var fallback chain for per-task overrides - Update hermes config show to read from auxiliary.compression - Add config migration (v16→17) that moves non-empty legacy values to auxiliary.compression and strips the old keys - Update example config and openclaw migration script - Remove/update tests for deleted code paths Compression model/provider is now configured exclusively via: auxiliary.compression.provider / auxiliary.compression.model Closes #8923	2026-04-13 04:59:26 -07:00
Richard Li	82901695ff	feat(wecom): add platform hint for native media sending	2026-04-13 04:46:04 -07:00
ismell0992-afk	3e99964789	fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max _query_local_context_length was checking model_info.context_length (the GGUF training max) before num_ctx (the Modelfile runtime override), inverse to query_ollama_num_ctx. The two helpers therefore disagreed on the same model: hermes-brain:qwen3-14b-ctx32k # Modelfile: num_ctx 32768 underlying qwen3:14b GGUF # qwen3.context_length: 40960 query_ollama_num_ctx correctly returned 32768 (the value Ollama will actually allocate KV cache for). _query_local_context_length returned 40960, which let ContextCompressor grow conversations past 32768 before triggering compression — at which point Ollama silently truncated the prefix, corrupting context. Swap the order so num_ctx is checked first, matching query_ollama_num_ctx. Adds a parametrized test that seeds both values and asserts num_ctx wins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 04:24:07 -07:00
Teknium	400fe9b2a1	fix: add <thought> stripping to auxiliary_client + tests auxiliary_client.py had its own regex mirroring _strip_think_blocks but was missing the <thought> variant. Also adds test coverage for <thought> paired and orphaned tags.	2026-04-12 12:44:49 -07:00
Teknium	7a67b13506	fix: title_generator no longer logs as 'compression' task Changed task='compression' to task='title_generation' so auto-title calls don't pollute logs with false compression alarms.	2026-04-12 04:17:18 -07:00
Teknium	17c72f176d	fix: make skill loading instructions more aggressive in system prompt (#8286 ) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 03:03:16 -07:00
Teknium	b321330362	feat: add WSL environment hint to system prompt (#8285 ) When running inside WSL (Windows Subsystem for Linux), inject a hint into the system prompt explaining that the Windows host filesystem is mounted at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows paths (Desktop, Documents) to their /mnt/ equivalents without the user needing to configure anything. Uses the existing is_wsl() detection from hermes_constants (cached, checks /proc/version for 'microsoft'). Adds build_environment_hints() in prompt_builder.py — extensible for Termux, Docker, etc. later. Closes the UX gap where WSL users had to manually explain path translation to the agent every session.	2026-04-12 02:26:28 -07:00
Teknium	95fa78eb6c	fix: write refreshed Codex tokens back to ~/.codex/auth.json (#8277 ) OpenAI OAuth refresh tokens are single-use and rotate on every refresh. When Hermes refreshes a Codex token, it consumed the old refresh_token but never wrote the new pair back to ~/.codex/auth.json. This caused Codex CLI and VS Code to fail with 'refresh_token_reused' on their next refresh attempt. This mirrors the existing Anthropic write-back pattern where refreshed tokens are written to ~/.claude/.credentials.json via _write_claude_code_credentials(). Changes: - Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to _write_claude_code_credentials in anthropic_adapter.py) - Call it from _refresh_codex_auth_tokens() (non-pool refresh path) - Call it from credential_pool._refresh_entry() (pool happy path + retry) - Add tests for the new write-back behavior - Update existing test docstring to clarify _save_codex_tokens vs _write_codex_cli_tokens separation Fixes refresh token conflict reported by @ec12edfae2cb221	2026-04-12 02:05:20 -07:00
Teknium	a1220977d3	fix: make skill loading instructions more aggressive in system prompt (#8209 ) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 01:46:34 -07:00
Teknium	078dba015d	fix: three provider-related bugs (#8161 , #8181 , #8147 ) (#8243 ) - Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV so context-length lookups use models.dev data instead of 128k fallback. Fixes #8161. - Set api_mode from custom_providers entry when switching via hermes model, and clear stale api_mode when the entry has none. Also extract api_mode in _named_custom_provider_map(). Fixes #8181. - Convert OpenAI image_url content blocks to Anthropic image blocks when the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL containing /anthropic). Fixes #8147.	2026-04-12 01:44:18 -07:00
Harish Kukreja	b1f13a8c5f	fix(agent): route compression aux through live session runtime	2026-04-12 01:34:52 -07:00
Teknium	eb2a49f95a	fix: openai-codex and anthropic not appearing in /model picker for external credentials (#8224 ) Users whose credentials exist only in external files — OpenAI Codex OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials in ~/.claude/.credentials.json — would not see those providers in the /model picker, even though hermes auth and hermes model detected them. Root cause: list_authenticated_providers() only checked the raw Hermes auth store and env vars. External credential file fallbacks (Codex CLI import, Claude Code file discovery) were never triggered. Fix (three parts): 1. _seed_from_singletons() in credential_pool.py: openai-codex now imports from ~/.codex/auth.json when the Hermes auth store is empty, mirroring resolve_codex_runtime_credentials(). 2. list_authenticated_providers() in model_switch.py: auth store + pool checks now run for ALL providers (not just OAuth auth_type), catching providers like anthropic that support both API key and OAuth. 3. list_authenticated_providers(): direct check for anthropic external credential files (Claude Code, Hermes PKCE). The credential pool intentionally gates anthropic behind is_provider_explicitly_configured() to prevent auxiliary tasks from silently consuming tokens. The /model picker bypasses this gate since it is discovery-oriented.	2026-04-12 00:33:42 -07:00
Teknium	1cec910b6a	fix: improve context compaction to prevent model answering stale questions (#8107 ) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.	2026-04-11 19:43:58 -07:00
Teknium	a0a02c1bc0	feat: /compress <focus> — guided compression with focus topic (#8017 ) Adds an optional focus topic to /compress: `/compress database schema` guides the summariser to preserve information related to the focus topic (60-70% of summary budget) while compressing everything else more aggressively. Inspired by Claude Code's /compact <focus>. Changes: - context_compressor.py: focus_topic parameter on _generate_summary() and compress(); appends FOCUS TOPIC guidance block to the LLM prompt - run_agent.py: focus_topic parameter on _compress_context(), passed through to the compressor - cli.py: _manual_compress() extracts focus topic from command string, preserves existing manual_compression_feedback integration (no regression) - gateway/run.py: _handle_compress_command() extracts focus from event args and passes through — full gateway parity - commands.py: args_hint="[focus topic]" on /compress CommandDef Salvaged from PR #7459 (CLI /compress focus only — /context command deferred). 15 new tests across CLI, compressor, and gateway.	2026-04-11 19:23:29 -07:00
Teknium	5c2ecdec49	fix: use ceiling division for token estimation, deduplicate inline formula Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and estimate_request_tokens_rough() from floor division (len // 4) to ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously estimated as 0 tokens, causing the compressor and pre-flight checks to systematically undercount when many short tool results are present. Also replaced the inline duplicate formula in run_conversation() (total_chars // 4) with a call to the shared estimate_messages_tokens_rough() function. Updated 4 tests that hardcoded floor-division expected values. Related: issue #6217, PR #6629	2026-04-11 16:33:40 -07:00
Teknium	c8aff74632	fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking Three root causes of the 'agent stops mid-task' gateway bug: 1. Compression threshold floor (64K tokens minimum) - The 50% threshold on a 100K-context model fired at 50K tokens, causing premature compression that made models lose track of multi-step plans. Now threshold_tokens = max(50% * context, 64K). - Models with <64K context are rejected at startup with a clear error. 2. Budget warning removal — grace call instead - Removed the 70%/90% iteration budget warnings entirely. These injected '[BUDGET WARNING: Provide your final response NOW]' into tool results, causing models to abandon complex tasks prematurely. - Now: no warnings during normal execution. When the budget is actually exhausted (90/90), inject a user message asking the model to summarise, allow one grace API call, and only then fall back to _handle_max_iterations. 3. Activity touches during long terminal execution - _wait_for_process polls every 0.2s but never reported activity. The gateway's inactivity timeout (default 1800s) would fire during long-running commands that appeared 'idle.' - Now: thread-local activity callback fires every 10s during the poll loop, keeping the gateway's activity tracker alive. - Agent wires _touch_activity into the callback before each tool call. Also: docs update noting 64K minimum context requirement. Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).	2026-04-11 16:18:57 -07:00
Teknium	8c3935ebe8	fix: is_local_endpoint misses Docker/Podman DNS names (#7950 ) * fix(tools): neutralize shell injection in _write_to_sandbox via path quoting _write_to_sandbox interpolated storage_dir and remote_path directly into a shell command passed to env.execute(). Paths containing shell metacharacters (spaces, semicolons, $(), backticks) could trigger arbitrary command execution inside the sandbox. Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric + slashes/hyphens/dots) are left unmodified by shlex.quote, so existing behavior is unchanged. Paths with unsafe characters get single-quoted. Tests added for spaces, $(command) substitution, and semicolon injection. * fix: is_local_endpoint misses Docker/Podman DNS names host.docker.internal, host.containers.internal, gateway.docker.internal, and host.lima.internal are well-known DNS names that container runtimes use to resolve the host machine. Users running Ollama on the host with the agent in Docker/Podman hit the default 120s stream timeout instead of the bumped 1800s because these hostnames weren't recognized as local. Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in is_local_endpoint(). Tests cover all three runtime families plus a negative case for domains that merely contain the suffix as a substring.	2026-04-11 14:46:18 -07:00
Teknium	04c1c5d53f	refactor: extract shared helpers to deduplicate repeated code patterns (#7917 ) * refactor: add shared helper modules for code deduplication New modules: - gateway/platforms/helpers.py: MessageDeduplicator, TextBatchAggregator, strip_markdown, ThreadParticipationTracker, redact_phone - hermes_cli/cli_output.py: print_info/success/warning/error, prompt helpers - tools/path_security.py: validate_within_dir, has_traversal_component - utils.py additions: safe_json_loads, read_json_file, read_jsonl, append_jsonl, env_str/lower/int/bool helpers - hermes_constants.py additions: get_config_path, get_skills_dir, get_logs_dir, get_env_path * refactor: migrate gateway adapters to shared helpers - MessageDeduplicator: discord, slack, dingtalk, wecom, weixin, mattermost - strip_markdown: bluebubbles, feishu, sms - redact_phone: sms, signal - ThreadParticipationTracker: discord, matrix - _acquire/_release_platform_lock: telegram, discord, slack, whatsapp, signal, weixin Net -316 lines across 19 files. * refactor: migrate CLI modules to shared helpers - tools_config.py: use cli_output print/prompt + curses_radiolist (-117 lines) - setup.py: use cli_output print helpers + curses_radiolist (-101 lines) - mcp_config.py: use cli_output prompt (-15 lines) - memory_setup.py: use curses_radiolist (-86 lines) Net -263 lines across 5 files. * refactor: migrate to shared utility helpers - safe_json_loads: agent/display.py (4 sites) - get_config_path: skill_utils.py, hermes_logging.py, hermes_time.py - get_skills_dir: skill_utils.py, prompt_builder.py - Token estimation dedup: skills_tool.py imports from model_metadata - Path security: skills_tool, cronjob_tools, skill_manager_tool, credential_files - Non-atomic YAML writes: doctor.py, config.py now use atomic_yaml_write - Platform dict: new platforms.py, skills_config + tools_config derive from it - Anthropic key: new get_anthropic_key() in auth.py, used by doctor/status/config/main * test: update tests for shared helper migrations - test_dingtalk: use _dedup.is_duplicate() instead of _is_duplicate() - test_mattermost: use _dedup instead of _seen_posts/_prune_seen - test_signal: import redact_phone from helpers instead of signal - test_discord_connect: _platform_lock_identity instead of _token_lock_identity - test_telegram_conflict: updated lock error message format - test_skill_manager_tool: 'escapes' instead of 'boundary' in error msgs	2026-04-11 13:59:52 -07:00
Teknium	976bad5bde	refactor(auxiliary): config.yaml takes priority over env vars for aux task settings (#7889 ) The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER, AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section. This violated the project's '.env is for secrets only' policy — these are behavioral settings, not API keys. Flipped the resolution order in _resolve_task_provider_model(): 1. Explicit args (always win) 2. config.yaml auxiliary.{task}.* (PRIMARY) 3. Env var overrides (backward-compat fallback only) 4. 'auto' (full auto-detection chain) Env var reading code is kept for backward compatibility but config.yaml now takes precedence. Updated module docstring and function docstring. Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.	2026-04-11 11:21:59 -07:00
Teknium	d4bb44d4b9	docs: add Xiaomi MiMo to all provider docs + fix MiMo-V2-Flash ctx len - environment-variables.md: XIAOMI_API_KEY, XIAOMI_BASE_URL, provider list - cli-commands.md: --provider choices - integrations/providers.md: provider table, Chinese providers section, config example, base URL list, choosing table, fallback providers list - fallback-providers.md: supported providers table, auto-detection chain - Fix XiaomiMiMo/MiMo-V2-Flash context length 32768 → 256000 (OpenRouter entry)	2026-04-11 11:17:52 -07:00
kshitijk4poor	6693e2a497	feat(xiaomi): add Xiaomi MiMo as first-class provider Cherry-picked from PR #7702 by kshitijk4poor. Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models: - mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest) Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example. Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's main model. Non-vision aux uses the user's selected model. Added _PROVIDER_VISION_MODELS dict for provider-specific vision model overrides. On failure, falls back to aggregators (gemini flash) via existing fallback chain. Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000, mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000. 36 tests covering registry, aliases, auto-detect, credentials, models.dev, normalization, URL mapping, providers module, doctor, aux client, vision model override, and agent init.	2026-04-11 11:17:52 -07:00
kshitijk4poor	50bb4fe010	fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection Cherry-picked from PR #7749 by kshitijk4poor with modifications: - Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider) - Send images at full resolution first; only auto-resize to 5 MB on API failure - Add _is_image_size_error() helper to detect size-related API rejections - Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction - Fix get_model_capabilities() to check modalities.input for vision support - Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent) - Applied retry-with-resize to both vision_analyze_tool and browser_vision Closes #7740	2026-04-11 11:12:50 -07:00
kshitijk4poor	af9caec44f	fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal Based on PR #7285 by @kshitijk4poor. Two bugs affecting Qwen OAuth users: 1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M. Added specific entries before the generic qwen catch-all: - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per official Alibaba Cloud docs and OpenRouter) - qwen3-coder: 262,144 2. Random stopping — max_tokens was suppressed for Qwen Portal, so the server applied its own low default. Reasoning models exhaust that on thinking tokens. Now: honor explicit max_tokens, default to 65536 when unset. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-11 03:29:31 -07:00
aaronagent	307697688e	fix: prevent zombie processes, redact cron stderr, skip symlinks in skill enumeration process_registry.py: _reader_loop() has process.wait() after the try-except block (line 380). If the reader thread crashes with an unexpected exception (e.g. MemoryError, KeyboardInterrupt), control exits the except handler but skips wait() — leaving the child as a zombie process. Move wait() and the cleanup into a finally block so the child is always reaped. cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both stdout and stderr are returned WITHOUT redaction (lines 407-413). A script that accidentally prints an API key to stderr during a failure would leak it into the LLM context. Move redaction before the success/failure branch so both paths benefit. skill_commands.py: _build_skill_message() enumerates supporting files using rglob("*") but only checks is_file() (line 171) without filtering symlinks. PR #6693 added symlink protection to scan_skill_commands() but missed this function. A malicious skill can create symlinks in references/ pointing to arbitrary files, exposing their paths (and potentially content via skill_view) to the LLM. Add is_symlink() check to match the guard in scan_skill_commands. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:03:20 -07:00
kshitijk4poor	c89719ad9c	fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161 )	2026-04-11 01:52:58 -07:00
kshitijk4poor	d3c5d65563	fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264 ) async_call_llm (and call_llm) can return non-OpenAI objects from custom providers or adapter shims, crashing downstream consumers with misleading AttributeError ('str' has no attribute 'choices'). Add _validate_llm_response() that checks the response has the expected .choices[0].message shape before returning. Wraps all return paths in call_llm, async_call_llm, and fallback paths. Fails fast with a clear RuntimeError identifying the task, response type, and a preview of the malformed payload. Closes #7264	2026-04-11 01:52:58 -07:00
ran	4f5e8b22a7	fix: drop incompatible model slugs on auxiliary client cache hit `resolve_provider_client()` already drops OpenRouter-format model slugs (containing "/") when the resolved provider is not OpenRouter (line 1097). However, `_get_cached_client()` returns `model or cached_default` directly on cache hits, bypassing this check entirely. When the main provider is openai-codex, the auto-detection chain (Step 1 of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary calls for different tasks (e.g. compression with `summary_model: google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter- format model slug straight to the Codex Responses API, which does not understand it and returns an empty `response.output`. This causes two user-visible failures: - "Invalid API response shape" (empty output after 3 retries) - "Context length exceeded, cannot compress further" (compression itself fails through the same path) Add `_compat_model()` helper that mirrors the "/" check from `resolve_provider_client()` and call it on the cache-hit return path.	2026-04-11 01:52:58 -07:00
kshitijk4poor	eeb8b4b00f	fix(auxiliary): harden fallback behavior for non-OpenRouter users Four fixes to auxiliary_client.py: 1. Respect explicit provider as hard constraint (#7559) When auxiliary.{task}.provider is explicitly set (not 'auto'), connection/payment errors no longer silently fallback to cloud providers. Local-only users (Ollama, vLLM) will no longer get unexpected OpenRouter billing from auxiliary tasks. 2. Eliminate model='default' sentinel (#7512) _resolve_api_key_provider() no longer sends literal 'default' as model name to APIs. Providers without a known aux model in _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing model_not_supported errors. 3. Add payment/connection fallback to async_call_llm (#7512) async_call_llm now mirrors sync call_llm's fallback logic for payment (402) and connection errors. Previously, async consumers (session_search, web_tools, vision) got hard failures with no recovery. Also fixes hardcoded 'openrouter' fallback to use the full auto-detection chain. 4. Use accurate error reason in fallback logs (#7512) _try_payment_fallback() now accepts a reason parameter and uses it in log messages. Connection timeouts are no longer misleadingly logged as 'payment error'. Closes #7559 Closes #7512	2026-04-11 01:52:58 -07:00
kshitijk4poor	ffbd80f5fc	fix(auxiliary): honor api_mode in auxiliary client (#6800 ) The auxiliary client always calls client.chat.completions.create(), ignoring the api_mode config flag. This breaks codex-family models (e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the /v1/responses endpoint. Changes: - Expand _resolve_task_provider_model to return api_mode (5-tuple) - Read api_mode from auxiliary.{task}.api_mode config and env vars (AUXILIARY_{TASK}_API_MODE) - Pass api_mode through _get_cached_client to resolve_provider_client - Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI clients in CodexAuxiliaryClient when api_mode=codex_responses or when auto-detection finds api.openai.com + codex model pattern - Apply wrapping at all custom endpoint, named custom provider, and API-key provider return paths - Update test mocks for the new 5-tuple return format Users can now set: auxiliary: compression: model: gpt-5.3-codex base_url: https://api.openai.com/v1 api_mode: codex_responses Closes #6800	2026-04-11 01:52:58 -07:00
Long Hao	58b62e3e43	feat(skin): make all CLI colors skin-aware Refactor hardcoded color constants throughout the CLI to resolve from the active skin engine, so custom themes fully control the visual appearance. cli.py: - Replace _GOLD constant with _ACCENT (_SkinAwareAnsi class) that lazily resolves response_border from the active skin - Rename _GOLD_DEFAULT to _ACCENT_ANSI_DEFAULT - Make _build_compact_banner() read banner_title/accent/dim from skin - Make session resume notifications use _accent_hex() - Make status line use skin colors (accent_color, separator_color, label_color instead of cryptic _dim_c/_dim_c2/_accent_c/_label_c) - Reset _ACCENT cache on /skin switch agent/display.py: - Replace hardcoded diff ANSI escapes with skin-aware functions: _diff_dim(), _diff_file(), _diff_hunk(), _diff_minus(), _diff_plus() (renamed from SCREAMING_CASE _ANSI_* to snake_case) - Add reset_diff_colors() for cache invalidation on skin switch	2026-04-11 01:47:48 -07:00
kshitijk4poor	d442f25a2f	fix: align MiniMax provider with official API docs Aligns MiniMax provider with official API documentation. Fixes 6 bugs: transport mismatch (openai_chat -> anthropic_messages), credential leak in switch_model(), prompt caching sent to non-Anthropic endpoints, dot-to-hyphen model name corruption, trajectory compressor URL routing, and stale doctor health check. Also corrects context window (204,800), thinking support (manual mode), max output (131,072), and model catalog (M2 family only on /anthropic). Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-04-11 01:04:41 -07:00
Teknium	caf371da18	fix: MiniMax/Alibaba incorrectly detected as Anthropic OAuth, causing mcp_ tool prefix (#7509 ) _is_oauth_token() returned True for any key not starting with 'sk-ant-api', which means MiniMax and Alibaba API keys were falsely treated as Anthropic OAuth tokens. This triggered the Claude Code compatibility path: - All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search) - System prompt injected with 'You are Claude Code' identity - 'Hermes Agent' replaced with 'Claude Code' throughout Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by their key format instead of using a broad catch-all: - sk-ant-* (but not sk-ant-api-) -> setup tokens, managed keys - eyJ -> JWTs from Anthropic OAuth flow - Everything else -> False (MiniMax, Alibaba, etc.) Reported by stefan171.	2026-04-11 00:43:01 -07:00
Kenny Xie	1ffd92cc94	fix(gateway): make manual compression feedback truthful	2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi	c1af614289	fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks GPT-5+ models (except gpt-5-mini) are only accessible via the Responses API on Copilot. When these models were configured as the compression summary_model (or any auxiliary task), the plain OpenAI client sent them to /chat/completions which returned a 400 error: model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint resolve_provider_client() now checks _should_use_copilot_responses_api() for the copilot provider and wraps the client in CodexAuxiliaryClient when needed, routing calls through responses.stream() transparently. Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping (gpt-4.1-mini) paths.	2026-04-10 21:16:53 -07:00
Teknium	3fe6938176	fix: robust context engine interface — config selection, plugin discovery, ABC completeness Follow-up fixes for the context engine plugin slot (PR #5700): - Enhance ContextEngine ABC: add threshold_percent, protect_first_n, protect_last_n as class attributes; complete update_model() default with threshold recalculation; clarify on_session_end() lifecycle docs - Add ContextCompressor.update_model() override for model/provider/ base_url/api_key updates - Replace all direct compressor internal access in run_agent.py with ABC interface: switch_model(), fallback restore, context probing all use update_model() now; _context_probed guarded with getattr/ hasattr for plugin engine compatibility - Create plugins/context_engine/ directory with discovery module (mirrors plugins/memory/ pattern) — discover_context_engines(), load_context_engine() - Add context.engine config key to DEFAULT_CONFIG (default: compressor) - Config-driven engine selection in run_agent.__init__: checks config, then plugins/context_engine/<name>/, then general plugin system, falls back to built-in ContextCompressor - Wire on_session_end() in shutdown_memory_provider() at real session boundaries (CLI exit, /reset, gateway expiry)	2026-04-10 19:15:50 -07:00
Stephen Schoettler	5d8dd622bc	feat: wire context engine tools, session lifecycle, and tool dispatch - Inject engine tool schemas into agent tool surface after compressor init - Call on_session_start() with session_id, hermes_home, platform, model - Dispatch engine tool calls (lcm_grep, etc.) before regular tool handler - 55/55 tests pass	2026-04-10 19:15:50 -07:00
Stephen Schoettler	92382fb00e	feat: wire context engine plugin slot into agent and plugin system - PluginContext.register_context_engine() lets plugins replace the built-in ContextCompressor with a custom ContextEngine implementation - PluginManager stores the registered engine; only one allowed - run_agent.py checks for a plugin engine at init before falling back to the default ContextCompressor - reset_session_state() now calls engine.on_session_reset() instead of poking internal attributes directly - ContextCompressor.on_session_reset() handles its own internals (_context_probed, _previous_summary, etc.) - 19 new tests covering ABC contract, defaults, plugin slot registration, rejection of duplicates/non-engines, and compressor reset behavior - All 34 existing compressor tests pass unchanged	2026-04-10 19:15:50 -07:00
Stephen Schoettler	fe7e6c156c	feat: add ContextEngine ABC, refactor ContextCompressor to inherit from it Introduces agent/context_engine.py — an abstract base class that defines the pluggable context engine interface. ContextCompressor now inherits from ContextEngine as the default implementation. No behavior change. All 34 existing compressor tests pass. This is the foundation for a context engine plugin slot, enabling third-party engines like LCM (Lossless Context Management) to replace the built-in compressor via the plugin system.	2026-04-10 19:15:50 -07:00
0xFrank-eth	e8034e2f6a	fix(gateway): replace os.environ session state with contextvars for concurrency safety When two gateway messages arrived concurrently, _set_session_env wrote HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global os.environ. Because asyncio tasks share the same process, Message B would overwrite Message A's values mid-flight, causing background-task notifications and tool calls to route to the wrong thread/chat. Replace os.environ with Python's contextvars.ContextVar. Each asyncio task (and any run_in_executor thread it spawns) gets its own copy, so concurrent messages never interfere. Changes: - New gateway/session_context.py with ContextVar definitions, set/clear/get helpers, and os.environ fallback for CLI/cron/test backward compatibility - gateway/run.py: _set_session_env returns reset tokens, _clear_session_env accepts them for proper cleanup in finally blocks - All tool consumers updated: cronjob_tools, send_message_tool, skills_tool, terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool, agent/skill_utils, agent/prompt_builder - Tests updated for new contextvar-based API Fixes #7358 Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-04-10 17:04:38 -07:00
Billard	475cbce775	fix(aux): honor api_mode for custom auxiliary endpoints	2026-04-10 16:47:44 -07:00
Julien Talbot	8bcb8b8e87	feat(providers): add native xAI provider Adds xAI as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, 11 curated Grok models, URL mapping in model_metadata.py, aliases (x-ai, x.ai), and env var tests. Uses standard OpenAI-compatible chat completions. Closes #7050	2026-04-10 13:40:38 -07:00
WAXLYY	c6e1add6f1	fix(agent): preserve quoted @file references with spaces	2026-04-10 13:05:01 -07:00
Teknium	be4f049f46	fix: salvage follow-ups for Weixin adapter (#6747 ) - Remove sys.path.insert hack (leftover from standalone dev) - Add token lock (acquire_scoped_lock/release_scoped_lock) in connect()/disconnect() to prevent duplicate pollers across profiles - Fix get_connected_platforms: WEIXIN check must precede generic token/api_key check (requires both token AND account_id) - Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS - Add gateway setup wizard with QR login flow - Add platform status check for partially configured state - Add weixin.md docs page with full adapter documentation - Update environment-variables.md reference with all 11 env vars - Update sidebars.ts to include weixin docs page - Wire all gateway integration points onto current main Salvaged from PR #6747 by Zihan Huang.	2026-04-10 05:54:37 -07:00
Kenny Xie	b730c2955a	fix(model): normalize direct provider ids in auxiliary routing	2026-04-10 05:52:45 -07:00
Teknium	5fc5ced972	fix: add Alibaba/DashScope rate-limit pattern to error classifier Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a unique throttling message ('Request rate increased too quickly...') that doesn't match standard rate-limit patterns ('rate limit', 'too many requests'). This caused Alibaba errors to fall through to the 'unknown' category rather than being properly classified as rate_limit with appropriate backoff/rotation. Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with the exact error message observed from the Alibaba provider.	2026-04-10 05:52:45 -07:00
Teknium	6d2fa03837	fix: UTF-8 config encoding, pairing hint, credential_pool key, header normalization (#7174 ) Four small fixes: (1) UTF-8 encoding for config open (@zhangchn #7063), (2) pairing hint placeholders (@konsisumer #7057), (3) missing credential_pool in cheap route (@kuishou68 #7025), (4) case-insensitive rate limit headers (@kuishou68 #7019).	2026-04-10 05:33:48 -07:00
xwp	5a1cce53e4	fix(auxiliary): skip anthropic in fallback chain when not explicitly configured _resolve_api_key_provider() now checks is_provider_explicitly_configured before calling _try_anthropic(). Previously, any auxiliary fallback (e.g. when kimi-coding key was invalid) would silently discover and use Claude Code OAuth tokens — consuming the user's Claude Max subscription without their knowledge. This is the auxiliary-client counterpart of the setup-wizard gate in PR #4210. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
xwp	419b719c2b	fix(auth): make 'auth remove' for claude_code prevent re-seeding Previously, removing a claude_code credential from the anthropic pool only printed a note — the next load_pool() re-seeded it from ~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag to auth.json that _seed_from_singletons checks before seeding. Follows the pattern of env: source removal (clears .env var) and device_code removal (clears auth store state). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
xwp	f3fb3eded4	fix(auth): gate Claude Code credential seeding behind explicit provider config _seed_from_singletons('anthropic') now checks is_provider_explicitly_configured('anthropic') before reading ~/.claude/.credentials.json. Without this, the auxiliary client fallback chain silently discovers and uses Claude Code tokens when the user's primary provider key is invalid — consuming their Claude Max subscription quota without consent. Follows the same gating pattern as PR #4210 (setup wizard gate) but applied to the credential pool seeding path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:19:21 -07:00
alt-glitch	96c060018a	fix: remove 115 verified dead code symbols across 46 production files Automated dead code audit using vulture + coverage.py + ast-grep intersection, confirmed by Opus deep verification pass. Every symbol verified to have zero production callers (test imports excluded from reachability analysis). Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py, hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py). Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-04-10 03:44:43 -07:00
aaronagent	9afe1784bd	fix: hidden_div regex bypass with newlines, credential config silent failure, webhook route error severity prompt_builder.py: The `hidden_div` detection pattern uses `.` which does not match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass detection by splitting the style attribute across lines: `<div style="color:red;\ndisplay: none">injected content</div>` Replace `.` with `[\s\S]*?` to match across line boundaries. credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level (line 171), making YAML parse failures invisible in production logs. Users whose credential files silently fail to mount into sandboxes have no diagnostic clue. Promote to WARNING to match the severity pattern used by the path validation warnings at lines 150 and 158 in the same function. webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line 265) but the impact — stale/corrupted dynamic routes persisting silently — warrants ERROR level to ensure operator visibility in alerting pipelines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 03:05:04 -07:00
aaronagent	738f0bac13	fix: align auth-by-message classification with status-code path, decode URLs before secret check error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized", etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401 path (line 432) which correctly uses retryable=False + should_fallback=True. The mismatch causes 3 wasted retries with the same broken credential before fallback, while 401 errors immediately attempt fallback. Align the message-based path to match: retryable=False, should_fallback=True. web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... ( sk-1234...) bypass the filter because the regex expects literal ASCII. Add urllib.parse.unquote() before the check so percent-encoded variants are also caught. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 03:05:04 -07:00
Julien Talbot	b577697189	fix(model_metadata): add xAI Grok context length fallbacks xAI /v1/models does not return context_length metadata, so Hermes probes down to the 128k default whenever a user configures a custom provider pointing at https://api.x.ai/v1. This forces every xAI user to manually override model.context_length in config.yaml (2M for Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context window. Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the fallback lookup returns the correct value via substring matching. Values sourced from models.dev (2026-04) and cross-checked against the xAI /v1/models listing: - grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent) - grok-4-1-fast-* 2,000,000 - grok-4-fast-* 2,000,000 - grok-4 / grok-4-0709 256,000 - grok-code-fast-1 256,000 - grok-3* 131,072 - grok-2 / latest 131,072 - grok-2-vision* 8,192 - grok (catch-all) 131,072 Keys are ordered longest-first so that specific variants match before the catch-all, consistent with the existing Claude/Gemma/MiniMax entries. Add TestDefaultContextLengths.test_grok_models_context_lengths and test_grok_substring_matching to pin the values and verify the full lookup path. All 77 tests in test_model_metadata.py pass.	2026-04-10 03:04:19 -07:00
Cocoon-Break	45034b746f	fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027 ) Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68.	2026-04-10 02:48:45 -07:00
kshitijk4poor	9431f82aff	fix: update Kimi Coding User-Agent to KimiCLI/1.30.0 The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at v1.30.0. The stale version string causes intermittent 403 errors from Kimi's coding endpoint ('only available for Coding Agents'). Update all 8 occurrences across run_agent.py, auxiliary_client.py, and doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI.	2026-04-10 02:37:28 -07:00
Teknium	8779a268a7	feat: add Anthropic Fast Mode support to /fast command (#7037 ) Extends the /fast command to support Anthropic's Fast Mode beta in addition to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for ~2.5x faster output token throughput. Changes: - hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry, model_supports_fast_mode() now recognizes Claude Opus 4.6, resolve_fast_mode_overrides() returns {speed: fast} for Anthropic vs {service_tier: priority} for OpenAI - agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant, build_anthropic_kwargs() accepts fast_mode=True which injects speed:fast + beta header via extra_headers (skipped for third-party Anthropic-compatible endpoints like MiniMax) - run_agent.py: Pass fast_mode to build_anthropic_kwargs in the anthropic_messages path of _build_api_kwargs() - cli.py: Update _handle_fast_command with provider-aware messaging (shows 'Anthropic Fast Mode' vs 'Priority Processing') - hermes_cli/commands.py: Update /fast description to mention both providers - tests: 13 new tests covering Anthropic model detection, override resolution, CLI availability, routing, adapter kwargs, and third-party endpoint safety	2026-04-10 02:32:15 -07:00
emozilla	bda9aa17cb	fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation.	2026-04-09 22:16:36 -07:00
Teknium	4caa635803	fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds.	2026-04-09 21:48:50 -07:00
Ben Barclay	a64d8a83e1	fix: proactive Codex CLI sync before refresh + retry on failure	2026-04-09 21:48:50 -07:00
Ben Barclay	dfde4058cf	fix: sync refreshed OAuth tokens from pool back to auth.json providers	2026-04-09 21:48:50 -07:00
kshitijk4poor	08e2a1a51e	fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555.	2026-04-09 17:53:52 -07:00
sprmn24	e053433c84	fix(error_classifier): disambiguate usage-limit patterns in _classify_by_message _classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so messages like 'usage limit exceeded, try again in 5 minutes' arriving without an HTTP status code fell through to FailoverReason.unknown instead of rate_limit. Apply the same billing/rate-limit disambiguation that _classify_402 already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit, USAGE_LIMIT_PATTERNS alone → billing. Add 4 tests covering the no-status-code usage-limit path.	2026-04-09 16:24:13 -07:00
Teknium	97308707e9	fix: insert static fallback when compression summary fails When _generate_summary() failed (no provider, timeout, model error), the compressor silently dropped all middle turns with just a debug log. The agent would then see head + tail with no explanation of the gap, causing total context amnesia (generic greetings instead of continuing the conversation). Now generates a static fallback marker that tells the model context was lost and to continue from the recent tail messages. The fallback flows through the same role-alternation logic as a real summary so message structure stays valid.	2026-04-09 14:28:56 -07:00
Teknium	c6974fd108	fix: allow custom endpoint users to use main model for auxiliary tasks Step 1 of _resolve_auto() explicitly excluded 'custom' providers, forcing custom endpoint users through the fragile fallback chain instead of using their known-working main model credentials. This caused silent compression failures for users on local OpenAI- compatible endpoints — the summary generation would fail, middle turns would be silently dropped, and the agent would lose all conversation context. Remove 'custom' from the exclusion list so custom endpoint users get the same main-model-first treatment as DeepSeek, Anthropic, Gemini, and other direct providers.	2026-04-09 13:23:56 -07:00
KUSH42	34d06a9802	fix(compaction): don't halve context_length on output-cap-too-large errors When the API returns "max_tokens too large given prompt" (input tokens are within the context window, but input + requested output > window), the old code incorrectly routed through the same handler as "prompt too long" errors, calling get_next_probe_tier() and permanently halving context_length. This made things worse: the window was fine, only the requested output size needed trimming for that one call. Two distinct error classes now handled separately: Prompt too long — input itself exceeds context window. Fix: compress history + halve context_length (existing behaviour, unchanged). Output cap too large — input OK, but input + max_tokens > window. Fix: parse available_tokens from the error message, set a one-shot _ephemeral_max_output_tokens override for the retry, and leave context_length completely untouched. Changes: - agent/model_metadata.py: add parse_available_output_tokens_from_error() that detects Anthropic's "available_tokens: N" error format and returns the available output budget, or None for all other error types. - run_agent.py: call the new parser first in the is_context_length_error block; if it fires, set _ephemeral_max_output_tokens (with a 64-token safety margin) and break to retry without touching context_length. _build_api_kwargs consumes the ephemeral value exactly once then clears it so subsequent calls use self.max_tokens normally. - agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to clearly document the max_tokens (output cap) vs context_length (total window) distinction, which is a persistent source of confusion due to the OpenAI-inherited "max_tokens" name. - cli-config.yaml.example: add inline comments explaining both keys side by side where users are most likely to look. - website/docs/integrations/providers.md: add a callout box at the top of "Context Length Detection" and clarify the troubleshooting entry. - tests/test_ctx_halving_fix.py: 24 tests across four classes covering the parser, build_anthropic_kwargs clamping, ephemeral one-shot consumption, and the invariant that context_length is never mutated on output-cap errors.	2026-04-09 11:27:41 -07:00
Teknium	3007174a61	fix: prevent 400 format errors from triggering compression loop on Codex Responses API (#6751 ) The error classifier's generic-400 heuristic only extracted err_body_msg from the nested body structure (body['error']['message']), missing the flat body format used by OpenAI's Responses API (body['message']). This caused descriptive 400 errors like 'Invalid input[index].name: string does not match pattern' to appear generic when the session was large, misclassifying them as context overflow and triggering an infinite compression loop. Added flat-body fallback in _classify_400() consistent with the parent classify_api_error() function's existing handling at line 297-298.	2026-04-09 11:11:34 -07:00
Yang Zhi	110cdd573a	fix(auxiliary_client): inject KimiCLI User-Agent for custom endpoint sync clients When is explicitly set to , the custom-endpoint path in creates a plain client without provider-specific headers. This means sync vision calls (e.g. ) use the generic User-Agent and get rejected by Kimi's coding endpoint with a 403: 'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...' The async converter already injects , and the auto-detected API-key provider path also injects it, but the explicit custom endpoint shortcut was missing it entirely. This patch adds the same injection to the custom endpoint branch, and updates all existing Kimi header sites to for consistency. Fixes <issue number to be filled in>	2026-04-09 11:11:25 -07:00
Yang Zhi	4d1b988070	fix(credential_pool): use _resolve_kimi_base_url when seeding kimi-coding pool The credential pool seeder (_seed_from_env) hardcoded the base URL for API-key providers without running provider-specific auto-detection. For kimi-coding, this caused sk-kimi- prefixed keys to be seeded with the legacy api.moonshot.ai/v1 endpoint instead of api.kimi.com/coding/v1, resulting in HTTP 401 on the first request. Import and call _resolve_kimi_base_url for kimi-coding so the pool uses the correct endpoint based on the key prefix, matching the runtime credential resolver behavior. Also fix a comment: sk-kimi- keys are issued by kimi.com/code, not platform.kimi.ai. Fixes #5561	2026-04-09 11:11:25 -07:00
Teknium	1ec1f6a68a	fix: model fallback — stale model on Nous login + connection error fallback (#6554 ) Two bugs in the model fallback system: 1. Nous login leaves stale model in config (provider=nous, model=opus from previous OpenRouter setup). Fixed by deferring the config.yaml provider write until AFTER model selection completes, and passing the selected model atomically via _update_config_for_provider's default_model parameter. Previously, _update_config_for_provider was called before model selection — if selection failed (free tier, no models, exception), config stayed as nous+opus permanently. 2. Codex/stale providers in auxiliary fallback can't connect but block the auto-detection chain. Added _is_connection_error() detection (APIConnectionError, APITimeoutError, DNS failures, connection refused) alongside the existing _is_payment_error() check in call_llm(). When a provider endpoint is unreachable, the system now falls back to the next available provider instead of crashing.	2026-04-09 10:38:53 -07:00
Teknium	1a3ae6ac6e	feat: structured API error classification for smart failover (#6514 ) Add agent/error_classifier.py with a priority-ordered classification pipeline that replaces scattered inline string-matching in the retry loop with structured error taxonomy and recovery hints. FailoverReason enum (14 categories): auth, auth_permanent, billing, rate_limit, overloaded, server_error, timeout, context_overflow, payload_too_large, model_not_found, format_error, thinking_signature, long_context_tier, unknown. ClassifiedError dataclass carries reason + recovery action hints (retryable, should_compress, should_rotate_credential, should_fallback). Key improvements over inline matching: - 402 disambiguation: 'insufficient credits' = billing (immediate rotate), 'usage limit, try again' = rate_limit (backoff first) - OpenRouter 403 'key limit exceeded' correctly classified as billing - Error cause chain walking (walks __cause__/__context__ up to 5 levels) - Body message included in pattern matching (SDK str() misses it) - Server disconnect + large session check ordered before generic transport catch so RemoteProtocolError triggers compression when appropriate - Chinese error message support for context overflow run_agent.py: replaced 6 inline detection blocks with classifier calls, net -55 lines. All recovery actions (pool rotation, fallback activation, compression, transport recovery) unchanged. 65 new unit tests + 10 E2E tests + live tests with real SDK error objects. Inspired by OpenClaw's failover error classification system.	2026-04-09 04:10:11 -07:00
Teknium	8dfc96dbbb	feat: capture provider rate limit headers and show in /usage (#6541 ) Parse x-ratelimit-* headers from inference API responses (Nous Portal, OpenRouter, OpenAI-compatible) and display them in the /usage command. - New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/ TPM/TPH limits, remaining, reset timers), format as progress bars (CLI) or compact one-liner (gateway) - Hook into streaming path in run_agent.py: stream.response.headers is available on the OpenAI SDK Stream object before chunks are consumed - CLI /usage: appends rate limit section with progress bars + warnings when any bucket exceeds 80% - Gateway /usage: appends compact rate limit summary - 24 unit tests covering parsing, formatting, edge cases Headers captured per response: x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h} Example CLI display: Nous Rate Limits (captured just now): Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s) Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)	2026-04-09 03:43:14 -07:00
konsisumer	3c8ec7037c	fix(agent): catch PermissionError in subdirectory hint discovery Wrap is_dir() in _is_valid_subdir() and is_file() in _load_hints_for_directory() with OSError handlers so that inaccessible directories (e.g. /root from a non-root Daytona host user) are silently skipped instead of crashing the agent. The existing PermissionError PRs for prompt_builder.py (#6247, #6321, #6355) do not cover subdirectory_hints.py, which was identified as a separate crash path in the #6214 comments. Ref: #6214	2026-04-09 03:10:30 -07:00
Teknium	b408379e9d	fix: reduce credential exhaustion TTL from 24 hours to 1 hour (#6504 ) The 24-hour default cooldown for 402-exhausted credentials was far too aggressive — if a user tops up credits or the 402 was caused by an oversized max_tokens request rather than true billing exhaustion, they shouldn't have to wait a full day. Reduce to 1 hour (matching the existing 429 TTL). Inspired by PR #6493 (michalkomar).	2026-04-09 02:37:23 -07:00
Cherif Yaya	5cf4fac2aa	fix: restore codex fallback auth-store lookup	2026-04-09 01:56:10 -07:00
Hunter B	894e8c8a8f	fix: resolve opencode.ai context window to 1M and clean up display formatting Two issues resolved: 1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through models.dev lookup (which has mimo-v2-pro at 1M context) instead of falling back to probing /models (404) and defaulting to 128K. 2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead of '1.048576M'. Applies same rounding logic to K values.	2026-04-09 01:43:22 -07:00
BongSuCHOI	d12f8db0b8	fix(compaction): token-budget primary tail protection Tail protection was effectively message-count based despite having a token budget, because protect_last_n=20 acted as a hard floor. A single 50K-token tool output would cause all 20 recent messages to be preserved regardless of budget, leaving little room for summarization. Changes: - _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20) to 3; token budget is now the primary criterion - Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message - _prune_old_tool_results: accepts optional protect_tail_tokens so pruning also respects the token budget instead of a fixed count - compress() minimum message check relaxed from protect_first_n + protect_last_n + 1 to protect_first_n + 3 + 1 - Tool group alignment (no splitting tool_call/result) preserved	2026-04-08 23:54:23 -07:00
Teknium	d97f6cec7f	feat(gateway): add BlueBubbles iMessage platform adapter (#6437 ) Adds Apple iMessage as a gateway platform via BlueBubbles macOS server. Architecture: - Webhook-based inbound (event-driven, no polling/dedup needed) - Email/phone → chat GUID resolution for user-friendly addressing - Private API safety (checks helper_connected before tapback/typing) - Inbound attachment downloading (images, audio, documents cached locally) - Markdown stripping for clean iMessage delivery - Smart progress suppression for platforms without message editing Based on PR #5869 by @benjaminsehl (webhook architecture, GUID resolution, Private API safety, progress suppression) with inbound attachment downloading from PR #4588 by @1960697431 (attachment cache routing). Integration points: Platform enum, env config, adapter factory, auth maps, cron delivery, send_message routing, channel directory, platform hints, toolset definition, setup wizard, status display. 27 tests covering config, adapter, webhook parsing, GUID resolution, attachment download routing, toolset consistency, and prompt hints.	2026-04-08 23:54:03 -07:00
SHL0MS	8567031433	fix: improve context compression quality — named constants, tool tracking, degradation warning Three targeted improvements to the compression system: 1. Replace hardcoded truncation limits with named class constants (_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500, _TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits (3000/500) heavily truncated the summarizer's input — a 200-line edit got cut to 3000 chars before the summarizer ever saw it. 2. Add '## Tools & Patterns' section to both compression prompt templates (first-pass and iterative). Preserves working tool invocations, preferred flags, and tool-specific discoveries across compaction boundaries. 3. Warn users on 2nd+ compression: 'Session compressed N times — accuracy may degrade. Consider /new to start fresh.' Ref #499	2026-04-08 20:54:23 -07:00
kshitijk4poor	875a72e4c8	fix: normalize httpx.URL base_url + strip thinking signatures for third-party endpoints Two linked fixes for MiniMax Anthropic-compatible fallback: 1. Normalize httpx.URL to str before calling .rstrip() in auth/provider detection helpers. Some client objects expose base_url as httpx.URL, not str — crashed with AttributeError in _requires_bearer_auth() and _is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback() to use the already-stringified fb_base_url instead of raw httpx.URL. 2. Strip Anthropic-proprietary thinking block signatures when targeting third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry, self-hosted proxies). These endpoints cannot validate Anthropic's signatures and reject them with HTTP 400 'Invalid signature in thinking block'. Now threads base_url through convert_messages_to_anthropic() → build_anthropic_kwargs() so signature management is endpoint-aware. Based on PR #4945 by kshitijk4poor (rstrip fix). Fixes #4944.	2026-04-08 16:39:29 -07:00
Teknium	7156f8d866	fix: CI test failures — metadata key, cli console, docker env, vision order (#6294 ) Fixes 9 test failures on current main, incorporating ideas from PR stack #6219-#6222 by xinbenlv with corrections: - model_metadata: sync HF context length key casing (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5) - cli.py: route quick command error output through self.console instead of creating a new ChatConsole() instance - docker.py: explicit docker_forward_env entries now bypass the Hermes secret blocklist (intentional opt-in wins over generic filter) - auxiliary_client: revert _read_main_provider() to simple provider.strip().lower() — the _normalize_aux_provider() call introduced in `5c03f2e7` stripped the custom: prefix, breaking named custom provider resolution - auxiliary_client: flip vision auto-detection order to active provider → OpenRouter → Nous → stop (was OR → Nous → active) - test: update vision priority test to match new order Based on PR #6219-#6222 by xinbenlv.	2026-04-08 16:37:05 -07:00
Teknium	5d2fc6d928	fix: cleanup Qwen OAuth provider gaps - Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing despite being referenced in code) - Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS (non-aggregator providers use their main model for aux tasks automatically)	2026-04-08 13:46:30 -07:00
kshitijk4poor	3377017eb4	feat(qwen): add Qwen OAuth provider with portal request support Based on #6079 by @tunamitom with critical fixes and comprehensive tests. Changes from #6079: - Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex field sanitization, not before (was silently discarding Qwen transforms) - Fix: missing try/except AuthError in runtime_provider.py — stale Qwen credentials now fall through to next provider on auto-detect - Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba' (DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider - Fix: hardcoded ['coder-model'] replaced with live API fetch + curated fallback list (qwen3-coder-plus, qwen3-coder) - Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace 5 inline 'portal.qwen.ai' string checks and share headers between init and credential swap - Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session credential swaps - Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice - Fix: handle bare string items in content lists (were silently dropped) - Fix: remove redundant dict() copies after deepcopy in message prep - Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion New tests (30 test functions): - _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths) - _save_qwen_cli_tokens (roundtrip, parent creation, permissions) - _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew, None, non-numeric) - _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths, default expires_in, disk persistence) - resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh, missing token, env override) - get_qwen_auth_status (logged in, not logged in) - Runtime provider resolution (direct, pool entry, alias) - _build_api_kwargs (metadata, vl_high_resolution_images, message formatting, max_tokens suppression)	2026-04-08 13:46:30 -07:00
Teknium	c8a5e36be8	feat(prompting): self-optimized GPT/Codex tool-use guidance via automated behavioral benchmarking (#6120 ) Hermes Agent identified and patched its own prompting blind spots through automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4 and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches, and verifying the fix in a closed loop. Failure modes discovered and fixed: - Mental arithmetic (wrong answers: 39,152,053 vs correct 39,151,253) - User profile hallucination ('Windows 11' when running on Linux) - Time guessing without verification - Clarification-seeking instead of acting ('open where?' for port checks) - Hash computation from memory (SHA-256, encodings) - Confusing system RAM with agent's own persistent memory store Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE: - <mandatory_tool_use>: explicit categories that must always use tools - <act_dont_ask>: default to action on obvious interpretations Results: gpt-5.4: 68.8% → 100% tool compliance (+31.2pp) gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp) Regression: 0/8 conversational prompts over-tooled	2026-04-08 04:06:42 -07:00
Teknium	1368caf66f	fix(anthropic): smart thinking block signature management (#6112 ) Anthropic signs thinking blocks against the full turn content. Any upstream mutation (context compression, session truncation, orphan stripping, message merging) invalidates the signature, causing HTTP 400 'Invalid signature in thinking block' — especially in long-lived gateway sessions. Strategy (following clawdbot/OpenClaw pattern): 1. Strip thinking/redacted_thinking from all assistant messages EXCEPT the last one — preserves reasoning continuity on the current tool-use chain while avoiding stale signature errors on older turns. 2. Downgrade unsigned thinking blocks to plain text — Anthropic can't validate them, but the reasoning content is preserved. 3. Strip cache_control from thinking/redacted_thinking blocks to prevent cache markers from interfering with signature validation. 4. Drop thinking blocks from the second message when merging consecutive assistant messages (role alternation enforcement). 5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking', strip all reasoning_details from the conversation and retry once. This is the safety net for edge cases the proactive stripping misses. Addresses the issue reported in PR #6086 by @mingginwan while preserving reasoning continuity (their PR stripped ALL thinking blocks unconditionally). Files changed: - agent/anthropic_adapter.py: thinking block management in convert_messages_to_anthropic (strip old turns, downgrade unsigned, strip cache_control, merge-time strip) - run_agent.py: one-shot signature error recovery in retry loop - tests/test_anthropic_adapter.py: 10 new tests covering all cases	2026-04-08 03:38:08 -07:00
kshitij	22d1bda185	fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url Cherry-picked from PR #6046 by kshitijk4poor with dead code stripped. - Context lengths: 204800 → 1M (M1) / 1048576 (M2.5/M2.7) per official docs - Model catalog: add M1 family, remove deprecated M2.1 and highspeed variants - Thinking guard: skip extended thinking for MiniMax (Anthropic-compat endpoint) - Aux model: MiniMax-M2.7-highspeed → MiniMax-M2.7 (same model, half price) - Config base_url: honour model.base_url for API-key providers (fixes China users) - Stripped unused get_minimax_max_output() / _MINIMAX_MAX_OUTPUT (no consumer) Fixes #5777, #4082, #6039. Closes #3895.	2026-04-08 02:20:46 -07:00
Mibayy	ab271ebe10	fix(vision): simplify vision auto-detection to openrouter → nous → active provider Simplify the vision auto-detection chain from 5 backends (openrouter, nous, codex, anthropic, custom) down to 3: 1. OpenRouter (known vision-capable default model) 2. Nous Portal (known vision-capable default model) 3. Active provider + model (whatever the user is running) 4. Stop This is simpler and more predictable. The active provider step uses resolve_provider_client() which handles all provider types including named custom providers (from #5978). Removed the complex preferred-provider promotion logic and API-level fallback — the chain is short enough that it doesn't need them. Based on PR #5376 by Mibay. Closes #5366.	2026-04-08 01:21:54 -07:00
zocomputer	e1befe5077	feat(agent): add jittered retry backoff Adds agent/retry_utils.py with jittered_backoff() — exponential backoff with additive jitter to prevent thundering-herd retry spikes when multiple gateway sessions hit the same rate-limited provider. Replaces fixed exponential backoff at 4 call sites: - run_agent.py: None-choices retry path (5s base, 120s cap) - run_agent.py: API error retry path (2s base, 60s cap) - trajectory_compressor.py: sync + async summarization retries Thread-safe jitter counter with overflow guards ensures unique seeds across concurrent retries. Trimmed from original PR to keep only wired-in functionality. Co-authored-by: martinp09 <martinp09@users.noreply.github.com>	2026-04-08 00:41:36 -07:00
Teknium	5c03f2e7cc	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983 ) Salvaged fixes from community PRs: - fix(model_switch): _read_auth_store → _load_auth_store + fix auth store key lookup (was checking top-level dict instead of store['providers']). OAuth providers now correctly detected in /model picker. Cherry-picked from PR #5911 by Xule Lin (linxule). - fix(ollama): pass num_ctx to override 2048 default context window. Ollama defaults to 2048 context regardless of model capabilities. Now auto-detects from /api/show metadata and injects num_ctx into every request. Config override via model.ollama_num_ctx. Fixes #2708. Cherry-picked from PR #5929 by kshitij (kshitijk4poor). - fix(aux): normalize provider aliases for vision/auxiliary routing. Adds _normalize_aux_provider() with 17 aliases (google→gemini, claude→anthropic, glm→zai, etc). Fixes vision routing failure when provider is set to 'google' instead of 'gemini'. Cherry-picked from PR #5793 by e11i (Elizabeth1979). - fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK. MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API), but auxiliary client uses OpenAI SDK which appends /chat/completions → 404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint. Inspired by PR #5786 by Lempkey. Added debug logging to silent exception blocks across all fixes. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-07 22:23:28 -07:00
Teknium	8d7a98d2ff	feat: use mimo-v2-pro for non-vision auxiliary tasks on Nous free tier (#6018 ) Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal model) for all auxiliary tasks including compression, session search, and web extraction. Now routes non-vision tasks to mimo-v2-pro (a text model) which is better suited for those workloads. - Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks - _try_nous() accepts vision=False param to select the right model - Vision path (_resolve_strict_vision_backend) passes vision=True - All other callers default to vision=False → mimo-v2-pro	2026-04-07 21:41:05 -07:00
Teknium	cbf1f15cfe	fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978 ) * fix(telegram): replace substring caption check with exact line-by-line match Captions in photo bursts and media group albums were silently dropped when a shorter caption happened to be a substring of an existing one (e.g. "Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption static helper that splits on "\n\n" and uses exact match with whitespace normalisation, then use it in both _enqueue_photo_event and _queue_media_group_event. Adds 13 unit tests covering the fixed bug scenarios. Cherry-picked from PR #2671 by Dilee. * fix: extend caption substring fix to all platforms Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter so all adapters inherit it. Fix the same substring-containment bug in: - gateway/platforms/base.py (photo burst merging) - gateway/run.py (priority photo follow-up merging) - gateway/platforms/feishu.py (media batch merging) The original fix only covered telegram.py. The same bug existed in base.py and run.py (pure substring check) and feishu.py (list membership without whitespace normalization). * fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when using named custom providers defined in config.yaml: 1. 'provider: main' was hardcoded to 'custom', which only checks legacy OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek'). 2. Named custom provider names (e.g., 'beans') fell through to PROVIDER_REGISTRY which doesn't know about config.yaml entries. Now checks _get_named_custom_provider() before the registry fallback. Fixes both resolve_provider_client() and _normalize_vision_provider() so the fix covers all auxiliary tasks (vision, compression, web_extract, session_search, etc.). Adds 13 unit tests. Reported by Laura via Discord. --------- Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>	2026-04-07 17:59:47 -07:00
Teknium	678a87c477	refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites Add three reusable helpers to eliminate pervasive boilerplate: tools/registry.py — tool_error() and tool_result(): Every tool handler returns JSON strings. The pattern json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times, and json.dumps({"success": False, "error": msg}, ...) another 23. Now: tool_error(msg) or tool_error(msg, success=False). tool_result() handles arbitrary result dicts: tool_result(success=True, data=payload) or tool_result(some_dict). hermes_cli/config.py — read_raw_config(): Lightweight YAML reader that returns the raw config dict without load_config()'s deep-merge + migration overhead. Available for callsites that just need a single config value. Migration (129 callsites across 32 files): - tools/: browser_camofox (18), file_tools (10), homeassistant (8), web_tools (7), skill_manager (7), cronjob (11), code_execution (4), delegate (5), send_message (4), tts (4), memory (7), session_search (3), mcp (2), clarify (2), skills_tool (3), todo (1), vision (1), browser (1), process_registry (2), image_gen (1) - plugins/memory/: honcho (9), supermemory (9), hindsight (8), holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2) - agent/: memory_manager (2), builtin_memory_provider (1)	2026-04-07 13:36:38 -07:00
Teknium	ca0459d109	refactor: remove 24 confirmed dead functions — 432 lines of unused code Each function was verified to have exactly 1 reference in the entire codebase (its own definition). Zero calls, zero imports, zero string references anywhere including tests. Removed by category: Superseded wrappers (replaced by newer implementations): - agent/anthropic_adapter.py: run_hermes_oauth_login, refresh_hermes_oauth_token - hermes_cli/callbacks.py: sudo_password_callback (superseded by CLI method) - hermes_cli/setup.py: _set_model_provider, _sync_model_from_disk - tools/file_tools.py: get_file_tools (superseded by registry.register) - tools/cronjob_tools.py: get_cronjob_tool_definitions (same) - tools/terminal_tool.py: _check_dangerous_command (_check_all_guards used) Dead private helpers (lost their callers during refactors): - agent/anthropic_adapter.py: _convert_user_content_part_to_anthropic - agent/display.py: honcho_session_line, write_tty - hermes_cli/providers.py: _build_labels (+ dead _labels_cache var) - hermes_cli/tools_config.py: _prompt_yes_no - hermes_cli/models.py: _extract_model_ids - hermes_cli/uninstall.py: log_error - gateway/platforms/feishu.py: _is_loop_ready - tools/file_operations.py: _read_image (64-line method) - tools/process_registry.py: cleanup_expired - tools/skill_manager_tool.py: check_skill_manage_requirements Dead class methods (zero callers): - run_agent.py: _is_anthropic_url (logic duplicated inline at L618) - run_agent.py: _classify_empty_content_response (68-line method, never wired) - cli.py: reset_conversation (callers all use new_session directly) - cli.py: _clear_current_input (added but never wired in) Other: - gateway/delivery.py: build_delivery_context_for_tool - tools/browser_tool.py: get_active_browser_sessions	2026-04-07 11:41:26 -07:00
Teknium	187e90e425	refactor: replace inline HERMES_HOME re-implementations with get_hermes_home() 16 callsites across 14 files were re-deriving the hermes home path via os.environ.get('HERMES_HOME', ...) instead of using the canonical get_hermes_home() from hermes_constants. This breaks profiles — each profile has its own HERMES_HOME, and the inline fallback defaults to ~/.hermes regardless. Fixed by importing and calling get_hermes_home() at each site. For files already inside the hermes process (agent/, hermes_cli/, tools/, gateway/, plugins/), this is always safe. Files that run outside the process context (mcp_serve.py, mcp_oauth.py) already had correct try/except ImportError fallbacks and were left alone. Skipped: hermes_constants.py (IS the implementation), env_loader.py (bootstrap), profiles.py (intentionally manipulates the env var), standalone scripts (optional-skills/, skills/), and tests.	2026-04-07 10:40:34 -07:00
Teknium	d0ffb111c2	refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821 ) Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture) and manual analysis of the entire codebase. Changes by category: Unused imports removed (~95 across 55 files): - Removed genuinely unused imports from all major subsystems - agent/, hermes_cli/, tools/, gateway/, plugins/, cron/ - Includes imports in try/except blocks that were truly unused (vs availability checks which were left alone) Unused variables removed (~25): - Removed dead variables: connected, inner, channels, last_exc, source, new_server_names, verify, pconfig, default_terminal, result, pending_handled, temperature, loop - Dropped unused argparse subparser assignments in hermes_cli/main.py (12 instances of add_parser() where result was never used) Dead code removed: - run_agent.py: Removed dead ternary (None if False else None) and surrounding unreachable branch in identity fallback - run_agent.py: Removed write-only attribute _last_reported_tool - hermes_cli/providers.py: Removed dead @property decorator on module-level function (decorator has no effect outside a class) - gateway/run.py: Removed unused MCP config load before reconnect - gateway/platforms/slack.py: Removed dead SessionSource construction Undefined name bugs fixed (would cause NameError at runtime): - batch_runner.py: Added missing logger = logging.getLogger(__name__) - tools/environments/daytona.py: Added missing Dict and Path imports Unnecessary global statements removed (14): - tools/terminal_tool.py: 5 functions declared global for dicts they only mutated via .pop()/[key]=value (no rebinding) - tools/browser_tool.py: cleanup thread loop only reads flag - tools/rl_training_tool.py: 4 functions only do dict mutations - tools/mcp_oauth.py: only reads the global - hermes_time.py: only reads cached values Inefficient patterns fixed: - startswith/endswith tuple form: 15 instances of x.startswith('a') or x.startswith('b') consolidated to x.startswith(('a', 'b')) - len(x)==0 / len(x)>0: 13 instances replaced with pythonic truthiness checks (not x / bool(x)) - in dict.keys(): 5 instances simplified to in dict - Redefined unused name: removed duplicate _strip_mdv2 import in send_message_tool.py Other fixes: - hermes_cli/doctor.py: Replaced undefined logger.debug() with pass - hermes_cli/config.py: Consolidated chained .endswith() calls Test results: 3934 passed, 17 failed (all pre-existing on main), 19 skipped. Zero regressions.	2026-04-07 10:25:31 -07:00
emozilla	29065cb9b5	feat(nous): free-tier model gating, pricing display, and vision fallback - Show pricing during initial Nous Portal login (was missing from _login_nous, only shown in the already-logged-in hermes model path) - Filter free models for paid subscribers: non-allowlisted free models are hidden; allowlisted models (xiaomi/mimo-v2-pro, xiaomi/mimo-v2-omni) only appear when actually priced as free - Detect free-tier accounts via portal api/oauth/account endpoint (monthly_charge == 0); free-tier users see only free models as selectable, with paid models shown dimmed and unselectable - Use xiaomi/mimo-v2-omni as the auxiliary vision model for free-tier Nous users so vision_analyze and browser_vision work without paid model access (replaces the default google/gemini-3-flash-preview) - Unavailable models rendered via print() before TerminalMenu to avoid simple_term_menu line-width padding artifacts; upgrade URL resolved from auth state portal_base_url (supports staging/custom portals) - Add 21 tests covering filter_nous_free_models, is_nous_free_tier, and partition_nous_models_by_tier	2026-04-07 09:21:48 -07:00
Ben Barclay	b2f477a30b	feat: switch managed browser provider from Browserbase to Browser Use (#5750 ) * feat: switch managed browser provider from Browserbase to Browser Use The Nous subscription tool gateway now routes browser automation through Browser Use instead of Browserbase. This commit: - Adds managed Nous gateway support to BrowserUseProvider (idempotency keys, X-BB-API-Key auth header, external_call_id persistence) - Removes managed gateway support from BrowserbaseProvider (now direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID) - Updates browser_tool.py fallback: prefers Browser Use over Browserbase - Updates nous_subscription.py: gateway vendor 'browser-use', auto-config sets cloud_provider='browser-use' for new subscribers - Updates tools_config.py: Nous Subscription entry now uses Browser Use - Updates setup.py, cli.py, status.py, prompt_builder.py display strings - Updates all affected tests to match new behavior Browserbase remains fully functional for users with direct API credentials. The change only affects the managed/subscription path. * chore: remove redundant Browser Use hint from system prompt * fix: upgrade Browser Use provider to v3 API - Base URL: api/v2 -> api/v3 (v2 is legacy) - Unified all endpoints to use native Browser Use paths: - POST /browsers (create session, returns cdpUrl) - PATCH /browsers/{id} with {action: stop} (close session) - Removed managed-mode branching that used Browserbase-style /v1/sessions paths — v3 gateway now supports /browsers directly - Removed unused managed_mode variable in close_session * fix(browser-use): use X-Browser-Use-API-Key header for managed mode The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key (which is a Browserbase-specific header). Using the wrong header caused a 401 AUTH_ERROR on every managed-mode browser session create. Simplified _headers() to always use X-Browser-Use-API-Key regardless of direct vs managed mode. * fix(nous_subscription): browserbase explicit provider is direct-only Since managed Nous gateway now routes through Browser Use, the browserbase explicit provider path should not check managed_browser_available (which resolves against the browser-use gateway). Simplified to direct-only with managed=False. * fix(browser-use): port missing improvements from PR #5605 - CDP URL normalization: resolve HTTP discovery URLs to websocket after cloud provider create_session() (prevents agent-browser failures) - Managed session payload: send timeout=5 and proxyCountryCode=us for gateway-backed sessions (prevents billing overruns) - Update prompt builder, browser_close schema, and module docstring to replace remaining Browserbase references with Browser Use - Dynamic /browser status detection via _get_cloud_provider() instead of hardcoded env var checks (future-proof for new providers) - Rename post_setup key from 'browserbase' to 'agent_browser' - Update setup hint to mention Browser Use alongside Browserbase - Add tests: CDP normalization, browserbase direct-only guard, managed browser-use gateway, direct browserbase fallback --------- Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>	2026-04-07 08:40:22 -04:00
Teknium	8b861b77c1	refactor: remove browser_close tool — auto-cleanup handles it (#5792 ) * refactor: remove browser_close tool — auto-cleanup handles it The browser_close tool was called in only 9% of browser sessions (13/144 navigations across 66 sessions), always redundantly — cleanup_browser() already runs via _cleanup_task_resources() at conversation end, and the background inactivity reaper catches anything else. Removing it saves one tool schema slot in every browser-enabled API call. Also fixes a latent bug: cleanup_browser() now handles Camofox sessions too (previously only Browserbase). Camofox sessions were never auto-cleaned per-task because they live in a separate dict from _active_sessions. Files changed (13): - tools/browser_tool.py: remove function, schema, registry entry; add camofox cleanup to cleanup_browser() - toolsets.py, model_tools.py, prompt_builder.py, display.py, acp_adapter/tools.py: remove browser_close from all tool lists - tests/: remove browser_close test, update toolset assertion - docs/skills: remove all browser_close references * fix: repeat browser_scroll 5x per call for meaningful page movement Most backends scroll ~100px per call — barely visible on a typical viewport. Repeating 5x gives ~500px (~half a viewport), making each scroll tool call actually useful. Backend-agnostic approach: works across all 7+ browser backends without needing to configure each one's scroll amount individually. Breaks early on error for the agent-browser path. * feat: auto-return compact snapshot from browser_navigate Every browser session starts with navigate → snapshot. Now navigate returns the compact accessibility tree snapshot inline, saving one tool call per browser task. The snapshot captures the full page DOM (not viewport-limited), so scroll position doesn't affect it. browser_snapshot remains available for refreshing after interactions or getting full=true content. Both Browserbase and Camofox paths auto-snapshot. If the snapshot fails for any reason, navigation still succeeds — the snapshot is a bonus, not a requirement. Schema descriptions updated to guide models: navigate mentions it returns a snapshot, snapshot mentions it's for refresh/full content. * refactor: slim cronjob tool schema — consolidate model/provider, drop unused params Session data (151 calls across 67 sessions) showed several schema properties were never used by models. Consolidated and cleaned up: Removed from schema (still work via backend/CLI): - skill (singular): use skills array instead - reason: pause-only, unnecessary - include_disabled: now defaults to true - base_url: extreme edge case, zero usage - provider (standalone): merged into model object Consolidated: - model + provider → single 'model' object with {model, provider} fields. If provider is omitted, the current main provider is pinned at creation time so the job stays stable even if the user changes their default. Kept: - script: useful data collection feature - skills array: standard interface for skill loading Schema shrinks from 14 to 10 properties. All backend functionality preserved — the Python function signature and handler lambda still accept every parameter. * fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli, hermes-messaging, safe), which meant it appeared in every session for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS gate only works after running 'hermes tools' explicitly. Now MoA only appears when a user explicitly enables it via 'hermes tools'. The moa toolset definition and check_fn remain unchanged — it just needs to be opted into.	2026-04-07 03:28:44 -07:00
Yang Zhi	9e844160f9	fix(credential_pool): auto-detect Z.AI endpoint via probe and cache The credential pool seeder and runtime credential resolver hardcoded api.z.ai/api/paas/v4 for all Z.AI keys. Keys on the Coding Plan (or CN endpoint) would hit the wrong endpoint, causing 401/429 errors on the first request even though a working endpoint exists. Add _resolve_zai_base_url() that: - Respects GLM_BASE_URL env var (no probe when explicitly set) - Probes all candidate endpoints (global, cn, coding-global, coding-cn) via detect_zai_endpoint() to find one that returns HTTP 200 - Caches the detected endpoint in provider state (auth.json) keyed on a SHA-256 hash of the API key so subsequent starts skip the probe - Falls back to the default URL if all probes fail Wire into both _seed_from_env() in the credential pool and resolve_api_key_provider_credentials() in the runtime resolver, matching the pattern from the kimi-coding fix (PR #5566). Fixes the same class of bug as #5561 but for the zai provider.	2026-04-07 00:00:08 -07:00
Mateus Scheuer Macedo	f2c11ff30c	fix(delegate): share credential pools with subagents + per-task leasing Cherry-picked from PR #5580 by MestreY0d4-Uninter. - Share parent's credential pool with child agents for key rotation - Leasing layer spreads parallel children across keys (least-loaded) - Thread-safe acquire_lease/release_lease in CredentialPool - Reverted sneaked-in tool-name restoration change (kept original getattr + isinstance guard pattern)	2026-04-06 23:01:11 -07:00
Teknium	888dc1e680	fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734 ) Two remaining gaps from the codex empty-output spec: 1. Normalize dict-shaped streamed items: output_item.done events may yield dicts (raw/fallback paths) instead of SDK objects. The extraction loop now uses _item_get() that handles both getattr and dict .get() access. 2. Avoid plain-text synthesis when function_call events were streamed: tracks has_function_calls during streaming and skips text-delta synthesis when tool calls are present — prevents collapsing a tool-call response into a fake text message.	2026-04-06 21:35:33 -07:00
Teknium	21b48b2ff5	fix: backfill empty codex output in auxiliary client (#5730 ) The _CodexCompletionsAdapter (used for compression, vision, web_extract, session_search, and memory flush when on the codex provider) streamed responses but discarded all events with 'for _event in stream: pass'. When get_final_response() returned empty output (the same chatgpt.com backend-api shape change), auxiliary calls silently returned None content. Now collects response.output_item.done and text deltas during streaming and backfills empty output — same pattern as _run_codex_stream(). Tested live against chatgpt.com/backend-api/codex with OAuth.	2026-04-06 21:13:22 -07:00
Grateful Dave	e5aaa38ca7	fix: sync openai-codex pool entry from ~/.codex/auth.json on exhaustion (#5610 ) OpenAI OAuth refresh tokens are single-use and rotate on every refresh. When the Codex CLI (or another Hermes profile) refreshes its token, the pool entry's refresh_token becomes stale. Subsequent refresh attempts fail with invalid_grant, and the entry enters a 24-hour exhaustion cooldown with no recovery path. This mirrors the existing _sync_anthropic_entry_from_credentials_file() pattern: when an openai-codex entry is exhausted, compare its refresh_token against ~/.codex/auth.json and sync the fresh pair if they differ. Fixes the common scenario where users run 'codex login' to refresh their token externally and Hermes never picks it up. Co-authored-by: David Andrews (LexGenius.ai) <david@lexgenius.ai>	2026-04-06 18:16:56 -07:00
Teknium	8cf013ecd9	fix: replace stale 'hermes login' refs with 'hermes auth' + fix credential removal re-seeding (#5670 ) Two fixes: 1. Replace all stale 'hermes login' references with 'hermes auth' across auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py, and documentation. The 'hermes login' command was deprecated; 'hermes auth' now handles OAuth credential management. 2. Fix credential removal not persisting for singleton-sourced credentials (device_code for openai-codex/nous, hermes_pkce for anthropic). auth_remove_command already cleared env vars for env-sourced credentials, but singleton credentials stored in the auth store were re-seeded by _seed_from_singletons() on the next load_pool() call. Now clears the underlying auth store entry when removing singleton-sourced credentials.	2026-04-06 17:17:57 -07:00
KangYu	b26e85bf9d	Fix compaction summary retries for temperature-restricted models	2026-04-06 16:49:57 -07:00
Teknium	150f70f821	feat(skills): add skill config interface + llm-wiki skill (#5635 ) Skills can now declare config.yaml settings via metadata.hermes.config in their SKILL.md frontmatter. Values are stored under skills.config.* namespace, prompted during hermes config migrate, shown in hermes config show, and injected into the skill context at load time. Also adds the llm-wiki skill (Karpathy's LLM Wiki pattern) as the first skill to use the new config interface, declaring wiki.path. Skill config interface (new): - agent/skill_utils.py: extract_skill_config_vars(), discover_all_skill_config_vars(), resolve_skill_config_values(), SKILL_CONFIG_PREFIX - agent/skill_commands.py: _inject_skill_config() injects resolved values into skill messages as [Skill config: ...] block - hermes_cli/config.py: get_missing_skill_config_vars(), skill config prompting in migrate_config(), Skill Settings in show_config() LLM Wiki skill (skills/research/llm-wiki/SKILL.md): - Three-layer architecture (raw sources, wiki pages, schema) - Three operations (ingest, query, lint) - Session orientation, page thresholds, tag taxonomy, update policy, scaling guidance, log rotation, archiving workflow Docs: creating-skills.md, configuration.md, skills.md, skills-catalog.md Closes #5100	2026-04-06 13:49:13 -07:00
Teknium	da02a4e283	fix: auxiliary client payment fallback — retry with next provider on 402 (#5599 ) When a user runs out of OpenRouter credits and switches to Codex (or any other provider), auxiliary tasks (compression, vision, web_extract) would still try OpenRouter first and fail with 402. Two fixes: 1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402 or a credit-related error, automatically retry with the next available provider in the auto-detection chain. Skips the depleted provider and tries Nous → Custom → Codex → API-key providers. 2. Remove hardcoded OpenRouter fallback: The old code fell back specifically to OpenRouter when auto/custom resolution returned no client. Now falls back to the full auto-detection chain, which handles any available provider — not just OpenRouter. Also extracts _get_provider_chain() as a shared function (replaces inline tuple in _resolve_auto and the new fallback), built at call time so test patches on _try_* functions remain visible. Adds 16 tests covering _is_payment_error(), _get_provider_chain(), _try_payment_fallback(), and call_llm() integration with 402 retry.	2026-04-06 12:41:40 -07:00
kshitijk4poor	214e60c951	fix: sanitize Telegram command names to strip invalid characters Telegram Bot API requires command names to contain only lowercase a-z, digits 0-9, and underscores. Skill/plugin names containing characters like +, /, @, or . caused set_my_commands to fail with Bot_command_invalid. Two-layer fix: - scan_skill_commands(): strip non-alphanumeric/non-hyphen chars from cmd_key at source, collapse consecutive hyphens, trim edges, skip names that sanitize to empty string - _sanitize_telegram_name(): centralized helper used by all 3 Telegram name generation sites (core commands, plugin commands, skill commands) with empty-name guard at each call site Closes #5534	2026-04-06 11:27:28 -07:00
Teknium	582dbbbbf7	feat: add grok to TOOL_USE_ENFORCEMENT_MODELS for direct xAI usage (#5595 ) Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use enforcement guidance, steering them to actually call tools instead of describing intended actions. Matches both OpenRouter (x-ai/grok-*) and direct xAI API usage.	2026-04-06 11:22:07 -07:00
Teknium	cc7136b1ac	fix: update Gemini model catalog + wire models.dev as live model source Follow-up for salvaged PR #5494: - Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0) - Add list_agentic_models() to models_dev.py with noise filter - Wire models.dev into _model_flow_api_key_provider as primary source (static curated list serves as offline fallback) - Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV - Fix Gemma 4 context lengths to 256K (models.dev values) - Update auxiliary model to gemini-3-flash-preview - Expand tests: 3.x catalog, context lengths, models.dev integration	2026-04-06 10:28:03 -07:00
Teknium	6dfab35501	feat(providers): add Google AI Studio (Gemini) as a first-class provider Cherry-picked from PR #5494 by kshitijk4poor. Adds native Gemini support via Google's OpenAI-compatible endpoint. Zero new dependencies.	2026-04-06 10:28:03 -07:00
MestreY0d4-Uninter	6c12999b8c	fix: bridge tool-calls in copilot-acp adapter Enable Hermes tool execution through the copilot-acp adapter by: - Passing tool schemas and tool_choice into the ACP prompt text - Instructing ACP backend to emit <tool_call>{...}</tool_call> blocks - Parsing XML tool-call blocks and bare JSON fallback back into Hermes-compatible SimpleNamespace tool call objects - Setting finish_reason='tool_calls' when tool calls are extracted - Cleaning tool-call markup from response text Fix duplicate tool call extraction when both XML block and bare JSON regexes matched the same content (XML blocks now take precedence). Cherry-picked from PR #4536 by MestreY0d4-Uninter. Stripped heuristic fallback system (auto-synthesized tool calls from prose) and Portuguese-language patterns — tool execution should be model-decided, not heuristic-guessed.	2026-04-06 01:47:57 -07:00
Teknium	9c96f669a1	feat: centralized logging, instrumentation, hermes logs CLI, gateway noise fix (#5430 ) Adds comprehensive logging infrastructure to Hermes Agent across 4 phases: Phase 1 — Centralized logging - New hermes_logging.py with idempotent setup_logging() used by CLI, gateway, and cron - agent.log (INFO+) and errors.log (WARNING+) with RotatingFileHandler + RedactingFormatter - config.yaml logging: section (level, max_size_mb, backup_count) - All entry points wired (cli.py, main.py, gateway/run.py, run_agent.py) - Fixed debug_helpers.py writing to ./logs/ instead of ~/.hermes/logs/ Phase 2 — Event instrumentation - API calls: model, provider, tokens, latency, cache hit % - Tool execution: name, duration, result size (both sequential + concurrent) - Session lifecycle: turn start (session/model/provider/platform), compression (before/after) - Credential pool: rotation events, exhaustion tracking Phase 3 — hermes logs CLI command - hermes logs / hermes logs -f / hermes logs errors / hermes logs gateway - --level, --session, --since filters - hermes logs list (file sizes + ages) Phase 4 — Gateway bug fix + noise reduction - fix: _async_flush_memories() called with wrong arg count — sessions never flushed - Batched session expiry logs: 6 lines/cycle → 2 summary lines - Added inbound message + response time logging 75 new tests, zero regressions on the full suite.	2026-04-06 00:08:20 -07:00
Teknium	9ca954a274	fix: mem0 API v2 compat, prefetch context fencing, secret redaction (#5423 ) Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0), #5058 and #5098 (maymuneth). Mem0 API v2 compatibility (#5301): - All reads use filters={user_id: ...} instead of bare user_id= kwarg - All writes use filters with user_id + agent_id for attribution - Response unwrapping for v2 dict format {results: [...]} - Split _read_filters() vs _write_filters() — reads are user-scoped only for cross-session recall, writes include agent_id - Preserved 'hermes-user' default (no breaking change for existing users) - Omitted run_id scoping from #5301 — cross-session memory is Mem0's core value, session-scoping reads would defeat that purpose Memory prefetch context fencing (#5339): - Wraps prefetched memory in <memory-context> fenced blocks with system note marking content as recalled context, NOT user input - Sanitizes provider output to strip fence-escape sequences, preventing injection where memory content breaks out of the fence - API-call-time only — never persisted to session history Secret redaction (#5058, #5098): - Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB (retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)	2026-04-05 22:43:33 -07:00
Teknium	0efe7dace7	feat: add GPT/Codex execution discipline guidance for tool persistence (#5414 ) Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance injected for GPT and Codex models alongside the existing tool-use enforcement. Targets four specific failure modes: - <tool_persistence>: retry on empty/partial results instead of giving up - <prerequisite_checks>: do discovery/lookup before jumping to final action - <verification>: check correctness/grounding/formatting before finalizing - <missing_context>: use lookup tools instead of hallucinating Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's GPT-5.4 prompting guide patterns.	2026-04-05 21:51:07 -07:00
Teknium	12724e6295	feat: progressive subdirectory hint discovery (#5291 ) As the agent navigates into subdirectories via tool calls (read_file, terminal, search_files, etc.), automatically discover and load project context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories. Previously, context files were only loaded from the CWD at session start. If the agent moved into backend/, frontend/, or any subdirectory with its own AGENTS.md, those instructions were never seen. Now, SubdirectoryHintTracker watches tool call arguments for file paths and shell commands, resolves directories, and loads hint files on first access. Discovered hints are appended to the tool result so the model gets relevant context at the moment it starts working in a new area — without modifying the system prompt (preserving prompt caching). Features: - Extracts paths from tool args (path, workdir) and shell commands - Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory) - Deduplicates — each directory loaded at most once per session - Ignores paths outside the working directory - Truncates large hint files at 8K chars - Works on both sequential and concurrent tool execution paths Inspired by Block/goose SubdirectoryHintTracker.	2026-04-05 12:33:47 -07:00
analista	4a75aec433	fix(gateway): resolve Telegram's underscored /commands to skill/plugin keys Telegram's Bot API disallows hyphens in command names, so _build_telegram_menu registers /claude-code as /claude_code. When the user taps it from autocomplete, the gateway dispatch did a direct lookup against skill_cmds (keyed on the hyphenated form) and missed, silently falling through to the LLM as plain text. The model would then typically call delegate_task, spawning a Hermes subagent instead of invoking the intended skill. Normalize underscores to hyphens in skill and plugin command lookup, matching the existing pattern in _check_unavailable_skill.	2026-04-05 11:59:28 -07:00
Teknium	4976a8b066	feat: /model command — models.dev primary database + --provider flag (#5181 ) Full overhaul of the model/provider system. ## What changed - models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata - --provider flag replaces colon syntax for explicit provider switching - Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities - HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags - User-defined endpoints via config.yaml providers: section - /model (no args) lists authenticated providers with curated model catalog - Rich metadata display: context window, max output, cost/M tokens, capabilities - Config migration: custom_providers list → providers dict (v11→v12) - AIAgent.switch_model() for in-place model swap preserving conversation ## Files agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py, hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py, hermes_cli/config.py, hermes_cli/commands.py	2026-04-05 01:04:44 -07:00
kshitijk4poor	4437354198	Preserve numeric credential labels in auth removal Resolve exact label matches before treating digit-only input as a positional index so destructive auth removal does not mis-target credentials named with numeric labels. Constraint: The CLI remove path must keep supporting existing index-based usage while adding safer label targeting Rejected: Ban numeric labels \| labels are free-form and existing users may already rely on them Confidence: high Scope-risk: narrow Reversibility: clean Directive: When a destructive command accepts multiple identifier forms, prefer exact identity matches before fallback parsing heuristics Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (273 passed); py_compile on changed Python files Not-tested: Full repository pytest suite	2026-04-05 00:20:53 -07:00
kshitijk4poor	65952ac00c	Honor provider reset windows in pooled credential failover Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone. Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata Rejected: Add a separate credential scheduler subsystem \| too large for the Hermes pool architecture and unnecessary for this fix Rejected: Only change CLI formatting \| would leave runtime rotation blind to resets_at and preserve the serial-failure behavior Confidence: high Scope-risk: moderate Reversibility: clean Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths	2026-04-05 00:20:53 -07:00
Teknium	93aa01c71c	fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091 ) Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without an OpenRouter or Nous key would get broken auxiliary tasks (compression, vision, etc.) because _resolve_auto() only tried aggregator providers first, then fell back to iterating PROVIDER_REGISTRY with wrong default model names. Now _resolve_auto() checks the user's main provider first. If it's not an aggregator (OpenRouter/Nous), it uses their main model directly for all auxiliary tasks. Aggregator users still get the cheap gemini-flash model as before. Adds _read_main_provider() to read model.provider from config.yaml, mirroring the existing _read_main_model(). Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from google/gemini-3-flash-preview being sent to DashScope.	2026-04-04 12:07:43 -07:00
LucidPaths	6367e1c4c0	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness Bug fixes: - agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual env var name chars. Without this, the pattern backtracks exponentially on large strings (e.g. 100K tool output), causing test_file_read_guards to time out. - tools/file_operations.py: over-escaped newline in find -printf format string produced literal backslash-n instead of a real newline, breaking file search result parsing (total_count always 1, paths concatenated). Test fixes: - Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as 'Hangs in non-interactive environments' but actually run fine: - test_413_compression.py (12 tests, 25s) - test_file_tools_live.py (71 tests, 24s) - test_code_execution.py (61 tests, 99s) - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already) - test_413_compression.py: fix threshold values in 2 preflight compression tests where context_length was too small for the compressed output to fit in one pass. - test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK. - test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when SDK is not installed so patch() targets exist. - test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling of _gateway_queues — fixes race condition where resolve fires before threads register their approval entries, causing the test to hang indefinitely. Net effect: +256 tests recovered from skip, 8 real failures fixed.	2026-04-04 10:18:57 -07:00
Stefan Vandermeulen	78ec8b017f	style: add debug log for write-back failure in retry path Address review feedback: replace bare `except: pass` with a debug log when the post-retry write-back to ~/.claude/.credentials.json fails. The write-back is best-effort (token is already resolved), but logging helps troubleshooting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 23:26:08 -07:00
Stefan Vandermeulen	a70ee1b898	fix: sync OAuth tokens between credential pool and credentials file OAuth refresh tokens are single-use. When multiple consumers share the same Anthropic OAuth session (credential pool entries, Claude Code CLI, multiple Hermes profiles), whichever refreshes first invalidates the refresh token for all others. This causes a cascade: 1. Pool entry tries to refresh with a consumed refresh token → 400 2. Pool marks the credential as "exhausted" with a 24-hour cooldown 3. All subsequent heartbeats skip the credential entirely 4. The fallback to resolve_anthropic_token() only works while the access token in ~/.claude/.credentials.json hasn't expired 5. Once it expires, nothing can auto-recover without manual re-login Fix: - Add _sync_anthropic_entry_from_credentials_file() to detect when ~/.claude/.credentials.json has a newer refresh token and sync it into the pool entry, clearing exhaustion status - After a successful pool refresh, write the new tokens back to ~/.claude/.credentials.json so other consumers stay in sync - On refresh failure, check if the credentials file has a different (newer) refresh token and retry once before marking exhausted - In _available_entries(), sync exhausted claude_code entries from the credentials file before applying the 24-hour cooldown, so a manual re-login or external refresh immediately unblocks agents Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 23:26:08 -07:00
acsezen	831067c5d3	perf: fix O(n²) catastrophic backtracking in redact regex + reorder file read guard Two pre-existing issues causing test_file_read_guards timeouts on CI: 1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with IGNORECASE, matching any letter/underscore to end-of-string at each position → O(n²) backtracking on 100K+ char inputs. Bounded to {0,50} since env var names are never that long. 2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the character-count guard, so oversized content (that would be rejected anyway) went through the expensive regex first. Reordered to check size limit before redaction.	2026-04-03 22:40:37 -07:00
Teknium	36aace34aa	fix(opencode-go): strip trailing /v1 from base URL for Anthropic models (#4918 ) The Anthropic SDK appends /v1/messages to the base_url, so OpenCode's base URL https://opencode.ai/zen/go/v1 produced a double /v1 path (https://opencode.ai/zen/go/v1/v1/messages), causing 404s for MiniMax models. Strip trailing /v1 when api_mode is anthropic_messages. Also adds MiMo-V2-Pro, MiMo-V2-Omni, and MiniMax-M2.5 to the OpenCode Go model lists per their updated docs. Fixes #4890	2026-04-03 18:47:51 -07:00
Teknium	7def061fee	feat: add arcee-ai/trinity-large-thinking to recommended models Added to OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] lists. Also added 'trinity' family entry to DEFAULT_CONTEXT_LENGTHS (262K).	2026-04-03 13:45:29 -07:00
Teknium	5db630aae4	fix: respect per-platform disabled skills in Telegram menu and gateway dispatch (#4799 ) Three interconnected bugs caused `hermes skills config` per-platform settings to be silently ignored: 1. telegram_menu_commands() never filtered disabled skills — all skills consumed menu slots regardless of platform config, hitting Telegram's 100 command cap. Now loads disabled skills for 'telegram' and excludes them from the menu. 2. Gateway skill dispatch executed disabled skills because get_skill_commands() (process-global cache) only filters by the global disabled list at scan time. Added per-platform check before execution, returning an actionable 'skill is disabled' message. 3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but the gateway sets HERMES_SESSION_PLATFORM instead. Added HERMES_SESSION_PLATFORM as fallback, plus an explicit platform= parameter for callers that know their platform (menu builder, gateway dispatch). Also added platform to prompt_builder's skills cache key so multi-platform gateways get correct per-platform skill prompts. Reported by SteveSkedasticity (CLAW community).	2026-04-03 10:10:53 -07:00
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623 ) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity. Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. - Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). * fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. * feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. * refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. * feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add *kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3. Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
Teknium	d89cc7fec1	feat(prompt): add Google model operational guidance for Gemini and Gemma (#4641 ) Adapted from OpenCode's gemini.txt. Gemini and Gemma models now get structured operational directives alongside tool-use enforcement: absolute paths, verify-before-edit, dependency checks, conciseness, parallel tool calls, non-interactive flags, autonomous execution. Based on PR #4026, extended to cover Gemma models.	2026-04-02 11:52:34 -07:00
Teknium	585855d2ca	fix: preserve Anthropic thinking block signatures across tool-use turns Anthropic extended thinking blocks include an opaque 'signature' field required for thinking chain continuity across multi-turn tool-use conversations. Previously, normalize_anthropic_response() extracted only the thinking text and set reasoning_details=None, discarding the signature. On subsequent turns the API could not verify the chain. Changes: - _to_plain_data(): new recursive SDK-to-dict converter with depth cap (20 levels) and path-based cycle detection for safety - _extract_preserved_thinking_blocks(): rehydrates preserved thinking blocks (including signature) from reasoning_details on assistant messages, placing them before tool_use blocks as Anthropic requires - normalize_anthropic_response(): stores full thinking blocks in reasoning_details via _to_plain_data() - _extract_reasoning(): adds 'thinking' key to the detail lookup chain so Anthropic-format details are found alongside OpenRouter format Salvaged from PR #4503 by @priveperfumes — focused on the thinking block continuity fix only (cache strategy and other changes excluded).	2026-04-02 10:30:32 -07:00
kshitijk4poor	e94b4b2b40	fix: preserve allowed_users during setup reconfigure and quiet unconfigured provider warnings Setup wizard now shows existing allowed_users when reconfiguring a platform and preserves them if the user presses Enter. Previously the wizard would display a misleading "No allowlist set" warning even when the .env still held the original IDs. Also downgrades the "provider X has no API key configured" log from WARNING to DEBUG in resolve_provider_client — callers already handle the None return with their own contextual messages. This eliminates noisy startup warnings for providers in the fallback chain that the user never configured (e.g. minimax).	2026-04-02 01:00:29 -07:00
Ben	647f99d4dd	fix: resolve post-merge issues in auxiliary_client and model flow - Add missing `from agent.credential_pool import load_pool` import to auxiliary_client.py (introduced by the credential pool feature in main) - Thread `args` through `select_provider_and_model(args=None)` so TLS options from `cmd_model` reach `_model_flow_nous` - Mock `_require_tty` in test_cmd_model_forwards_nous_login_tls_options so it can run in non-interactive test environments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-02 00:50:40 +00:00
Ben Barclay	a2e56d044b	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-04-02 11:00:35 +11:00
Teknium	3628ccc8c4	feat: use 'developer' role for GPT-5 and Codex models (#4498 ) OpenAI's newer models (GPT-5, Codex) give stronger instruction-following weight to the 'developer' role vs 'system'. Swap the role at the API boundary in _build_api_kwargs() for the chat_completions path so internal message representation stays consistent ('system' everywhere). Applies regardless of provider — OpenRouter, Nous portal, direct, etc. The codex_responses path (direct OpenAI) uses 'instructions' instead of message roles, so it's unaffected. DEVELOPER_ROLE_MODELS constant in prompt_builder.py defines the matching model name substrings: ('gpt-5', 'codex').	2026-04-01 14:49:32 -07:00
Leegenux	3ff9e0101d	fix(skill_utils): add type check for metadata field in extract_skill_conditions When PyYAML is unavailable or YAML frontmatter is malformed, the fallback parser may return metadata as a string instead of a dict. This causes AttributeError when calling .get("hermes") on the string. Added explicit type checks to handle cases where metadata or hermes fields are not dicts, preventing the crash. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>	2026-04-01 11:34:56 -07:00
Teknium	c9dc6c4749	fix(insights): show cache tokens in overview so total adds up (#4428 ) The total_tokens field includes cache_read + cache_write tokens, but the display only showed input + output — making the math look wrong (e.g. 765K + 134K displayed but total said 9.2M). Now shows a cache line when cache tokens are present so all visible numbers sum to the displayed total. Affects both terminal (hermes insights) and gateway (/insights) formats.	2026-04-01 03:06:47 -07:00
kshitijk4poor	935137f0d9	feat: add inline diff previews for write actions Show inline diffs in the CLI transcript when write_file, patch, or skill_manage modifies files. Captures a filesystem snapshot before the tool runs, computes a unified diff after, and renders it with ANSI coloring in the activity feed. Adds tool_start_callback and tool_complete_callback hooks to AIAgent for pre/post tool execution notifications. Also fixes _extract_parallel_scope_path to normalize relative paths to absolute, preventing the parallel overlap detection from missing conflicts when the same file is referenced with different path styles. Gated by display.inline_diffs config option (default: true). Based on PR #3774 by @kshitijk4poor.	2026-04-01 02:13:57 -07:00
Teknium	a7f7e87070	fix: preserve credential_pool through smart routing and defer eager fallback on 429 (#4361 ) Three bugs prevented credential pool rotation from working when multiple Codex OAuth tokens were configured: 1. credential_pool was dropped during smart model turn routing. resolve_turn_route() constructed runtime dicts without it, so the AIAgent was created without pool access. Fixed in smart_model_routing.py (no-route and fallback paths), cli.py, and gateway/run.py. 2. Eager fallback fired before pool rotation on 429. The rate-limit handler at line ~7180 switched to a fallback provider immediately, before _recover_with_credential_pool got a chance to rotate to the next credential. Now deferred when the pool still has credentials. 3. (Non-issue) Retry budget was reported as too small, but successful pool rotations already skip retry_count increment — no change needed. Reported by community member Schinsly who identified all three root causes and verified the fix locally with multiple Codex accounts.	2026-04-01 01:02:34 -07:00
Teknium	d3f1987a05	fix(security): add .config/gh to read protection for @file references (#4327 ) Follow-up to PR #4305 — .config/gh was added to the write-deny list but missed from _SENSITIVE_HOME_DIRS, leaving GitHub CLI OAuth tokens exposed via @file:~/.config/gh/hosts.yml context injection.	2026-03-31 12:48:30 -07:00
maymuneth	655eea2db8	fix(security): protect .docker, .azure, and .config/gh from read and write	2026-03-31 12:47:10 -07:00
Dilee	6dcc3330b3	fix(security): add missing GitHub OAuth token patterns and snapshot redact flag - Add gho_, ghu_, ghs_, ghr_ prefix patterns (OAuth, user-to-server, server-to-server, and refresh tokens) — all four types used by GitHub Apps and Copilot auth flows were absent from _PREFIX_PATTERNS - Snapshot HERMES_REDACT_SECRETS at module import time instead of re-reading os.getenv() on every call, preventing runtime env mutations (e.g. LLM-generated export commands) from disabling redaction	2026-03-31 10:30:48 -07:00
Teknium	8d59881a62	feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647 ) * feat(auth): add same-provider credential pools and rotation UX Add same-provider credential pooling so Hermes can rotate across multiple credentials for a single provider, recover from exhausted credentials without jumping providers immediately, and configure that behavior directly in hermes setup. - agent/credential_pool.py: persisted per-provider credential pools - hermes auth add/list/remove/reset CLI commands - 429/402/401 recovery with pool rotation in run_agent.py - Setup wizard integration for pool strategy configuration - Auto-seeding from env vars and existing OAuth state Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Salvaged from PR #2647 * fix(tests): prevent pool auto-seeding from host env in credential pool tests Tests for non-pool Anthropic paths and auth remove were failing when host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials were present. The pool auto-seeding picked these up, causing unexpected pool entries in tests. - Mock _select_pool_entry in auxiliary_client OAuth flag tests - Clear Anthropic env vars and mock _seed_from_singletons in auth remove test * feat(auth): add thread safety, least_used strategy, and request counting - Add threading.Lock to CredentialPool for gateway thread safety (concurrent requests from multiple gateway sessions could race on pool state mutations without this) - Add 'least_used' rotation strategy that selects the credential with the lowest request_count, distributing load more evenly - Add request_count field to PooledCredential for usage tracking - Add mark_used() method to increment per-credential request counts - Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current() with lock acquisition - Add tests: least_used selection, mark_used counting, concurrent thread safety (4 threads × 20 selects with no corruption) * feat(auth): add interactive mode for bare 'hermes auth' command When 'hermes auth' is called without a subcommand, it now launches an interactive wizard that: 1. Shows full credential pool status across all providers 2. Offers a menu: add, remove, reset cooldowns, set strategy 3. For OAuth-capable providers (anthropic, nous, openai-codex), the add flow explicitly asks 'API key or OAuth login?' — making it clear that both auth types are supported for the same provider 4. Strategy picker shows all 4 options (fill_first, round_robin, least_used, random) with the current selection marked 5. Remove flow shows entries with indices for easy selection The subcommand paths (hermes auth add/list/remove/reset) still work exactly as before for scripted/non-interactive use. * fix(tests): update runtime_provider tests for config.yaml source of truth (#4165) Tests were using OPENAI_BASE_URL env var which is no longer consulted after #4165. Updated to use model config (provider, base_url, api_key) which is the new single source of truth for custom endpoint URLs. * feat(auth): support custom endpoint credential pools keyed by provider name Custom OpenAI-compatible endpoints all share provider='custom', making the provider-keyed pool useless. Now pools for custom endpoints are keyed by 'custom:<normalized_name>' where the name comes from the custom_providers config list (auto-generated from URL hostname). - Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)' - load_pool('custom:name') seeds from custom_providers api_key AND model.api_key when base_url matches - hermes auth add/list now shows custom endpoints alongside registry providers - _resolve_openrouter_runtime and _resolve_named_custom_runtime check pool before falling back to single config key - 6 new tests covering custom pool keying, seeding, and listing * docs: add Excalidraw diagram of full credential pool flow Comprehensive architecture diagram showing: - Credential sources (env vars, auth.json OAuth, config.yaml, CLI) - Pool storage and auto-seeding - Runtime resolution paths (registry, custom, OpenRouter) - Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh) - CLI management commands and strategy configuration Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g * fix(tests): update setup wizard pool tests for unified select_provider_and_model flow The setup wizard now delegates to select_provider_and_model() instead of using its own prompt_choice-based provider picker. Tests needed: - Mock select_provider_and_model as no-op (provider pre-written to config) - Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it) - Pre-write model.provider to config so the pool step is reached * docs: add comprehensive credential pool documentation - New page: website/docs/user-guide/features/credential-pools.md Full guide covering quick start, CLI commands, rotation strategies, error recovery, custom endpoint pools, auto-discovery, thread safety, architecture, and storage format. - Updated fallback-providers.md to reference credential pools as the first layer of resilience (same-provider rotation before cross-provider) - Added hermes auth to CLI commands reference with usage examples - Added credential_pool_strategies to configuration guide * chore: remove excalidraw diagram from repo (external link only) * refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns - _load_config_safe(): replace 4 identical try/except/import blocks - _iter_custom_providers(): shared generator for custom provider iteration - PooledCredential.extra dict: collapse 11 round-trip-only fields (token_type, scope, client_id, portal_base_url, obtained_at, expires_in, agent_key_id, agent_key_expires_in, agent_key_reused, agent_key_obtained_at, tls) into a single extra dict with __getattr__ for backward-compatible access - _available_entries(): shared exhaustion-check between select and peek - Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical) - SimpleNamespace replaces class _Args boilerplate in auth_commands - _try_resolve_from_custom_pool(): shared pool-check in runtime_provider Net -17 lines. All 383 targeted tests pass. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-31 03:10:01 -07:00
Teknium	f890a94c12	refactor: make config.yaml the single source of truth for endpoint URLs (#4165 ) OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source confusion. Users (especially Docker) would see the URL in .env and assume that's where all config lives, then wonder why LLM_MODEL in .env didn't work. Changes: - Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py, setup.py, and tools_config.py - Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py, models.py, and gateway/run.py - Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and auxiliary_client.py — config.yaml model.default is authoritative - Vision base URL now saved to config.yaml auxiliary.vision.base_url (both setup wizard and tools_config paths) - Tests updated to set config values instead of env vars Convention enforced: .env is for SECRETS only (API keys). All other configuration (model names, base URLs, provider selection) lives exclusively in config.yaml.	2026-03-30 22:02:53 -07:00
Teknium	3a68ec3172	feat: add Fireworks context length detection support (#4158 ) - Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection - Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the correct models.dev provider key — original PR used 'fireworks' which would silently fail the lookup) Cherry-picked from PR #3989 with models.dev key fix. Co-authored-by: sroecker <sroecker@users.noreply.github.com>	2026-03-30 20:37:08 -07:00
Teknium	d30ea65c9b	fix: URL-based auth for third-party Anthropic endpoints + CI test fixes (#4148 ) * fix(tests): mock sys.stdin.isatty for cmd_model TTY guard * fix(tests): update camofox snapshot format + trajectory compressor mock path - test_browser_camofox: mock response now uses snapshot format (accessibility tree) - test_trajectory_compressor: mock _get_async_client instead of setting async_client directly * fix: URL-based auth detection for third-party Anthropic endpoints + test fixes Reverts the key-prefix approach from #4093 which broke JWT and managed key OAuth detection. Instead, detects third-party endpoints by URL: if base_url is set and isn't anthropic.com, it's a proxy (Azure AI Foundry, AWS Bedrock, etc.) that uses x-api-key regardless of key format. Auth decision chain is now: 1. _requires_bearer_auth(url) → MiniMax → Bearer 2. _is_third_party_anthropic_endpoint(url) → Azure/Bedrock → x-api-key 3. _is_oauth_token(key) → OAuth on direct Anthropic → Bearer 4. else → x-api-key Also includes test fixes from PR #4051 by @erosika: - Mock sys.stdin.isatty for cmd_model TTY guard - Update camofox snapshot format mock - Fix trajectory compressor async client mock path --------- Co-authored-by: Erosika <eri@plasticlabs.ai>	2026-03-30 20:36:56 -07:00
Teknium	b2e1a095f8	fix(anthropic): write scopes field to Claude Code credentials on token refresh (#4126 ) Claude Code >=2.1.81 checks for a 'scopes' array containing 'user:inference' in ~/.claude/.credentials.json before accepting stored OAuth tokens as valid. When Hermes refreshes the token, it writes only accessToken, refreshToken, and expiresAt — omitting the scopes field. This causes Claude Code to report 'loggedIn: false' and refuse to start, even though the token is valid. This commit: - Parses the 'scope' field from the OAuth refresh response - Passes it to _write_claude_code_credentials() as a keyword argument - Persists the scopes array in the claudeAiOauth credential store - Preserves existing scopes when the refresh response omits the field Tested against Claude Code v2.1.87 on Linux — auth status correctly reports loggedIn: true and claude --print works after this fix. Co-authored-by: Nick <git@flybynight.io>	2026-03-30 18:35:16 -07:00
Teknium	ffd5d37f9b	fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens (#4093 ) * fix: treat non-sk-ant- prefixed keys (Azure AI Foundry) as regular API keys, not OAuth tokens * fix: treat non-sk-ant- keys as regular API keys, not OAuth tokens _is_oauth_token() returned True for any key not starting with sk-ant-api, misclassifying Azure AI Foundry keys as OAuth tokens and sending Bearer auth instead of x-api-key → 401 rejection. Real Anthropic OAuth tokens all start with sk-ant-oat (confirmed from live .credentials.json). Non-sk-ant- keys are third-party provider keys that should use x-api-key. Test fixtures updated to use realistic sk-ant-oat01- prefixed tokens instead of fake strings. Salvaged from PR #4075 by @HangGlidersRule. --------- Co-authored-by: Clawdbot <clawdbot@openclaw.ai>	2026-03-30 17:41:13 -07:00
Robin Fernandes	1126284c97	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 09:29:43 +09:00
Robin Fernandes	6e4598ce1e	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 08:48:54 +09:00
Teknium	7b4fe0528f	fix(auth): use bearer auth for MiniMax Anthropic endpoints (#4028 ) MiniMax's /anthropic endpoints implement Anthropic's Messages API but require Authorization: Bearer instead of x-api-key. Without this fix, MiniMax users get 401 errors in gateway sessions. Adds _requires_bearer_auth() to detect MiniMax endpoints and route through auth_token in the Anthropic SDK. Check runs before OAuth token detection so MiniMax keys aren't misclassified as setup tokens. Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-30 13:19:44 -07:00
Teknium	fb634068df	fix(security): extend secret redaction to ElevenLabs, Tavily and Exa API keys (#3920 ) ElevenLabs (sk_), Tavily (tvly-), and Exa (exa_) keys were not covered by _PREFIX_PATTERNS, leaking in plain text via printenv or log output. Salvaged from PR #3790 by @memosr. Tests rewritten with correct assertions (original tests had vacuously true checks). Co-authored-by: memosr <memosr@users.noreply.github.com>	2026-03-30 08:13:01 -07:00
Teknium	1c900c45e3	fix(agent): support full context length resolution for direct Gemini API endpoints (#3876 ) * add .aac audio file format support to transcription tool * fix(agent): support full context length resolution for direct Gemini API endpoints Add generativelanguage.googleapis.com to _URL_TO_PROVIDER so direct Gemini API users get correct 1M+ context length instead of the 128K unknown-proxy fallback. Co-authored-by: bb873 <bb873@users.noreply.github.com> --------- Co-authored-by: Adrian Scott <adrian@adrianscott.com> Co-authored-by: bb873 <bb873@users.noreply.github.com>	2026-03-29 21:56:07 -07:00
Teknium	e296efbf24	fix: add INFO-level logging for auxiliary provider resolution (#3866 ) The auxiliary client's auto-detection chain was a black box — when compression, summarization, or memory flush failed, the only clue was a generic 'Request timed out' with no indication of which provider was tried or why it was skipped. Now logs at INFO level: - 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped: openrouter, nous' when auto-detection picks a provider - 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1' before each auxiliary call - 'Auxiliary compression: provider custom unavailable, falling back to openrouter' on fallback - Clear warning with actionable guidance when NO provider is available: 'Set OPENROUTER_API_KEY or configure a local model in config.yaml'	2026-03-29 21:29:00 -07:00
Robin Fernandes	1cbb1b99cc	Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.	2026-03-30 13:28:10 +09:00
Teknium	3cc50532d1	fix: auxiliary client uses placeholder key for local servers without auth (#3842 ) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR #2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.	2026-03-29 21:05:36 -07:00
Teknium	e314833c9d	feat(display): configurable tool preview length -- show full paths by default (#3841 ) Tool call previews (paths, commands, queries) were hardcoded to truncate at 35-40 chars across CLI spinners, completion lines, and gateway progress messages. Users could not see full file paths in tool output. New config option: display.tool_preview_length (default 0 = no limit). Set a positive number to truncate at that length. Changes: - display.py: module-level _tool_preview_max_len with getter/setter; build_tool_preview() and get_cute_tool_message() _trunc/_path respect it - cli.py: reads config at startup, spinner widget respects config - gateway/run.py: reads config per-message, progress callback respects config - run_agent.py: removed redundant 30-char quiet-mode spinner truncation - config.py: added display.tool_preview_length to DEFAULT_CONFIG Reported by kriskaminski	2026-03-29 18:02:42 -07:00
Teknium	fcd1645223	feat(skills): support external skill directories via config (#3678 ) Add skills.external_dirs config option — a list of additional directories to scan for skills alongside ~/.hermes/skills/. External dirs are read-only: skill creation/editing always writes to the local dir. Local skills take precedence when names collide. This lets users share skills across tools/agents without copying them into Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills). Changes: - agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs() - agent/prompt_builder.py: scan external dirs in build_skills_system_prompt() - tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs; security check recognizes configured external dirs as trusted - agent/skill_commands.py: /skill slash commands discover external skills - hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG - cli-config.yaml.example: document the option - tests/agent/test_external_skills.py: 11 tests covering discovery, precedence, deduplication, and skill_view for external skills Requested by community member primco.	2026-03-29 00:33:30 -07:00
Teknium	bea49e02a3	fix: route /bg spinner through TUI widget to prevent status bar collision (#3643 ) Background agent's KawaiiSpinner wrote \r-based animation and stop() messages through StdoutProxy, colliding with prompt_toolkit's status bar. Two fixes: - display.py: use isinstance(out, StdoutProxy) instead of fragile hasattr+name check for detecting prompt_toolkit's stdout wrapper - cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route thinking updates through the TUI widget only when no foreground agent is active; clear spinner text in finally block with same guard Closes #2718 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-28 17:29:37 -07:00

... 2 3 4 5 6 ...

612 Commits