hermes-agent

Author	SHA1	Message	Date
teknium1	aedb773f0d	fix: stabilize system prompt across gateway turns for cache hits Two changes to prevent unnecessary Anthropic prompt cache misses in the gateway, where a fresh AIAgent is created per user message: 1. Reuse stored system prompt for continuing sessions: When conversation_history is non-empty, load the system prompt from the session DB instead of rebuilding from disk. The model already has updated memory in its conversation history (it wrote it!), so re-reading memory from disk produces a different system prompt that breaks the cache prefix. 2. Stabilize Honcho context per session: - Only prefetch Honcho context on the first turn (empty history) - Bake Honcho context into the cached system prompt and store to DB - Remove the per-turn Honcho injection from the API call loop This ensures the system message is identical across all turns in a session. Previously, re-fetching Honcho could return different context on each turn, changing the system message and invalidating the cache. Both changes preserve the existing behavior for compression (which invalidates the prompt and rebuilds from scratch) and for the CLI (where the same AIAgent persists and the cached prompt is already stable across turns). Tests: 2556 passed (6 new)	2026-03-09 01:50:58 -07:00
teknium1	7af33accf1	fix: apply secret redaction to file tool outputs Terminal output was already redacted via redact_sensitive_text() but read_file and search_files returned raw content. Now both tools redact secrets before returning results to the LLM. Based on PR #372 by @teyrebaz33 (closes #363) — applied manually due to branch conflicts with the current codebase.	2026-03-09 00:49:46 -07:00
teknium1	77da3bbc95	fix: use correct role for summary message in context compressor The summary message was always injected as 'user' role, which causes consecutive user messages when the last preserved head message is also 'user'. Some APIs reject this (400 error), and it produces malformed training data. Fix: check the role of the last head message and pick the opposite role for the summary — 'user' after assistant/tool, 'assistant' after user. Based on PR #328 by johnh4098. Closes #328.	2026-03-08 23:09:04 -07:00
teknium1	35d57ed752	refactor: unified OAuth/API-key credential resolution for fallback Split fallback provider handling into two clean registries: _FALLBACK_API_KEY_PROVIDERS — env-var-based (openrouter, zai, kimi, minimax) _FALLBACK_OAUTH_PROVIDERS — OAuth-based (openai-codex, nous) New _resolve_fallback_credentials() method handles all three cases (OAuth, API key, custom endpoint) and returns a uniform (key, url, mode) tuple. _try_activate_fallback() is now just validation + client build. Adds Nous Portal as a fallback provider — uses the same OAuth flow as the primary provider (hermes login), returns chat_completions mode. OAuth providers get credential refresh for free: the existing 401 retry handlers (_try_refresh_codex/nous_client_credentials) check self.provider, which is set correctly after fallback activation. 4 new tests (nous activation, nous no-login, codex retained). 27 total fallback tests passing, 2548 full suite.	2026-03-08 21:44:48 -07:00
teyrebaz33	1404f846a7	feat(cli,gateway): add user-defined quick commands that bypass agent loop Implements config-driven quick commands for both CLI and gateway that execute locally without invoking the LLM. Config example (~/.hermes/config.yaml): quick_commands: limits: type: exec command: /home/user/.local/bin/hermes-limits dn: type: exec command: echo daily-note Changes: - hermes_cli/config.py: add quick_commands: {} default - cli.py: check quick_commands before skill commands in process_command() - gateway/run.py: check quick_commands before skill commands in _handle_message() - tests/test_quick_commands.py: 11 tests covering exec, timeout, unsupported type, missing command, priority over skills Closes #744	2026-03-09 07:38:06 +03:00
teknium1	5785bd3272	feat: add openai-codex as fallback provider Codex OAuth uses a different auth flow (OAuth tokens, not env vars) and a different API mode (codex_responses, not chat_completions). The fallback now handles this specially: - Resolves credentials via resolve_codex_runtime_credentials() - Sets api_mode to codex_responses - Fails gracefully if no Codex OAuth session exists Also added to the commented-out config.yaml example. 2 new tests (codex activation + graceful failure).	2026-03-08 21:34:15 -07:00
teknium1	67275641f8	fix: unify gateway session hygiene with agent compression config The gateway had a SEPARATE compression system ('session hygiene') with hardcoded thresholds (100k tokens / 200 messages) that were completely disconnected from the model's context length and the user's compression config in config.yaml. This caused premature auto-compression on Telegram/Discord — triggering at ~60k tokens (from the 200-message threshold) or inconsistent token counts. Changes: - Gateway hygiene now reads model name from config.yaml and uses get_model_context_length() to derive the actual context limit - Compression threshold comes from compression.threshold in config.yaml (default 0.85), same as the agent's ContextCompressor - Removed the message-count-based trigger (was redundant and caused false positives in tool-heavy sessions) - Removed the undocumented session_hygiene config section — the standard compression.* config now controls everything - Env var overrides (CONTEXT_COMPRESSION_THRESHOLD, CONTEXT_COMPRESSION_ENABLED) are respected - Warn threshold is now 95% of model context (was hardcoded 200k) - Updated tests to verify model-aware thresholds, scaling across models, and that message count alone no longer triggers compression For claude-opus-4.6 (200k context) at 85% threshold: gateway hygiene now triggers at 170k tokens instead of the old 100k.	2026-03-08 21:30:48 -07:00
Teknium	816a3ef6f1	Merge pull request #745 from NousResearch/hermes/hermes-f8d56335 feat: browser console tool, annotated screenshots, auto-recording, and dogfood QA skill	2026-03-08 21:29:52 -07:00
teknium1	a8bf414f4a	feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill New browser capabilities and a built-in skill for agent-driven web QA. ## New tool: browser_console Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes. ## Enhanced tool: browser_vision(annotate=True) New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements — each [N] maps to ref @eN. Annotation data (element name, role, bounding box) returned alongside the vision analysis. Useful for QA reports and spatial reasoning. ## Config: browser.record_sessions Auto-record browser sessions as WebM video files when enabled: - Starts recording on first browser_navigate - Stops and saves on browser_close - Saves to ~/.hermes/browser_recordings/ - Works in both local and cloud modes (verified) - Disabled by default ## Built-in skill: dogfood Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow: 1. Plan — accept URL, create output dirs, set scope 2. Explore — systematic crawl with annotated screenshots 3. Collect Evidence — screenshots, console errors, JS exceptions 4. Categorize — severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content) 5. Report — structured markdown with per-issue evidence Includes: - skills/dogfood/SKILL.md — full workflow instructions - skills/dogfood/references/issue-taxonomy.md — severity/category defs - skills/dogfood/templates/dogfood-report-template.md — report template ## Tests 21 new tests covering: - browser_console message/error parsing, clear flag, empty/failed states - browser_console schema registration - browser_vision annotate schema and flag passing - record_sessions config defaults and recording lifecycle - Dogfood skill file existence and content validation Addresses #315.	2026-03-08 21:28:12 -07:00
Teknium	315f3ea429	Merge pull request #740 from NousResearch/hermes/hermes-3cd7c62d feat: simple fallback model for provider resilience (#737)	2026-03-08 21:16:58 -07:00
teyrebaz33	7241e8784a	feat: hermes skills — enable/disable individual skills and categories (#642 ) Add interactive skill configuration via `hermes skills` command, mirroring the existing `hermes tools` pattern. Changes: - hermes_cli/skills_config.py (new): skills_command() entry point with curses checklist UI + numbered fallback. Supports global and per-platform disable lists, individual skill toggle, and category toggle. - hermes_cli/main.py: register `hermes skills` subcommand - tools/skills_tool.py: add _is_skill_disabled() and filter disabled skills in _find_all_skills(). Resolves platform from argument, HERMES_PLATFORM env var, then falls back to global disabled list. Config schema (config.yaml): skills: disabled: [skill-a] # global platform_disabled: telegram: [skill-b] # per-platform override 22 unit tests, 2489 passed, 0 failed. Closes #642	2026-03-09 07:02:06 +03:00
teknium1	b7d6eae64c	fix: Signal adapter parity pass — integration gaps, clawdbot features, env var simplification Integration gaps fixed (7 files missing Signal): - cron/scheduler.py: Signal in platform_map (cron delivery was broken) - agent/prompt_builder.py: PLATFORM_HINTS for Signal (agent knows it's on Signal) - toolsets.py: hermes-signal toolset + added to hermes-gateway composite - hermes_cli/status.py: Signal + Slack in platform status display - tools/send_message_tool.py: Signal example in target description - tools/cronjob_tools.py: Signal in delivery option docs + schema - gateway/channel_directory.py: Signal in session-based channel discovery Clawdbot parity features added to signal.py: - Self-message filtering: prevents reply loops by checking sender != account - SyncMessage filtering: ignores sync envelopes (sent transcripts, read receipts) - Edit message support: reads dataMessage from editMessage envelope - Mention rendering: replaces \uFFFC placeholders with @identifier text - Jitter in SSE reconnection backoff (20% randomization, prevents thundering herd) Env var simplification (7 → 4): - Removed SIGNAL_DM_POLICY (DM auth follows standard platform pattern via SIGNAL_ALLOWED_USERS + DM pairing, same as Telegram/Discord) - Removed SIGNAL_GROUP_POLICY (derived from SIGNAL_GROUP_ALLOWED_USERS: not set = disabled, set with IDs = allowlist, set with * = open) - Removed SIGNAL_DEBUG (was setting root logger, removed entirely) - Remaining: SIGNAL_HTTP_URL, SIGNAL_ACCOUNT (required), SIGNAL_ALLOWED_USERS, SIGNAL_GROUP_ALLOWED_USERS (optional) Updated all docs (website, AGENTS.md, signal.md) to match.	2026-03-08 21:00:21 -07:00
teknium1	b3765c28d0	fix: restrict fallback providers to actual hermes providers Remove hallucinated providers (openai, deepseek, together, groq, fireworks, mistral, gemini, nous) from the fallback provider map. These don't exist in hermes-agent's provider system. The real supported providers for fallback are: openrouter (OPENROUTER_API_KEY) zai (ZAI_API_KEY) kimi-coding (KIMI_API_KEY) minimax (MINIMAX_API_KEY) minimax-cn (MINIMAX_CN_API_KEY) For any other OpenAI-compatible endpoint, users can use the base_url + api_key_env overrides in the config. Also adds Kimi User-Agent header for kimi fallback (matching the main provider system).	2026-03-08 20:49:55 -07:00
teknium1	161436cfdd	feat: simple fallback model for provider resilience When the primary model/provider fails after retries (rate limit, overload, auth errors, connection failures), Hermes automatically switches to a configured fallback model for the remainder of the session. Config (in ~/.hermes/config.yaml): fallback_model: provider: openrouter model: anthropic/claude-sonnet-4 Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together, Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and api_key_env overrides. Design principles: - Dead simple: one fallback model, not a chain - One-shot: switches once, doesn't ping-pong back - Zero new dependencies: uses existing OpenAI client - Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway - Three trigger points: max retries exhausted, non-retryable client errors, and invalid response exhaustion Does NOT trigger on context overflow or payload-too-large errors (those are handled by the existing compression system). Addresses #737. 25 new tests, 2492 total passing.	2026-03-08 20:22:33 -07:00
teknium1	24f549a692	feat: add Signal messenger gateway platform (#405 ) Complete Signal adapter using signal-cli daemon HTTP API. Based on PR #268 by ibhagwan, rebuilt on current main with bug fixes. Architecture: - SSE streaming for inbound messages with exponential backoff (2s→60s) - JSON-RPC 2.0 for outbound (send, typing, attachments, contacts) - Health monitor detects stale SSE connections (120s threshold) - Phone number redaction in all logs and global redact.py Features: - DM and group message support with separate access policies - DM policies: pairing (default), allowlist, open - Group policies: disabled (default), allowlist, open - Attachment download with magic-byte type detection - Typing indicators (8s refresh interval) - 100MB attachment size limit, 8000 char message limit - E.164 phone + UUID allowlist support Integration: - Platform.SIGNAL enum in gateway/config.py - Signal in _is_user_authorized() allowlist maps (gateway/run.py) - Adapter factory in _create_adapter() (gateway/run.py) - user_id_alt/chat_id_alt fields in SessionSource for UUIDs - send_message tool support via httpx JSON-RPC (not aiohttp) - Interactive setup wizard in 'hermes gateway setup' - Connectivity testing during setup (pings /api/v1/check) - signal-cli detection and install guidance Bug fixes from PR #268: - Timestamp reads from envelope_data (not outer wrapper) - Uses httpx consistently (not aiohttp in send_message tool) - SIGNAL_DEBUG scoped to signal logger (not root) - extract_images regex NOT modified (preserves group numbering) - pairing.py NOT modified (no cross-platform side effects) - No dual authorization (adapter defers to run.py for user auth) - Wildcard uses set membership ('*' in set, not list equality) - .zip default for PK magic bytes (not .docx) No new Python dependencies — uses httpx (already core). External requirement: signal-cli daemon (user-installed). Tests: 30 new tests covering config, init, helpers, session source, phone redaction, authorization, and send_message integration. Co-authored-by: ibhagwan <ibhagwan@users.noreply.github.com>	2026-03-08 20:20:35 -07:00
Teknium	7a8778ac73	Merge pull request #732 from NousResearch/hermes/hermes-2cb83eed docs: comprehensive AGENTS.md audit and corrections	2026-03-08 20:10:32 -07:00
teknium1	763c6d104d	fix: unify gateway session hygiene with agent compression config The gateway had a SEPARATE compression system ('session hygiene') with hardcoded thresholds (100k tokens / 200 messages) that were completely disconnected from the model's context length and the user's compression config in config.yaml. This caused premature auto-compression on Telegram/Discord — triggering at ~60k tokens (from the 200-message threshold) or inconsistent token counts. Changes: - Gateway hygiene now reads model name from config.yaml and uses get_model_context_length() to derive the actual context limit - Compression threshold comes from compression.threshold in config.yaml (default 0.85), same as the agent's ContextCompressor - Removed the message-count-based trigger (was redundant and caused false positives in tool-heavy sessions) - Removed the undocumented session_hygiene config section — the standard compression.* config now controls everything - Env var overrides (CONTEXT_COMPRESSION_THRESHOLD, CONTEXT_COMPRESSION_ENABLED) are respected - Warn threshold is now 95% of model context (was hardcoded 200k) - Updated tests to verify model-aware thresholds, scaling across models, and that message count alone no longer triggers compression For claude-opus-4.6 (200k context) at 85% threshold: gateway hygiene now triggers at 170k tokens instead of the old 100k.	2026-03-08 20:08:02 -07:00
teknium1	2d1a1c1c47	refactor: remove redundant 'openai' auxiliary provider, clean up docs The 'openai' provider was redundant — using OPENAI_BASE_URL + OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API. Provider options are now: auto, openrouter, nous, codex, main. - Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL - Replaced openai tests with codex provider tests - Updated all docs to remove 'openai' option and clarify 'main' - 'main' description now explicitly mentions it works with OpenAI API, local models, and any OpenAI-compatible endpoint Tests: 2467 passed.	2026-03-08 18:50:26 -07:00
teknium1	71e81728ac	feat: Codex OAuth vision support + multimodal content adapter The Codex Responses API (chatgpt.com/backend-api/codex) supports vision via gpt-5.3-codex. This was verified with real API calls using image analysis. Changes to _CodexCompletionsAdapter: - Added _convert_content_for_responses() to translate chat.completions multimodal format to Responses API format: - {type: 'text'} → {type: 'input_text'} - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'} - Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it) - Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them) Provider changes: - Added 'codex' as explicit auxiliary provider option - Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex) since gpt-5.3-codex supports multimodal input - Updated docs with Codex OAuth examples Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed working end-to-end through the full adapter pipeline. Tests: 2459 passed.	2026-03-08 18:44:33 -07:00
Teknium	ebe60646db	Merge pull request #735 from NousResearch/hermes/hermes-f8d56335 fix: allow non-codex-suffixed models (e.g. gpt-5.4) with OpenAI Codex provider	2026-03-08 18:30:27 -07:00
teknium1	f996d7950b	fix: trust user-selected models with OpenAI Codex provider The Codex model normalization was rejecting any model without 'codex' in its name, forcing a fallback to gpt-5.3-codex. This blocked models like gpt-5.4 that the Codex API actually supports. The fix simplifies _normalize_model_for_provider() to two operations: 1. Strip provider prefixes (API needs bare slugs) 2. Replace the untouched default model with a Codex-compatible one If the user explicitly chose a model — any model — we trust them and let the API be the judge. No allowlists, no slug checks. Also removes the 'codex not in slug' filter from _read_cache_models() so the local cache preserves all API-available models. Inspired by OpenClaw's approach which explicitly lists non-codex models (gpt-5.4, gpt-5.2) as valid Codex models.	2026-03-08 18:29:09 -07:00
teknium1	ae4a674c84	feat: add 'openai' as auxiliary provider option Users can now set provider: "openai" for auxiliary tasks (vision, web extract, compression) to use OpenAI's API directly with their OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the default model — supports vision since GPT-4o handles image input. Provider options are now: auto, openrouter, nous, openai, main. Changes: - agent/auxiliary_client.py: added _try_openai(), "openai" case in _resolve_forced_provider(), updated auxiliary_max_tokens_param() to use max_completion_tokens for OpenAI - Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing configuration.md with Common Setups section showing OpenAI, OpenRouter, and local model examples - 3 new tests for OpenAI provider resolution Tests: 2459 passed (was 2429).	2026-03-08 18:25:30 -07:00
teknium1	5ae0b731d0	fix: harden auxiliary model config — gateway bridge, vision safety, tests Improvements on top of PR #606 (auxiliary model configuration): 1. Gateway bridge: Added auxiliary.* and compression.summary_provider config bridging to gateway/run.py so config.yaml settings work from messaging platforms (not just CLI). Matches the pattern in cli.py. 2. Vision auto-fallback safety: In auto mode, vision now only tries OpenRouter + Nous Portal (known multimodal-capable providers). Custom endpoints, Codex, and API-key providers are skipped to avoid confusing errors from providers that don't support vision input. Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still allows using any provider. 3. Comprehensive tests (46 new): - _get_auxiliary_provider env var resolution (8 tests) - _resolve_forced_provider with all provider types (8 tests) - Per-task provider routing integration (4 tests) - Vision auto-fallback safety (7 tests) - Config bridging logic (11 tests) - Gateway/CLI bridge parity (2 tests) - Vision model override via env var (2 tests) - DEFAULT_CONFIG shape validation (4 tests) 4. Docs: Added auxiliary_client.py to AGENTS.md project structure. Updated module docstring with separate text/vision resolution chains. Tests: 2429 passed (was 2383).	2026-03-08 18:06:47 -07:00
teknium1	d9f373654b	feat: enhance auxiliary model configuration and environment variable handling - Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks. - Updated the CLI configuration example to include new auxiliary model settings. - Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations. - Improved the resolution logic for auxiliary clients to support task-specific provider overrides. - Updated relevant documentation and comments for clarity on the new features and their usage.	2026-03-08 18:06:47 -07:00
Teknium	0efbb137e8	Merge pull request #734 from NousResearch/hermes/hermes-f8d56335 feat: display previous messages when resuming a session in CLI	2026-03-08 18:06:00 -07:00
0xbyt4	d8df91dfa8	fix: resolve merge conflict with main in clipboard.py	2026-03-09 03:50:29 +03:00
teknium1	f88343a6da	Merge PR #733 : feat: interactive session browser with search filtering (#718 )	2026-03-08 17:47:42 -07:00
teknium1	491605cfea	feat: add high-value tool result hints for patch and search_files (#722 ) Add contextual [Hint: ...] suffixes to tool results where they save real iterations: - patch (no match): suggests read_file/search_files to verify content before retrying — addresses the common pattern where the agent retries with stale old_string instead of re-reading the file. - search_files (truncated): provides explicit next offset and suggests narrowing the search — clearer than relying on total_count inference. Other hints proposed in #722 (terminal, web_search, web_extract, browser_snapshot, search zero-results, search content-matches) were evaluated and found to be low-value: either already covered by existing mechanisms (read_file pagination, similar-files, schema descriptions) or guidance the agent already follows from its own reasoning. 5 new tests covering hint presence/absence for both tools.	2026-03-08 17:46:28 -07:00
teknium1	3aded1d4e5	feat: display previous messages when resuming a session in CLI When resuming a session via --continue or --resume, show a compact recap of the previous conversation inside a Rich panel before the input prompt. This gives users immediate visual context about what was discussed. Changes: - Add _preload_resumed_session() to load session history early (in run(), before banner) so _init_agent() doesn't need a separate DB round-trip - Add _display_resumed_history() that renders a formatted recap panel: * User messages shown with gold bullet (truncated at 300 chars) * Assistant responses shown with green diamond (truncated at 200 chars / 3 lines) * Tool calls collapsed to count + tool names * System messages and tool results hidden * <REASONING_SCRATCHPAD> blocks stripped from display * Pure-reasoning messages (no visible output) skipped entirely * Capped at last 10 exchanges with 'N earlier messages' indicator * Dim/muted styling distinguishes recap from active conversation - Add display.resume_display config option: 'full' (default) or 'minimal' - Store resume_display as instance variable (like compact) for testability - 27 new tests covering all display scenarios, config, and edge cases Closes #719	2026-03-08 17:45:45 -07:00
teknium1	4f0402ed3a	chore: remove all NOUS_API_KEY references NOUS_API_KEY is unused — vision tools use OPENROUTER_API_KEY or Nous Portal OAuth (auth.json), and MoA tools use OPENROUTER_API_KEY. Removed from: - hermes_cli/config.py: api_keys allowlist for config set routing - .env.example: example env file entry and comment - tests/hermes_cli/test_set_config_value.py: parametrize test data - tests/integration/test_web_tools.py: updated comments and log messages to reference 'auxiliary LLM provider' instead of NOUS_API_KEY No HECATE references found in codebase (already cleaned up).	2026-03-08 17:45:38 -07:00
teknium1	ecac6321c4	feat: interactive session browser with search filtering (#718 ) Add `hermes sessions browse` — a curses-based interactive session picker with live type-to-search filtering, arrow key navigation, and seamless session resume via Enter. Features: - Arrow keys to navigate, Enter to select and resume, Esc/q to quit - Type characters to live-filter sessions by title, preview, source, or ID - Backspace to edit filter, first Esc clears filter, second Esc exits - Adaptive column layout (title/preview, last active, source, ID) - Scrolling support for long session lists - --source flag to filter by platform (cli, telegram, discord, etc.) - --limit flag to control how many sessions to load (default: 50) - Windows fallback: numbered list with input prompt - After selection, seamlessly execs into `hermes --resume <id>` Design decisions: - Separate subcommand (not a flag on -c) — preserves `hermes -c` as-is for instant most-recent-session resume - Uses curses (not simple_term_menu) per Known Pitfalls to avoid the arrow-key ghost-duplication rendering bug in tmux/iTerm - Follows existing curses pattern from hermes_cli/tools_config.py Also fixes: removed redundant `import os` inside cmd_sessions stats block that shadowed the module-level import (would cause UnboundLocalError if browse action was taken in the same function). Tests: 33 new tests covering curses picker, fallback mode, filtering, navigation, edge cases, and argument parser registration.	2026-03-08 17:42:50 -07:00
teknium1	97b1c76b14	test: add regression test for #712 (setup wizard codex import) Verifies that setup.py imports the correct function name (get_codex_model_ids) from codex_models.py. This would have caught the ImportError bug before it reached users.	2026-03-08 17:32:52 -07:00
teknium1	c0520223fd	fix: clipboard BMP conversion file loss and broken test Source code (hermes_cli/clipboard.py): - _convert_to_png() lost the file when both Pillow and ImageMagick were unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run raised FileNotFoundError, but the file was never renamed back. The final fallback 'return path.exists()' returned False. - Fix: restore the original file in both except handlers by renaming tmp back to path when the original is missing. Test (tests/tools/test_clipboard.py): - test_file_still_usable_when_no_converter expected 'from PIL import Image' to raise an Exception, but Pillow is installed so pytest.raises fired 'DID NOT RAISE'. The test also never called _convert_to_png(). - Fix: properly mock PIL unavailability via patch.dict(sys.modules), actually call _convert_to_png(), and assert the correct result.	2026-03-08 17:22:27 -07:00
teknium1	2e73a9e893	Merge PR #704 : fix: initialize Skills Hub before listing skills Authored by PeterFile. Fixes #703.	2026-03-08 17:10:54 -07:00
teknium1	26bb56b775	feat: add /resume command to gateway for switching to named sessions Messaging users can now switch back to previously-named sessions: - /resume My Project — resolves the title (with auto-lineage) and restores that session's conversation history - /resume (no args) — lists recent titled sessions to choose from Adds SessionStore.switch_session() which ends the current session and points the session entry at the target session ID so the old transcript is loaded on the next message. Running agents are cleared on switch. Completes the session naming feature from PR #720 for gateway users. 8 new tests covering: name resolution, lineage auto-latest, already-on- session check, nonexistent names, agent cleanup, no-DB fallback, and listing titled sessions.	2026-03-08 17:09:00 -07:00
teknium1	95b1130485	fix: normalize incompatible models when provider resolves to Codex When _ensure_runtime_credentials() resolves the provider to openai-codex, check if the active model is Codex-compatible. If not (e.g. the default anthropic/claude-opus-4.6), swap it for the best available Codex model. Also strips provider prefixes the Codex API rejects (openai/gpt-5.3-codex → gpt-5.3-codex). Adds _model_is_default flag so warnings are only shown when the user explicitly chose an incompatible model (not when it's the config default). Fixes #651. Co-inspired-by: stablegenius49 (PR #661) Co-inspired-by: teyrebaz33 (PR #696)	2026-03-08 16:48:56 -07:00
teknium1	3fb8938cd3	fix: search_files now reports error for non-existent paths instead of silent empty results Previously, search_files would silently return 0 results when the search path didn't exist (e.g., /root/.hermes/... when HOME is /home/user). The path was passed to rg/grep/find which would fail silently, and the empty stdout was parsed as 'no matches found'. Changes: - Add path existence check at the top of search() using test -e. Returns SearchResult with a clear error message when path doesn't exist. - Add exit code 2 checks in _search_with_rg() and _search_with_grep() as secondary safety net for other error types (bad regex, permissions). - Add 4 new tests covering: nonexistent path (content mode), nonexistent path (files mode), existing path proceeds normally, rg error exit code. Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.	2026-03-08 16:47:20 -07:00
dmahan93	7791174ced	feat: add --fuck-it-ship-it flag to bypass dangerous command approvals Adds a fun alias for skipping all dangerous command approval prompts. When passed, sets HERMES_YOLO_MODE=1 which causes check_dangerous_command() to auto-approve everything. Available on both top-level and chat subcommand: hermes --fuck-it-ship-it hermes chat --fuck-it-ship-it Includes 5 tests covering normal blocking, yolo bypass, all patterns, and edge cases (empty string env var).	2026-03-08 18:36:37 -05:00
teknium1	34b4fe495e	fix: add title validation — sanitize, length limit, control char stripping - Add SessionDB.sanitize_title() static method: - Strips ASCII control chars (null, bell, ESC, etc.) except whitespace - Strips problematic Unicode controls (zero-width, RTL override, BOM) - Collapses whitespace runs, strips edges - Normalizes empty/whitespace-only to None - Enforces 100 char max length (raises ValueError) - set_session_title() now calls sanitize_title() internally, so all call sites (CLI, gateway, auto-lineage) are protected - CLI /title handler sanitizes early to show correct feedback - Gateway /title handler sanitizes early to show correct feedback - 24 new tests: sanitize_title (17 cases covering control chars, zero-width, RTL, BOM, emoji, CJK, length, integration), gateway validation (too long, control chars, only-control-chars)	2026-03-08 15:54:51 -07:00
teknium1	4fdd6c0dac	fix: harden session title system + add /title to gateway - Empty string titles normalized to None (prevents uncaught IntegrityError when two sessions both get empty-string titles via the unique index) - Escape SQL LIKE wildcards (%, _) in resolve_session_by_title and get_next_title_in_lineage to prevent false matches on titles like 'test_project' matching 'testXproject #2' - Optimize list_sessions_rich from N+2 queries to a single query with correlated subqueries (preview + last_active computed in SQL) - Add /title slash command to gateway (Telegram, Discord, Slack, WhatsApp) with set and show modes, uniqueness conflict handling - Add /title to gateway /help text and _known_commands - 12 new tests: empty string normalization, multi-empty-title safety, SQL wildcard edge cases, gateway /title set/show/conflict/cross-platform	2026-03-08 15:48:09 -07:00
teknium1	60b6abefd9	feat: session naming with unique titles, auto-lineage, rich listing, resume by name - Schema v4: unique title index, migration from v2/v3 - set/get/resolve session titles with uniqueness enforcement - Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3) - resolve_session_by_title: auto-latest finds most recent continuation - list_sessions_rich: preview (first 60 chars) + last_active timestamp - CLI: -c accepts optional name arg (hermes -c 'my project') - CLI: /title command with deferred mode (set before session exists) - CLI: sessions list shows Title, Preview, Last Active, ID - 27 new tests (1844 total passing)	2026-03-08 15:20:29 -07:00
0xbyt4	0c3253a485	fix: mock asyncio.run in mirror test to prevent event loop destruction asyncio.run() closes the event loop after execution, which breaks subsequent tests using asyncio.get_event_loop() (test_send_image_file).	2026-03-09 00:20:19 +03:00
0xbyt4	d0f84c0964	fix: log exceptions instead of silently swallowing in cron scheduler Two 'except Exception: pass' blocks silently hide failures: - mirror_to_session failure: user's message never gets mirrored, no trace - config.yaml parse failure: wrong model used silently Replace with logger.warning so failures are visible in logs.	2026-03-09 00:06:34 +03:00
0xbyt4	67421ed74f	fix: update test_non_empty_has_markers to match todo filtering behavior Completed/cancelled items are now filtered from format_for_injection() output. Update the existing test to verify active items appear and completed items are excluded.	2026-03-08 23:07:38 +03:00
0xbyt4	e2fe1373f3	fix: escalate read/search blocking, track search loops, filter completed todos - Block file reads after 3+ re-reads of same region (no content returned) - Track search_files calls and block repeated identical searches - Filter completed/cancelled todos from post-compression injection to prevent agent from re-doing finished work - Add 10 new tests covering all three fixes	2026-03-08 23:01:21 +03:00
0xbyt4	9eee529a7f	fix: detect and warn on file re-read loops after context compression When context compression summarizes conversation history, the agent loses track of which files it already read and re-reads them in a loop. Users report the agent reading the same files endlessly without writing. Root cause: context compression is lossy — file contents and read history are lost in the summary. After compression, the model thinks it hasn't examined the files yet and reads them again. Fix (two-part): 1. Track file reads per task in file_tools.py. When the same file region is read again, include a _warning in the response telling the model to stop re-reading and use existing information. 2. After context compression, inject a structured message listing all files already read in the session with explicit "do NOT re-read" instruction, preserving read history across compression boundaries. Adds 16 tests covering warning detection, task isolation, summary accuracy, tracker cleanup, and compression history injection.	2026-03-08 20:44:42 +03:00
Verne	333e4abe30	fix: Initialize Skills Hub on list Call ensure_hub_dirs() at the start of hermes skills list so the\nSkills Hub directory structure is created before reading hub\nmetadata.\n\nAdd a regression test covering the empty-home path where\ndoctor recommends running the list command.\n\nRefs: #703	2026-03-09 01:43:59 +08:00
teknium1	cd77c7100c	Merge PR #648 : test: add regression coverage for compressor tool-call boundaries Authored by intertwine. Related to #647.	2026-03-08 06:46:50 -07:00
teknium1	cf810c2950	fix: pre-process CLI clipboard images through vision tool instead of raw embedding Images pasted in the CLI were embedded as raw base64 image_url content parts in the conversation history, which only works with vision-capable models. If the main model (e.g. Nous API) doesn't support vision, this breaks the request and poisons all subsequent messages. Now the CLI uses the same approach as the messaging gateway: images are pre-processed through the auxiliary vision model (Gemini Flash via OpenRouter or Nous Portal) and converted to text descriptions. The local file path is included so the agent can re-examine via vision_analyze if needed. Works with any model. Fixes #638.	2026-03-08 06:22:00 -07:00
teknium1	a23bcb81ce	fix: improve /model user feedback + update docs User messaging improvements: - Rejection: '(>_<) Error: not a valid model' instead of '(^_^) Warning: Error:' - Rejection: shows 'Model unchanged' + tip about /model and /provider - Session-only: explains 'this session only' with reason and 'will revert on restart' - Saved: clear '(saved to config)' confirmation Docs updated: - cli-commands.md, cli.md, messaging/index.md: /model now shows provider:model syntax, /provider command added to tables Test fixes: deduplicated test names, assertions match new messages.	2026-03-08 06:13:12 -07:00
stablegenius49	d07d867718	Fix empty tool selection persistence	2026-03-08 06:11:18 -07:00
teknium1	666f2dd486	feat: /provider command + fix gateway bugs + harden parse_model_input /provider command (CLI + gateway): Shows all providers with auth status (✓/✗), aliases, and active marker. Users can now discover what provider names work with provider:model syntax. Gateway bugs fixed: - Config was saved even when validation.persist=False (told user 'session only' but actually persisted the unvalidated model) - HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing the switch to be silently overridden if that env var was already set parse_model_input hardened: - Colon only treated as provider delimiter if left side is a recognized provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'. - HTTP URLs, random colons no longer misinterpreted. 56 tests passing across model validation, CLI commands, and integration.	2026-03-08 06:09:36 -07:00
teknium1	66d3e6a0c2	feat: provider switching via /model + enhanced model display Add provider:model syntax to /model command for runtime provider switching: /model zai:glm-5 → switch to Z.AI provider with glm-5 /model nous:hermes-3 → switch to Nous Portal with hermes-3 /model openrouter:anthropic/claude-sonnet-4.5 → explicit OpenRouter When switching providers, credentials are resolved via resolve_runtime_provider and validated before committing. Both model and provider are saved to config. Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.). Enhanced /model (no args) display now shows: - Current model and provider - Curated model list for the current provider with ← marker - Usage examples including provider:model syntax 39 tests covering parse_model_input, curated_models_for_provider, provider switching (success + credential failure), and display output.	2026-03-08 05:45:59 -07:00
teknium1	4a09ae2985	chore: remove dead module stubs from test_cli_init.py The 200 lines of prompt_toolkit/rich/fire stubs added in PR #650 were guarded by 'if module in sys.modules: return' and never activated since those dependencies are always installed. Removed to keep the test file lean. Also removed unused MagicMock and pytest imports.	2026-03-08 05:35:02 -07:00
teknium1	8c734f2f27	fix: remove OpenRouter '/' format enforcement — let API probe be the authority Not all providers require 'provider/model' format. Removing the rigid format check lets the live API probe handle all validation uniformly. If someone types 'gpt-5.4' on OpenRouter, the probe won't find it and will suggest 'openai/gpt-5.4' — better UX than a format rejection.	2026-03-08 05:31:41 -07:00
teknium1	245d174359	feat: validate /model against live API instead of hardcoded lists Replace the static catalog-based model validation with a live API probe. The /model command now hits the provider's /models endpoint to check if the requested model actually exists: - Model found in API → accepted + saved to config - Model NOT found in API → rejected with 'Error: not a valid model' and fuzzy-match suggestions from the live model list - API unreachable → graceful fallback to hardcoded catalog (session-only for unrecognized models) - Format errors (empty, spaces, missing '/') still caught instantly without a network call The API probe takes ~0.2s for OpenRouter (346 models) and works with any OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.). 32 tests covering all paths: format checks, API found, API not found, API unreachable fallback, CLI integration.	2026-03-08 05:22:20 -07:00
stablegenius49	77f47768dd	fix: improve /history message display	2026-03-08 05:08:57 -07:00
teknium1	90fa9e54ca	fix: guard validate_requested_model + expand test coverage (PR #649 follow-up) - Wrap validate_requested_model in try/except so /model doesn't crash if validation itself fails (falls back to old accept+save behavior) - Remove unnecessary sys.path.insert from both test files - Expand test_model_validation.py: 4 → 23 tests covering normalize_provider, provider_model_ids, empty/whitespace/spaces rejection, OpenRouter format validation, custom endpoints, nous provider, provider aliases, unknown providers, fuzzy suggestions - Expand test_cli_model_command.py: 2 → 5 tests adding known-model save, validation crash fallback, and /model with no argument	2026-03-08 04:47:35 -07:00
stablegenius49	9d3a44e0e8	fix: validate /model values before saving	2026-03-08 04:47:35 -07:00
Teknium	b8120df860	Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies"	2026-03-08 03:58:13 -07:00
teknium1	0df7df52f3	test: expand slash command autocomplete coverage (PR #645 follow-up) - Fix failing test: use display_text/display_meta_text instead of str() on prompt_toolkit FormattedText objects - Add regression guard: EXPECTED_COMMANDS set ensures no command silently disappears from the shared dict - Add edge case tests: non-slash input, empty input, partial vs exact match trailing space, builtin display_meta content - Add skill provider tests: None provider, exception swallowing, description truncation at 50 chars, missing description fallback, exact-match trailing space on skill commands - Total: 15 tests (up from 4)	2026-03-08 03:53:22 -07:00
stablegenius49	bfa27d0a68	fix(cli): unify slash command autocomplete registry	2026-03-08 03:53:22 -07:00
teknium1	5a20c486e3	Merge PR #659 : feat: skill prerequisites — hide skills with unmet runtime dependencies Authored by kshitijk4poor. Fixes #630.	2026-03-08 03:12:35 -07:00
kshitij	f210510276	feat: add prerequisites field to skill spec — hide skills with unmet dependencies Skills can now declare runtime prerequisites (env vars, CLI binaries) via YAML frontmatter. Skills with unmet prerequisites are excluded from the system prompt so the agent never claims capabilities it can't deliver, and skill_view() warns the agent about what's missing. Three layers of defense: - build_skills_system_prompt() filters out unavailable skills - _find_all_skills() flags unmet prerequisites in metadata - skill_view() returns prerequisites_warning with actionable details Tagged 12 bundled skills that have hard runtime dependencies: gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage, apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection, blogwatcher, songsee, mcporter. Closes #658 Fixes #630	2026-03-08 13:19:32 +05:30
teknium1	19b6f81ee7	fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints Removed the hard block on base_url containing 'api.anthropic.com'. Anthropic now offers an OpenAI-compatible /chat/completions endpoint, so blocking their URL prevents legitimate use. If the endpoint isn't compatible, the API call will fail with a proper error anyway. Removed from: run_agent.py, mini_swe_runner.py Updated test to verify Anthropic URLs are accepted.	2026-03-07 23:36:35 -08:00
teknium1	b8c3bc7841	feat: browser screenshot sharing via MEDIA: on all messaging platforms browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/ and returns the screenshot_path in its JSON response. The model can include MEDIA:<path> in its response to share screenshots as native photos. Changes: - browser_tool.py: Save screenshots persistently, return screenshot_path, auto-cleanup files older than 24 hours, mkdir moved inside try/except - telegram.py: Add send_image_file() — sends local images via bot.send_photo() - discord.py: Add send_image_file() — sends local images via discord.File - slack.py: Add send_image_file() — sends local images via files_upload_v2() (WhatsApp already had send_image_file — no changes needed) - prompt_builder.py: Updated Telegram hint to list image extensions, added Discord and Slack MEDIA: platform hints - browser.md: Document screenshot sharing and 24h cleanup - send_file_integration_map.md: Updated to reflect send_image_file is now implemented on Telegram/Discord/Slack - test_send_image_file.py: 19 tests covering MEDIA: .png extraction, send_image_file on all platforms, and screenshot cleanup Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).	2026-03-07 22:57:05 -08:00
teknium1	dfd37a4b31	Merge PR #635 : fix: add Kimi Code API support (api.kimi.com/coding/v1) Authored by christomitov. Auto-detects sk-kimi- key prefix and routes to api.kimi.com/coding/v1. Adds User-Agent header for Kimi Code API compatibility. Legacy Moonshot keys continue to work unchanged.	2026-03-07 21:45:27 -08:00
teknium1	4be783446a	fix: wire worktree flag into hermes CLI entry point + docs + tests Critical fixes: - Add --worktree/-w to hermes_cli/main.py argparse (both chat subcommand and top-level parser) so 'hermes -w' works via the actual CLI entry point, not just 'python cli.py -w' - Pass worktree flag through cmd_chat() kwargs to cli_main() - Handle worktree attr in bare 'hermes' and --resume/--continue paths Bug fixes in cli.py: - Skip worktree creation for --list-tools/--list-toolsets (wasteful) - Wrap git worktree subprocess.run in try/except (crash on timeout) - Add stale worktree pruning on startup (_prune_stale_worktrees): removes clean worktrees older than 24h left by crashed/killed sessions Documentation updates: - AGENTS.md: add --worktree to CLI commands table - cli-config.yaml.example: add worktree config section - website/docs/reference/cli-commands.md: add to core commands - website/docs/user-guide/cli.md: add usage examples - website/docs/user-guide/configuration.md: add config docs Test improvements (17 → 31 tests): - Stale worktree pruning (prune old clean, keep recent, keep dirty) - Directory symlink via .worktreeinclude - Edge cases (no commits, not a repo, pre-existing .worktrees/) - CLI flag/config OR logic - TERMINAL_CWD integration - System prompt injection format	2026-03-07 21:05:40 -08:00
teknium1	8d719b180a	feat: git worktree isolation for parallel CLI sessions (--worktree / -w) Add a --worktree (-w) flag to the hermes CLI that creates an isolated git worktree for the session. This allows running multiple hermes-agent instances concurrently on the same repo without file collisions. How it works: - On startup with -w: detects git repo, creates .worktrees/<session>/ with its own branch (hermes/<session-id>), sets TERMINAL_CWD to it - Each agent works in complete isolation — independent HEAD, index, and working tree, shared git object store - On exit: auto-removes worktree and branch if clean, warns and keeps if there are uncommitted changes - .worktreeinclude file support: list gitignored files (.env, .venv/) to auto-copy/symlink into new worktrees - .worktrees/ is auto-added to .gitignore - Agent gets a system prompt note about the worktree context - Config support: set worktree: true in config.yaml to always enable Usage: hermes -w # Interactive mode in worktree hermes -w -q "Fix issue #123" # Single query in worktree # Or in config.yaml: worktree: true Includes 17 tests covering: repo detection, worktree creation, independence verification, cleanup (clean/dirty), .worktreeinclude, .gitignore management, and 10 concurrent worktrees. Closes #652	2026-03-07 20:51:08 -08:00
teknium1	c5a9d1ef9d	Merge branch 'main' into pr-635	2026-03-07 20:36:42 -08:00
teknium1	c7b6f423c7	feat: auto-compress pathologically large gateway sessions (#628 ) Long-lived gateway sessions can accumulate enough history that every new message rehydrates an oversized transcript, causing repeated truncation failures (finish_reason=length). Add a session hygiene check in _handle_message that runs right after loading the transcript and before invoking the agent: 1. Estimate message count and rough token count of the transcript 2. If above configurable thresholds (default: 200 msgs or 100K tokens), auto-compress the transcript proactively 3. Notify the user about the compression with before/after stats 4. If still above warn threshold (default: 200K tokens) after compression, suggest /reset 5. If compression fails on a dangerously large session, warn the user to use /compress or /reset manually Thresholds are configurable via config.yaml: session_hygiene: auto_compress_tokens: 100000 auto_compress_messages: 200 warn_tokens: 200000 This complements the agent's existing preflight compression (which runs inside run_conversation) by catching pathological sessions at the gateway layer before the agent is even created. Includes 12 tests for threshold detection and token estimation.	2026-03-07 20:09:48 -08:00
Bryan Young	fcde9be10d	fix: keep tool-call output runs intact during compression	2026-03-08 03:13:14 +00:00
Christo Mitov	4447e7d71a	fix: add Kimi Code API support (api.kimi.com/coding/v1) Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require: 1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1) 2. A User-Agent header identifying a recognized coding agent Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403 ('only available for Coding Agents') errors. Changes: - Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1 - Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints - Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged - KIMI_BASE_URL env var override still takes priority over auto-detection - Updated .env.example with correct docs and all endpoint options - Fixed doctor.py health check for Kimi Code keys Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)	2026-03-07 21:00:12 -05:00
teknium1	faab73ad58	Merge PR #573 : fix(doctor): detect OpenAI custom endpoint env settings Authored by stablegenius49. Fixes #572.	2026-03-07 16:16:08 -08:00
vincent	86eed141af	fix: rebuild compressed payload before retry	2026-03-07 18:55:01 -05:00
teknium1	24f6a193e7	fix: remove stale 'model' assertion from delegate_task schema test The 'model' property was removed from DELEGATE_TASK_SCHEMA but the test still asserted its presence, causing CI to fail.	2026-03-07 11:29:55 -08:00
teknium1	d80c30cc92	feat(gateway): proactive async memory flush on session expiry Previously, when a session expired (idle/daily reset), the memory flush ran synchronously inside get_or_create_session — blocking the user's message for 10-60s while an LLM call saved memories. Now a background watcher task (_session_expiry_watcher) runs every 5 min, detects expired sessions, and flushes memories proactively in a thread pool. By the time the user sends their next message, memories are already saved and the response is immediate. Changes: - Add _is_session_expired(entry) to SessionStore — works from entry alone without needing a SessionSource - Add _pre_flushed_sessions set to track already-flushed sessions - Remove sync _on_auto_reset callback from get_or_create_session - Refactor flush into _flush_memories_for_session (sync worker) + _async_flush_memories (thread pool wrapper) - Add _session_expiry_watcher background task, started in start() - Simplify /reset command to use shared fire-and-forget flush - Add 10 tests for expiry detection, callback removal, tracking	2026-03-07 11:27:50 -08:00
teknium1	b84f9e410c	feat: default reasoning effort from xhigh to medium Reduces token usage and latency for most tasks by defaulting to medium reasoning effort instead of xhigh. Users can still override via config or CLI flag. Updates code, tests, example config, and docs.	2026-03-07 10:14:19 -08:00
0xbyt4	ee7d8c56c7	fix: prevent data loss in clipboard PNG conversion when ImageMagick fails _convert_to_png() renamed the original file to .bmp before calling ImageMagick convert, then unconditionally deleted the .bmp regardless of whether convert succeeded. If convert failed, both files were gone. - Only delete .bmp after confirmed successful conversion - Restore original file on convert failure, timeout, or missing binary - Add 3 tests covering failure, not-installed, and timeout scenarios	2026-03-07 20:02:12 +03:00
0xbyt4	451a007fb1	fix(tests): isolate max_turns tests from CI env and update default to 90 _make_cli() did not clear HERMES_MAX_ITERATIONS env var, so tests failed in CI where the var was set externally. Also, default max_turns changed from 60 to 90 in `0a82396` but tests were not updated. - Clear HERMES_MAX_ITERATIONS in _make_cli() for proper isolation - Add env_overrides parameter for tests that need specific env values - Update hardcoded 60 assertions to 90 to match new default - Simplify test_env_var_max_turns using env_overrides	2026-03-07 19:43:20 +03:00
0xbyt4	5cdcb9e26f	fix: strip MarkdownV2 italic markers in Telegram plaintext fallback When MarkdownV2 parsing fails, _strip_mdv2() removes escape backslashes and bold markers (text) but missed italic markers (_text_). Users saw raw underscores around italic text in the plaintext fallback. - Add regex to strip _text_ italic markers in _strip_mdv2() - Use word boundary lookaround to preserve snake_case identifiers - Add tests for _strip_mdv2 covering italic, bold, snake_case, and edge cases	2026-03-07 18:55:25 +03:00
teknium1	f668e9fc75	feat: platform-conditional skill loading + Apple/macOS skills Add a 'platforms' field to SKILL.md frontmatter that restricts skills to specific operating systems. Skills with platforms: [macos] only appear in the system prompt, skills_list(), and slash commands on macOS. Skills without the field load everywhere (backward compatible). Implementation: - skill_matches_platform() in tools/skills_tool.py — core filter - Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py, skill_commands.py - 28 new tests across 3 test files New bundled Apple/macOS skills (all platforms: [macos]): - imessage — Send/receive iMessages via imsg CLI - apple-reminders — Manage Reminders via remindctl CLI - apple-notes — Manage Notes via memo CLI - findmy — Track devices/AirTags via AppleScript + screen capture Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md, skills.md (user guide)	2026-03-07 00:47:54 -08:00
teknium1	69a36a3361	Merge PR #309 : fix(timezone): timezone-aware now() for prompt, cron, and execute_code Authored by areu01or00. Adds timezone support via hermes_time.now() helper with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local). Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ injection. Includes config migration (v4→v5) and comprehensive test coverage.	2026-03-07 00:04:41 -08:00
stablegenius49	5609117882	fix(doctor): recognize OPENAI_API_KEY custom endpoint config	2026-03-06 19:47:09 -08:00
Tyler	53b4b7651a	Add official OpenClaw migration skill for Hermes Agent Introduces a new OpenClaw-to-Hermes migration skill with a Python helper script that handles importing SOUL.md, memories, user profiles, messaging settings, command allowlists, skills, TTS assets, and workspace instructions. Supports two migration presets (user-data / full), three skill conflict modes (skip / overwrite / rename), overflow file export for entries that exceed character limits, and granular include/exclude option filtering. Includes detailed SKILL.md agent instructions covering the clarify-tool interaction protocol, decision-to-command mapping, post-run reporting rules, and path resolution guidance. Adds dynamic panel width calculation to CLI clarify/approval widgets so panels adapt to content and terminal size. Includes 7 new tests covering presets, include/exclude, conflict modes, overflow exports, and skills_guard integration.	2026-03-06 18:57:12 -08:00
teknium1	388dd4789c	feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.	2026-03-06 18:55:18 -08:00
Robin Fernandes	bc091eb7ef	fix: implement Nous credential refresh on 401 error for retry logic	2026-03-07 13:34:23 +11:00
0xbyt4	33cfe1515d	fix: sanitize FTS5 queries and close mirror DB connections Two bugs fixed: 1. search_messages() crashes with OperationalError when user queries contain FTS5 special characters (+, ", (, {, dangling AND/OR, etc). Added _sanitize_fts5_query() to strip dangerous operators and a fallback try-except for edge cases. 2. _append_to_sqlite() in mirror.py creates a new SessionDB per call but never closes it, leaking SQLite connections. Added finally block to ensure db.close() is always called.	2026-03-07 04:24:45 +03:00
teknium1	94053d75a6	fix: custom endpoint no longer leaks OPENROUTER_API_KEY (#560 ) API key selection is now base_url-aware: when the resolved base_url targets OpenRouter, OPENROUTER_API_KEY takes priority (preserving the #289 fix). When hitting any other endpoint (Z.ai, vLLM, custom, etc.), OPENAI_API_KEY takes priority so the OpenRouter key doesn't leak. Applied in both the runtime provider resolver (the real code path) and the CLI initial default (for consistency). Fixes #560.	2026-03-06 17:16:14 -08:00
teknium1	2a68099675	fix(tests): isolate tests from user ~/.hermes/ config and SOUL.md _make_cli() now patches CLI_CONFIG with clean defaults so test_cli_init tests don't depend on the developer's local config.yaml. test_empty_dir_returns_empty now mocks Path.home() so it doesn't pick up a global SOUL.md. Credit to teyrebaz33 for identifying and fixing these in PR #557. Fixes #555.	2026-03-06 17:10:35 -08:00
0xbyt4	3b43f7267a	fix: count actual tool calls instead of tool-related messages tool_call_count was inaccurate in two ways: 1. Under-counting: an assistant message with N parallel tool calls (e.g. "kill the light and shut off the fan" = 2 ha_call_service) only incremented tool_call_count by 1 instead of N. 2. Over-counting: tool response messages (role=tool) also incremented tool_call_count, double-counting every tool interaction. Combined: 2 parallel tool calls produced tool_call_count=3 (1 from assistant + 2 from tool responses) instead of the correct value of 2. Fix: only count from assistant messages with tool_calls, incrementing by len(tool_calls) to handle parallel calls correctly. Tool response messages no longer affect tool_call_count. This impacts /insights and /usage accuracy for sessions with tool use.	2026-03-07 04:07:52 +03:00
0xbyt4	211b55815e	fix: prevent data loss in skills sync on copy/update failure Two bugs in sync_skills(): 1. Failed copytree poisons manifest: when shutil.copytree fails (disk full, permission error), the skill is still recorded in the manifest. On the next sync, the skill appears as "in manifest but not on disk" which is interpreted as "user deliberately deleted it" — the skill is never retried. Fix: only write to manifest on successful copy. 2. Failed update destroys user copy: rmtree deletes the existing skill directory before copytree runs. If copytree then fails, the user's skill is gone with no way to recover. Fix: move to .bak before copying, restore from backup if copytree fails. Both bugs are proven by new regression tests that fail on the old code and pass on the fix.	2026-03-07 03:58:32 +03:00
teknium1	4f56e31dc7	fix: track origin hashes in skills manifest to preserve user modifications Upgrade skills_sync manifest to v2 format (name:origin_hash). The origin hash records the MD5 of the bundled skill at the time it was last synced. On update, the user's copy is compared against the origin hash: - User copy == origin hash → unmodified → safe to update from bundled - User copy != origin hash → user customized → skip (preserve changes) v1 manifests (plain names) are auto-migrated: the user's current hash becomes the baseline, so future syncs can detect modifications. Output now shows user-modified skills: ~ whisper (user-modified, skipping) 27 tests covering all scenarios including v1→v2 migration, user modification detection, update after migration, and origin hash tracking. 2009 tests pass.	2026-03-06 16:13:58 -08:00
Teknium	6d3804770c	Merge pull request #552 from NousResearch/feat/insights feat: /insights command — usage analytics, cost estimation & activity patterns	2026-03-06 16:00:28 -08:00
teknium1	ab0f4126cf	fix: restore all removed bundled skills + fix skills sync system - Restored 21 skills removed in commits `757d012` and `740dd92`: accelerate, audiocraft, code-review, faiss, flash-attention, gguf, grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion, tensorrt-llm, torchtitan, trl-fine-tuning, whisper - Rewrote sync_skills() with proper update semantics: * New skills (not in manifest): copied to user dir * Existing skills (in manifest + on disk): updated via hash comparison * User-deleted skills (in manifest, not on disk): respected, not re-added * Stale manifest entries (removed from bundled): cleaned from manifest - Added sync_skills() to CLI startup (cmd_chat) and gateway startup (start_gateway) — previously only ran during 'hermes update' - Updated cmd_update output to show new/updated/cleaned counts - Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh install, user deletion respect, update detection, stale cleanup, and name collision handling 75 bundled skills total. 2002 tests pass.	2026-03-06 15:57:30 -08:00
unmodeled-tyler	1755a9e38a	Design agent migration skill for Hermes Agent from OpenClaw \| Run successful dry tests with reports	2026-03-06 15:12:45 -08:00
teknium1	585f8528b2	fix: deep review — prefix matching, tool_calls extraction, query perf, serialization Issues found and fixed during deep code path review: 1. CRITICAL: Prefix matching returned wrong prices for dated model names - 'gpt-4o-mini-2024-07-18' matched gpt-4o ($2.50) instead of gpt-4o-mini ($0.15) - Same for o3-mini→o3 (9x), gpt-4.1-mini→gpt-4.1 (5x), gpt-4.1-nano→gpt-4.1 (20x) - Fix: use longest-match-wins strategy instead of first-match - Removed dangerous key.startswith(bare) reverse matching 2. CRITICAL: Top Tools section was empty for CLI sessions - run_agent.py doesn't set tool_name on tool response messages (pre-existing) - Insights now also extracts tool names from tool_calls JSON on assistant messages, which IS populated for all sessions - Uses max() merge strategy to avoid double-counting between sources 3. SELECT * replaced with explicit column list - Skips system_prompt and model_config blobs (can be thousands of chars) - Reduces memory and I/O for large session counts 4. Sets in overview dict converted to sorted lists - models_with_pricing / models_without_pricing were Python sets - Sets aren't JSON-serializable — would crash json.dumps() 5. Negative duration guard - end > start check prevents negative durations from clock drift 6. Model breakdown sort fallback - When all tokens are 0, now sorts by session count instead of arbitrary order 7. Removed unused timedelta import Added 6 new tests: dated model pricing (4), tool_calls JSON extraction, JSON serialization safety. Total: 69 tests.	2026-03-06 14:50:57 -08:00
teknium1	75f523f5c0	fix: unknown/custom models get zero cost instead of fake estimates Custom OAI endpoints, self-hosted models, and local inference should NOT show fabricated cost estimates. Changed default pricing from $3/$12 per million tokens to $0/$0 for unrecognized models. - Added _has_known_pricing() to distinguish commercial vs custom models - Models with known pricing show $ amounts; unknown models show 'N/A' - Overview shows asterisk + note when some models lack pricing data - Gateway format adds '(excludes custom/self-hosted models)' note - Added 7 new tests for custom model cost handling	2026-03-06 14:18:19 -08:00
teknium1	b52b37ae64	feat: add /insights command with usage analytics and cost estimation Inspired by Claude Code's /insights, adapted for Hermes Agent's multi-platform architecture. Analyzes session history from state.db to produce comprehensive usage insights. Features: - Overview stats: sessions, messages, tokens, estimated cost, active time - Model breakdown: per-model sessions, tokens, and cost estimation - Platform breakdown: CLI vs Telegram vs Discord etc. (unique to Hermes) - Tool usage ranking: most-used tools with percentages - Activity patterns: day-of-week chart, peak hours, streaks - Notable sessions: longest, most messages, most tokens, most tool calls - Cost estimation: real pricing data for 25+ models (OpenAI, Anthropic, DeepSeek, Google, Meta) with fuzzy model name matching - Configurable time window: --days flag (default 30) - Source filtering: --source flag to filter by platform Three entry points: - /insights slash command in CLI (supports --days and --source flags) - /insights slash command in gateway (compact markdown format) - hermes insights CLI subcommand (standalone) Includes 56 tests covering pricing helpers, format helpers, empty DB, populated DB with multi-platform data, filtering, formatting, and edge cases.	2026-03-06 14:04:59 -08:00
teknium1	d63b363cde	refactor: extract atomic_json_write helper, add 24 checkpoint tests Extract the duplicated temp-file + fsync + os.replace pattern from batch_runner.py (1 instance) and process_registry.py (2 instances) into a shared utils.atomic_json_write() function. Add 12 tests for atomic_json_write covering: valid JSON, parent dir creation, overwrite, crash safety (original preserved on error), no temp file leaks, string paths, unicode, custom indent, concurrent writes. Add 12 tests for batch_runner checkpoint behavior covering: _save_checkpoint (valid JSON, last_updated, overwrite, lock/no-lock, parent dirs, no temp leaks), _load_checkpoint (missing file, existing data, corrupt JSON), and resume logic (preserves prior progress, different run_name starts fresh).	2026-03-06 05:50:12 -08:00
teknium1	4a63737227	Merge PR #433 : fix(whatsapp): replace Linux-only fuser with cross-platform port cleanup Authored by Farukest. Fixes #432. Extracts _kill_port_process() helper that uses netstat+taskkill on Windows and fuser on Linux. Previously, fuser calls were inline with bare except-pass, so on Windows orphaned bridge processes were never cleaned up — causing 'address already in use' errors on reconnect. Includes 5 tests covering both platforms, port matching edge cases, and exception suppression.	2026-03-06 04:52:25 -08:00
teknium1	3e93db16bd	Merge PR #436 : fix: use _max_tokens_param in max-iterations retry path Authored by Farukest. Fixes #435. The retry summary in _handle_max_iterations() hardcoded max_tokens instead of using _max_tokens_param(), which returns max_completion_tokens for direct OpenAI API (required by gpt-4o, o-series). The first attempt already used _max_tokens_param correctly — only the retry path was wrong. Includes 4 tests for _max_tokens_param provider detection.	2026-03-06 04:46:24 -08:00
teknium1	c30967806c	test: add 26 tests for set_config_value secret routing Verifies explicit allowlist keys, catch-all _API_KEY/_TOKEN patterns, case insensitivity, TERMINAL_SSH prefix, and config.yaml routing for non-secret keys. Covers the fix from PR #469.	2026-03-06 04:26:18 -08:00
teknium1	b89eb29174	fix: correct mock tool name 'search' → 'search_files' in test_code_execution The mock handler checked for function_name == 'search' but the RPC sends 'search_files'. Any test exercising search_files through the mock would get 'Unknown tool' instead of the canned response.	2026-03-06 03:53:43 -08:00
teknium1	3982fcf095	fix: sync execute_code sandbox stubs with real tool schemas The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the actual tool schemas, causing TypeErrors when the LLM used parameters it sees in its system prompt but the sandbox stubs didn't accept: search_files: - Added missing params: context, offset, output_mode - Fixed target default: 'grep' → 'content' (old value was obsolete) patch: - Added missing params: mode, patch (V4A multi-file patch support) Also added 4 drift-detection tests (TestStubSchemaDrift) that will catch future divergence between stubs and real schemas: - test_stubs_cover_all_schema_params: every schema param in stub - test_stubs_pass_all_params_to_rpc: every stub param sent over RPC - test_search_files_target_uses_current_values: no obsolete values - test_generated_module_accepts_all_params: generated code compiles All 28 tests pass.	2026-03-06 03:40:06 -08:00
teknium1	39299e2de4	Merge PR #451 : feat: Add Daytona environment backend Authored by rovle. Adds Daytona as the sixth terminal execution backend with cloud sandboxes, persistent workspaces, and full CLI/gateway integration. Includes 24 unit tests and 8 integration tests.	2026-03-06 03:32:40 -08:00
teknium1	efec4fcaab	feat(execute_code): add json_parse, shell_quote, retry helpers to sandbox The execute_code sandbox generates a hermes_tools.py stub module for LLM scripts. Three common failure modes keep tripping up scripts: 1. json.loads(strict=True) rejects control chars in terminal() output (e.g., GitHub issue bodies with literal tabs/newlines) 2. Shell backtick/quote interpretation when interpolating dynamic content into terminal() commands (markdown with backticks gets eaten by bash) 3. No retry logic for transient network failures (API timeouts, rate limits) Adds three convenience helpers to the generated hermes_tools module: - json_parse(text) — json.loads with strict=False for tolerant parsing - shell_quote(s) — shlex.quote() for safe shell interpolation - retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper Also updates the EXECUTE_CODE_SCHEMA description to document these helpers so LLMs know they're available without importing anything extra. Includes 7 new tests (unit + integration) covering all three helpers.	2026-03-06 01:52:46 -08:00
teknium1	2317d115cd	fix: clipboard image paste on WSL2, Wayland, and VSCode terminal The original implementation only supported xclip (X11), which silently fails on WSL2 (can't access Windows clipboard for images), Wayland desktops (xclip is X11-only), and VSCode terminal on WSL2. Clipboard backend changes (hermes_cli/clipboard.py): - WSL2: detect via /proc/version, use powershell.exe with .NET System.Windows.Forms.Clipboard to extract images as base64 PNG - Wayland: use wl-paste with MIME type detection, auto-convert BMP to PNG for WSLg environments (via Pillow or ImageMagick) - Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough - New has_clipboard_image() for lightweight clipboard checks - Cache WSL detection result per-process CLI changes (cli.py): - /paste command: explicit clipboard image check for terminals where BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm) - Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends raw byte instead of triggering bracketed paste Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch, BMP conversion, has_clipboard_image, and /paste command.	2026-03-05 20:22:44 -08:00
teknium1	8253b54be9	test: strengthen assertions in skill_manager + memory_tool (batch 3) test_skill_manager_tool.py (20 weak → 0): - Validation error messages verified against exact strings - Name validation: checks specific invalid name echoed in error - Frontmatter validation: exact error text for missing fields, unclosed markers, empty content, invalid YAML - File path validation: traversal, disallowed dirs, root-level test_memory_tool.py (13 weak → 0): - Security scan tests verify both 'Blocked' prefix AND specific threat pattern ID (prompt_injection, exfil_curl, etc.) - Invisible unicode tests verify exact codepoint strings - Snapshot test verifies type, header, content, and isolation	2026-03-05 18:51:43 -08:00
teknium1	5c867fd79f	test: strengthen assertions across 3 more test files (batch 2) test_run_agent.py (2 weak → 0, +13 assertions): - Session ID validated against actual YYYYMMDD_HHMMSS_hex format - API failure verifies error message propagation - Invalid JSON args verifies empty dict fallback + message structure - Context compression verifies final_response + completed flag - Invalid tool name retry verifies api_calls count - Invalid response verifies completed/failed/error structure test_model_tools.py (3 weak → 0): - Unknown tool error includes tool name in message - Exception returns dict with 'error' key + non-empty message - get_all_tool_names verifies both web_search AND terminal present test_approval.py (1 weak → 0, assert ratio 1.1 → 2.2): - Dangerous commands verify description content (delete, shell, drop, etc.) - Safe commands explicitly assert key AND desc are None - Pre/post condition checks for state management	2026-03-05 18:46:30 -08:00
teknium1	a44e041acf	test: strengthen assertions across 7 test files (batch 1) Replaced weak 'is not None' / '> 0' / 'len >= 1' assertions with concrete value checks across the most flagged test files: gateway/test_pairing.py (11 weak → 0): - Code assertions verify isinstance + len == CODE_LENGTH - Approval results verify dict structure + specific user_id/user_name - Added code2 != code1 check in rate_limit_expires test_hermes_state.py (6 weak → 0): - ended_at verified as float timestamp - Search result counts exact (== 2, not >= 1) - Context verified as non-empty list - Export verified as dict, session ID verified test_cli_init.py (4 weak → 0): - max_turns asserts exact value (60) - model asserts string with provider/name format gateway/test_hooks.py (2 zero-assert tests → fixed): - test_no_handlers_for_event: verifies no handler registered - test_handler_error_does_not_propagate: verifies handler count + return gateway/test_platform_base.py (9 weak image tests → fixed): - extract_images tests now verify actual URL and alt_text - truncate_message verifies content preservation after splitting cron/test_scheduler.py (1 weak → 0): - resolve_origin verifies dict equality, not just existence cron/test_jobs.py (2 weak → 0 + 4 new tests): - Schedule parsing verifies ISO timestamp type - Cron expression verifies result is valid datetime string - NEW: 4 tests for update_job() (was completely untested)	2026-03-05 18:39:37 -08:00
teknium1	e9f05b3524	test: comprehensive tests for model metadata + firecrawl config model_metadata tests (61 tests, was 39): - Token estimation: concrete value assertions, unicode, tool_call messages, vision multimodal content, additive verification - Context length resolution: cache-over-API priority, no-base_url skips cache, missing context_length key in API response - API metadata fetch: canonical_slug aliasing, TTL expiry with time mock, stale cache fallback on API failure, malformed JSON resilience - Probe tiers: above-max returns 2M, zero returns None - Error parsing: Anthropic format ('X > Y maximum'), LM Studio, empty string, unreasonably large numbers — also fixed parser to handle Anthropic format - Cache: corruption resilience (garbage YAML, wrong structure), value updates, special chars in model names Firecrawl config tests (8 tests, was 4): - Singleton caching (core purpose — verified constructor called once) - Constructor failure recovery (retry after exception) - Return value actually asserted (not just constructor args) - Empty string env vars treated as absent - Proper setup/teardown for env var isolation	2026-03-05 18:22:39 -08:00
teknium1	e2a834578d	refactor: extract clipboard methods + comprehensive tests (37 tests) Refactored image paste internals for testability: - Extracted _try_attach_clipboard_image() method (clipboard → state) - Extracted _build_multimodal_content() method (images → OpenAI format) - chat() now delegates to these instead of inline logic Tests organized in 4 levels: Level 1 (19 tests): Clipboard module — every platform path with realistic subprocess simulation (tools writing files, timeouts, empty files, cleanup on failure) Level 2 (8 tests): _build_multimodal_content — base64 encoding, MIME types (png/jpg/webp/unknown), missing files, multiple images, default question for empty text Level 3 (5 tests): _try_attach_clipboard_image — state management, counter increment/rollback, naming convention, mixed success/failure Level 4 (5 tests): Queue routing — tuple unpacking, command detection, images-only payloads, text-only payloads	2026-03-05 18:07:53 -08:00
teknium1	ffc752a79e	test: improve clipboard tests with realistic scenarios and multimodal coverage Rewrote clipboard tests from 11 shallow mocks to 21 realistic tests: - Success paths now simulate tools actually writing files (not pre-created) - osascript: success with PNG, success with TIFF, extraction-fail cases - pngpaste: empty file rejection edge case - Linux: extraction failure cleanup verification - New TestMultimodalConversion class: base64 encoding, MIME types, multiple images, missing file handling, default question fallback	2026-03-05 17:58:06 -08:00
teknium1	399562a7d1	feat: clipboard image paste in CLI (Cmd+V / Ctrl+V) Copy an image to clipboard (screenshot, browser, etc.) and paste into the Hermes CLI. The image is saved to ~/.hermes/images/, shown as a badge above the input ([📎 Image #1]), and sent to the model as a base64-encoded OpenAI vision multimodal content block. Implementation: - hermes_cli/clipboard.py: clean module with platform-specific extraction - macOS: pngpaste (if installed) → osascript fallback (always available) - Linux: xclip (apt install xclip) - cli.py: BracketedPaste key handler checks clipboard on every paste, image bar widget shows attached images, chat() converts to multimodal content format, Ctrl+C clears attachments Inspired by @m0at's fork (https://github.com/m0at/hermes-agent) which implemented image paste support for local vision models. Reimplemented cleanly as a separate module with tests.	2026-03-05 17:55:41 -08:00
teknium1	363633e2ba	fix: allow self-hosted Firecrawl without API key + add self-hosting docs On top of PR #460: self-hosted Firecrawl instances don't require an API key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper self-hosting section to the configuration docs explaining what you get, what you lose, and how to set it up (Docker stack, tradeoffs vs cloud). Added 2 more tests (URL-only without key, neither-set raises).	2026-03-05 16:44:21 -08:00
teknium1	a41ba57a7a	Merge PR #460 : feat(tools): add support for self-hosted firecrawl Authored by caentzminger. Adds optional FIRECRAWL_API_URL env var to point the Firecrawl client at a self-hosted instance instead of the cloud API.	2026-03-05 16:41:30 -08:00
teknium1	c886333d32	feat: smart context length probing with persistent caching + banner display Replaces the unsafe 128K fallback for unknown models with a descending probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a context-length error occurs, the agent steps down tiers and retries. The discovered limit is cached per model+provider combo in ~/.hermes/context_length_cache.yaml so subsequent sessions skip probing. Also parses API error messages to extract the actual context limit (e.g. 'maximum context length is 32768 tokens') for instant resolution. The CLI banner now displays the context window size next to the model name (e.g. 'claude-opus-4 · 200K context · Nous Research'). Changes: - agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache (save/load/get), parse_context_limit_from_error(), get_next_probe_tier() - agent/context_compressor.py: accepts base_url, passes to metadata - run_agent.py: step-down logic in context error handler, caches on success - cli.py + hermes_cli/banner.py: context length in welcome banner - tests: 22 new tests for probing, parsing, and caching Addresses #132. PR #319's approach (8K default) rejected — too conservative.	2026-03-05 16:09:57 -08:00
caentzminger	d7d10b14cd	feat(tools): add support for self-hosted firecrawl Adds optional FIRECRAWL_API_URL environment variable to support self-hosted Firecrawl deployments alongside the cloud service. - Add FIRECRAWL_API_URL to optional env vars in hermes_cli/config.py - Update _get_firecrawl_client() in tools/web_tools.py to accept custom API URL - Add tests for client initialization with/without URL - Document new env var in installation and config guides	2026-03-05 16:16:18 -06:00
rovle	a6499b6107	fix(daytona): use shell timeout wrapper instead of broken SDK exec timeout The Daytona SDK's process.exec(timeout=N) parameter is not enforced — the server-side timeout never fires and the SDK has no client-side fallback, causing commands to hang indefinitely. Fix: wrap commands with timeout N sh -c '...' (coreutils) which reliably kills the process and returns exit code 124. Added shlex.quote for proper shell escaping and a secondary deadline (timeout + 10s) that force-stops the sandbox if the shell timeout somehow fails. Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 13:12:41 -08:00
rovle	efc7a7b957	fix(daytona): don't guess /root on cwd probe failure, keep constructor default; update tests to reflect this Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 11:49:35 -08:00
rovle	577da79a47	fix(daytona): make disk cap visible and use SDK enum for sandbox state - Replace logger.warning with warnings.warn for the disk cap so users actually see it (logger was suppressed by CLI's log level config) - Use SandboxState enum instead of string literals in _ensure_sandbox_ready Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 11:03:39 -08:00
rovle	d5efb82c7c	test(daytona): add unit and integration tests for Daytona backend Unit tests cover cwd resolution, sandbox persistence/resume, cleanup, command execution, resource conversion, interrupt handling, retry exhaustion, and sandbox readiness checks. Integration tests verify basic commands, filesystem ops, session persistence, and task isolation against a live Daytona API. Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 10:26:22 -08:00
Teknium	21d61bdd71	Merge pull request #307 from batuhankocyigit/patch-1 fix: correct typo 'Grup' -> 'Group' in test section headers	2026-03-05 08:54:05 -08:00
teknium1	ad9c26afb8	Merge PR #293 : fix: eliminate shell noise from terminal output and fix test failures Authored by 0xbyt4. Wraps commands with unique fence markers to isolate real output from shell init/exit noise (oh-my-zsh, macOS session restore, etc.). Falls back to expanded pattern-based cleaning. Also fixes BSD find fallback and test module shadowing.	2026-03-05 08:48:26 -08:00
Farukest	e25ad79d5d	fix: use _max_tokens_param in max-iterations retry path The retry summary in _handle_max_iterations hardcodes max_tokens instead of calling _max_tokens_param(). For direct OpenAI API users (gpt-4o, o-series), the correct parameter name is max_completion_tokens. The first attempt at line 2697 already uses _max_tokens_param correctly but the retry path at line 2743 was missed.	2026-03-05 17:49:37 +03:00
Farukest	82cb1752d9	fix(whatsapp): replace Linux-only fuser with cross-platform port cleanup fuser command does not exist on Windows, causing orphaned bridge processes to never be cleaned up. On crash recovery, the port stays occupied and the next connect() fails with address-already-in-use. Add _kill_port_process() helper that uses netstat+taskkill on Windows and fuser on Linux/macOS. Replace both call sites in connect() and disconnect().	2026-03-05 17:13:14 +03:00
teknium1	b4b426c69d	test: add coverage for tee, process substitution, and full-path rm patterns Tests for the three new dangerous command patterns added in PR #280: - TestProcessSubstitutionPattern: 7 tests (bash/sh/zsh/ksh + safe commands) - TestTeePattern: 7 tests (sensitive paths + safe destinations) - TestFindExecFullPathRm: 4 tests (/bin/rm, /usr/bin/rm, bare rm, safe find)	2026-03-05 01:58:33 -08:00
teknium1	11a7c6b112	fix: update mock agent signature to accept task_id after PR #419 The _Codex401ThenSuccessAgent mock overrides run_conversation() but was missing the task_id parameter, causing a TypeError in the gateway test.	2026-03-05 01:41:50 -08:00
teknium1	d400fb8b23	feat: add /update slash command for gateway platforms Adds a /update command to Telegram, Discord, and other gateway platforms that runs `hermes update` to pull the latest code, update dependencies, sync skills, and restart the gateway. Implementation: - Spawns `hermes update` in a separate systemd scope (systemd-run --user --scope) so the process survives the gateway restart that hermes update triggers at the end. Falls back to nohup if systemd-run is unavailable. - Writes a marker file (.update_pending.json) with the originating platform and chat_id before spawning the update. - On gateway startup, _send_update_notification() checks for the marker, reads the captured update output, sends the results back to the user, and cleans up. Also: - Registers /update as a Discord slash command - Updates README.md, docs/messaging.md, docs/slash-commands.md - Adds 18 tests covering handler, notification, and edge cases	2026-03-05 01:20:58 -08:00
teknium1	9aa2999388	Merge PR #393 : fix(whatsapp): initialize data variable and close log handle on error paths Authored by FarukEst. Fixes #392. 1. Initialize data={} before health-check loop to prevent NameError when resp.json() raises after http_ready is set to True. 2. Extract _close_bridge_log() helper and call on all return False paths to prevent file descriptor leaks on failed connection attempts. Refactors disconnect() to reuse the same helper.	2026-03-04 21:49:53 -08:00
teknium1	90e6fa2612	Merge PR #204 : fix Telegram italic regex newline bug Authored by 0xbyt4. The italic regex [^]+ matched across newlines, corrupting bullet lists using markers (e.g. '* Item one\n* Item two' became italic garbage). Fixed by adding \n to the negated character class: [^*\n]+.	2026-03-04 19:52:03 -08:00
teknium1	fd22ae5fcb	Merge PR #203 : add unit tests for trajectory_compressor Authored by 0xbyt4. 25 tests covering CompressionConfig, TrajectoryMetrics, AggregateMetrics, protected indices, content extraction, and token counting.	2026-03-04 19:48:19 -08:00
teknium1	e1baab90f7	Merge PR #201 : fix skills hub dedup to prefer higher trust levels Authored by 0xbyt4. The dedup logic in GitHubSource.search() and unified_search() used 'r.trust_level == "trusted"' which let trusted results overwrite builtin ones. Now uses ranked comparison: builtin (2) > trusted (1) > community (0).	2026-03-04 19:40:41 -08:00
teknium1	4fcfa329ba	Merge PR #200 : fix extract_images and truncate_message bugs in platform base Authored by 0xbyt4. Two fixes: - extract_images(): only remove extracted image tags, not all markdown image tags. Previously ![doc](report.pdf) was silently dropped when real images were also present. - truncate_message(): walk chunk_body not full_chunk when tracking code block state, so the reopened fence prefix doesn't toggle in_code off and leave continuation chunks with unclosed code blocks.	2026-03-04 19:37:58 -08:00
teknium1	b336980229	Merge PR #193 : add unit tests for 5 security/logic-critical modules (batch 4) Authored by 0xbyt4. 144 new tests covering gateway/pairing.py, tools/skill_manager_tool.py, tools/skills_tool.py, honcho_integration/session.py, and agent/auxiliary_client.py.	2026-03-04 19:35:01 -08:00
teknium1	7128f95621	Merge PR #390 : fix hidden directory filter broken on Windows Authored by Farukest. Fixes #389. Replaces hardcoded forward-slash string checks ('/.git/', '/.hub/') with Path.parts membership test in _find_all_skills() and scan_skill_commands(). On Windows, str(Path) uses backslashes so the old filter never matched, causing quarantined skills to appear as installed.	2026-03-04 19:22:43 -08:00
teknium1	ffc6d767ec	Merge PR #388 : fix --force bypassing dangerous verdict in should_allow_install Authored by Farukest. Fixes #387. Removes 'and not force' from the dangerous verdict check so --force can never install skills with critical security findings (reverse shells, data exfiltration, etc). The docstring already documented this behavior but the code didn't enforce it.	2026-03-04 19:19:57 -08:00
teknium1	44a2d0c01f	Merge PR #386 : fix symlink boundary check prefix confusion in skills_guard Authored by Farukest. Fixes #385. Replaces startswith() with Path.is_relative_to() in _check_structure() symlink escape check — same fix pattern as skill_view() (PR #352). Prevents symlinks escaping to sibling directories with shared name prefixes.	2026-03-04 19:13:21 -08:00
teknium1	db58cfb13d	Merge PR #269 : Fix nous refresh token rotation failure on key mint failure Fixes a bug where the refresh token was not persisted when the API key mint failed (e.g., 402 insufficient credits, timeout). The rotated refresh token was lost, causing subsequent auth attempts to fail with a stale token. Changes: - Persist auth state immediately after each successful token refresh, before attempting the mint - Use latest in-memory refresh token on mint-retry paths (was using the stale original) - Atomic durable writes for auth.json (temp file + fsync + replace) - Opt-in OAuth trace logging (HERMES_OAUTH_TRACE=1, fingerprint-only) - 3 regression tests covering refresh+402, refresh+timeout, and invalid-token retry behavior Author: Robin Fernandes <rewbs>	2026-03-04 17:52:10 -08:00
teknium1	bd3025d669	Merge PR #395 : fix(gateway): use filtered history length for transcript message extraction Authored by PercyDikec. Fixes #394. The transcript extraction used len(history) to find new messages, but history includes session_meta entries stripped before reaching the agent. This caused 1 message lost per turn from turn 2 onwards. Fix returns history_offset (filtered length) from _run_agent and uses it for the slice.	2026-03-04 16:25:09 -08:00
teknium1	8311e8984b	fix: preflight context compression + error handler ordering for model switches Two fixes for the case where a user switches to a model with a smaller context window while having a large existing session: 1. Preflight compression in run_conversation(): Before the main loop, estimate tokens of loaded history + system prompt. If it exceeds the model's compression threshold (85% of context), compress proactively with up to 3 passes. This naturally handles model switches because the gateway creates a fresh AIAgent per message with the current model's context length. 2. Error handler reordering: Context-length errors (400 with 'maximum context length' etc.) are now checked BEFORE the generic 4xx handler. Previously, OpenRouter's 400-status context-length errors were caught as non-retryable client errors and aborted immediately, never reaching the compression+retry logic. Reported by Sonicrida on Discord: 840-message session (2MB+) crashed after switching from a large-context model to minimax via OpenRouter.	2026-03-04 14:42:41 -08:00
teknium1	093acd72dd	fix: catch exceptions from check_fn in is_toolset_available() get_definitions() already wrapped check_fn() calls in try/except, but is_toolset_available() did not. A failing check (network error, missing import, bad config) would propagate uncaught and crash the CLI banner, agent startup, and tools-info display. Now is_toolset_available() catches all exceptions and returns False, matching the existing pattern in get_definitions(). Added 4 tests covering exception handling in is_toolset_available(), check_toolset_requirements(), get_definitions(), and check_tool_availability(). Closes #402	2026-03-04 14:22:30 -08:00
PercyDikec	d3504f84af	fix(gateway): use filtered history length for transcript message extraction The transcript extraction used len(history) to find new messages, but history includes session_meta entries that are stripped before passing to the agent. This mismatch caused 1 message to be lost from the transcript on every turn after the first, because the slice offset was too high. Use the filtered history length (history_offset) returned by _run_agent instead. Also changed the else branch from returning all agent_messages to returning an empty list, so compressed/shorter agent output does not duplicate the entire history into the transcript.	2026-03-04 21:34:40 +03:00
Farukest	34badeb19c	fix(whatsapp): initialize data variable and close log handle on error paths	2026-03-04 19:11:48 +03:00
Farukest	f93b48226c	fix: use Path.parts for hidden directory filter in skill listing The hidden directory filter used hardcoded forward-slash strings like '/.git/' and '/.hub/' to exclude internal directories. On Windows, Path returns backslash-separated strings, so the filter never matched. This caused quarantined skills in .hub/quarantine/ to appear as installed skills and available slash commands on Windows. Replaced string-based checks with Path.parts membership test which works on both Windows and Unix.	2026-03-04 18:34:16 +03:00
Farukest	4805be0119	fix: prevent --force from overriding dangerous verdict in should_allow_install The docstring states --force should never override dangerous verdicts, but the condition `if result.verdict == "dangerous" and not force` allowed force=True to skip the early return. Execution then fell through to `if force: return True`, bypassing the policy block. Removed `and not force` so dangerous skills are always blocked regardless of the --force flag.	2026-03-04 18:10:18 +03:00
Farukest	a3ca71fe26	fix: use is_relative_to() for symlink boundary check in skills_guard The symlink escape check in _check_structure() used startswith() without a trailing separator. A symlink resolving to a sibling directory with a shared prefix (e.g. 'axolotl-backdoor') would pass the check for 'axolotl' since the string prefix matched. Replaced with Path.is_relative_to() which correctly handles directory boundaries and is consistent with the skill_view path check.	2026-03-04 17:23:23 +03:00
teknium1	70a0a5ff4a	fix: exclude current session from session_search results session_search was returning the current session if it matched the query, which is redundant — the agent already has the current conversation context. This wasted an LLM summarization call and a result slot. Added current_session_id parameter to session_search(). The agent passes self.session_id and the search filters out any results where either the raw or parent-resolved session ID matches. Both the raw match and the parent-resolved match are checked to handle child sessions from delegation. Two tests added verifying the exclusion works and that other sessions are still returned.	2026-03-04 06:06:40 -08:00
teknium1	4ae61b0886	Merge PR #370 : fix(session): use database session count for has_any_sessions Authored by Bartok9. Fixes #351.	2026-03-04 05:37:15 -08:00
teknium1	79871c2083	refactor: use Path.is_relative_to() for skill_view boundary check Replace the string-based startswith + os.sep approach with Path.is_relative_to() (Python 3.9+, we require 3.10+). This is the idiomatic pathlib way to check path containment — it handles separators, case sensitivity, and the equal-path case natively without string manipulation. Simplified tests to match: removed the now-unnecessary test_separator_is_os_native test since is_relative_to doesn't depend on separator choice.	2026-03-04 05:30:43 -08:00
teknium1	7796ac1411	Merge PR #354 : fix: use os.sep in skill_view path boundary check for Windows compatibility Authored by Farukest. Fixes #353.	2026-03-04 05:17:36 -08:00
teknium1	3db3d60368	refactor: extract build_session_key() as single source of truth The session key construction logic was duplicated in 4 places (session.py + 3 inline copies in run.py), which is exactly the kind of drift that caused issue #349 in the first place. Extracted build_session_key() as a public function in session.py. SessionStore._generate_session_key() now delegates to it, and all inline key construction in run.py has been replaced with calls to the shared function. Tests updated to test the function directly.	2026-03-04 03:34:45 -08:00
Bartok Moltbot	87a16ad2e5	fix(session): use database session count for has_any_sessions (#351 ) The previous implementation used `len(self._entries) > 1` to check if any sessions had ever been created. This failed for single-platform users because when sessions reset (via /reset, auto-reset, or gateway restart), the entry for the same session_key is replaced in _entries, not added. So len(_entries) stays at 1 for users who only use one platform. Fix: Query the SQLite database's session count instead. The database preserves historical session records (marked as ended), so session_count() correctly returns > 1 for returning users even after resets. This prevents the agent from reintroducing itself to returning users after every session reset. Fixes #351	2026-03-04 03:34:57 -05:00
Farukest	e86f391cac	fix: use os.sep in skill_view path boundary check for Windows compatibility	2026-03-04 06:50:06 +03:00
Farukest	e39de2e752	fix(gateway): match _quick_key to _generate_session_key for WhatsApp DMs	2026-03-04 06:34:46 +03:00
teknium1	ffec21236d	feat: enhance Home Assistant integration with service discovery and setup Improvements to the HA integration merged from PR #184: - Add ha_list_services tool: discovers available services (actions) per domain with descriptions and parameter fields. Tells the model what it can do with each device type (e.g. light.turn_on accepts brightness, color_name, transition). Closes the gap where the model had to guess available actions. - Add HA to hermes tools config: users can enable/disable the homeassistant toolset and configure HASS_TOKEN + HASS_URL through 'hermes tools' setup flow instead of manually editing .env. - Fix should-fix items from code review: - Remove sys.path.insert hack from gateway adapter - Replace all print() calls with proper logger (info/warning/error) - Move env var reads from import-time to handler-time via _get_config() - Add dedicated REST session reuse in gateway send() - Update ha_call_service description to reference ha_list_services for action discovery. - Update tests for new ha_list_services tool in toolset resolution.	2026-03-03 05:16:53 -08:00
areu01or00	a1c25046a9	fix(timezone): add timezone-aware clock across agent, cron, and execute_code	2026-03-03 18:23:40 +05:30
0xbyt4	aefc330b8f	merge: resolve conflict with main (add mcp + homeassistant extras)	2026-03-03 14:52:22 +03:00
0xbyt4	f967471758	merge: resolve conflict with main (keep fence markers + _find_shell)	2026-03-03 14:50:45 +03:00
BathreeNode	f08ad94d4d	fix: correct typo 'Grup' -> 'Group' in test section headers Three section header comments in tests/test_run_agent.py used 'Grup' instead of 'Group': - Line 124: # Grup 1: Pure Functions - Line 276: # Grup 2: State / Structure Methods - Line 572: # Grup 3: Conversation Loop Pieces (OpenAI mock)	2026-03-03 09:10:35 +03:00
teknium1	7df14227a9	feat(mcp): banner integration, /reload-mcp command, resources & prompts Banner integration: - MCP Servers section in CLI startup banner between Tools and Skills - Shows each server with transport type, tool count, connection status - Failed servers shown in red; section hidden when no MCP configured - Summary line includes MCP server count - Removed raw print() calls from discovery (banner handles display) /reload-mcp command: - New slash command in both CLI and gateway - Disconnects all MCP servers, re-reads config.yaml, reconnects - Reports what changed (added/removed/reconnected servers) - Allows adding/removing MCP servers without restarting Resources & Prompts support: - 4 utility tools registered per server: list_resources, read_resource, list_prompts, get_prompt - Exposes MCP Resources (data sources) and Prompts (templates) as tools - Proper parameter schemas (uri for read_resource, name for get_prompt) - Handles text and binary resource content - 23 new tests covering schemas, handlers, and registration Test coverage: 74 MCP tests total, 1186 tests pass overall.	2026-03-02 19:15:59 -08:00
teknium1	60effcfc44	fix(mcp): parallel discovery, user-visible logging, config validation - Discovery is now parallel (asyncio.gather) instead of sequential, fixing the 60s shared timeout issue with multiple servers - Startup messages use print() so users see connection status even with default log levels (the 'tools' logger is set to ERROR) - Summary line shows total tools and failed servers count - Validate conflicting config: warn if both 'url' and 'command' are present (HTTP takes precedence) - Update TODO.md: mark MCP as implemented, list remaining work - Add test for conflicting config detection (51 tests total) All 1163 tests pass.	2026-03-02 19:02:28 -08:00
teknium1	64ff8f065b	feat(mcp): add HTTP transport, reconnection, security hardening Upgrades the MCP client implementation from PR #291 with: - HTTP/Streamable HTTP transport: support 'url' key in config for remote MCP servers (Notion, Slack, Sentry, Supabase, etc.) - Automatic reconnection with exponential backoff (1s-60s, 5 retries) when a server connection drops unexpectedly - Environment variable filtering: only pass safe vars (PATH, HOME, etc.) plus user-specified env to stdio subprocesses (prevents secret leaks) - Credential stripping: sanitize error messages before returning to the LLM (strips GitHub PATs, OpenAI keys, Bearer tokens, etc.) - Configurable per-server timeouts: 'timeout' and 'connect_timeout' keys - Fix shutdown race condition in servers_snapshot variable scoping Test coverage: 50 tests (up from 30), including new tests for env filtering, credential sanitization, HTTP config detection, reconnection logic, and configurable timeouts. All 1162 tests pass (1162 passed, 3 skipped, 0 failed).	2026-03-02 18:40:03 -08:00
teknium1	468b7fdbad	Merge PR #291 : feat: add MCP (Model Context Protocol) client support Authored by 0xbyt4. Adds MCP client with official SDK, direct tool registration, auto-injection into hermes-* toolsets, and graceful degradation.	2026-03-02 18:24:31 -08:00
teknium1	221e4228ec	Merge PR #295 : fix: resolve OPENROUTER_API_KEY before OPENAI_API_KEY in all code paths Authored by 0xbyt4. Fixes #289.	2026-03-02 17:29:25 -08:00
teknium1	dd9d3f89b9	Merge PR #286 : Fix ClawHub Skills Hub adapter for API endpoint changes Authored by BP602. Fixes #285.	2026-03-02 17:25:14 -08:00
teknium1	2ba87a10b0	Merge PR #219 : fix: guard POSIX-only process functions for Windows compatibility Authored by Farukest. Fixes #218.	2026-03-02 17:07:49 -08:00
0xbyt4	6053236158	fix: prioritize OPENROUTER_API_KEY over OPENAI_API_KEY When both OPENROUTER_API_KEY and OPENAI_API_KEY are set (e.g. OPENAI_API_KEY in .bashrc), the wrong key was sent to OpenRouter causing auth failures. Fixed key resolution order in cli.py and runtime_provider.py. Fixes #289	2026-03-03 00:28:26 +03:00
0xbyt4	11615014a4	fix: eliminate shell noise from terminal output with fence markers - Wrap commands with unique fence markers (printf FENCE; cmd; printf FENCE) to isolate real output from shell init/exit noise (oh-my-zsh, macOS session restore/save, docker plugin errors, etc.) - Expand _clean_shell_noise to cover zsh/macOS patterns and strip from both beginning and end (fallback when fences are missing) - Fix BSD find compatibility: fallback to simple find when -printf produces empty output (macOS) - Fix test_terminal_disk_usage: use sys.modules to get the real module instead of the shadowed function from tools/__init__.py - Add 13 new unit tests for fence extraction and zsh noise patterns	2026-03-02 22:53:21 +03:00
0xbyt4	11a2ecb936	fix: resolve thread safety issues and shutdown deadlock in MCP client - Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread) - Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside a _lock block but also acquires _lock (non-reentrant) - Fix race condition in _ensure_mcp_loop with concurrent callers - Change idempotency to per-server (retry failed servers, skip connected) - Dynamic toolset injection via startswith("hermes-") instead of hardcoded list - Parallel shutdown via asyncio.gather instead of sequential loop - Add tests for partial failure retry, parallel shutdown, dynamic injection	2026-03-02 22:08:32 +03:00
0xbyt4	151e8d896c	fix(tests): isolate discover_mcp_tools tests from global _servers state Patch _servers to empty dict in tests that call discover_mcp_tools() with mocked config, preventing interference from real MCP connections that may exist when running within the full test suite.	2026-03-02 21:38:01 +03:00
0xbyt4	aa2ecaef29	fix: resolve orphan subprocess leak on MCP server shutdown Refactor MCP connections from AsyncExitStack to task-per-server architecture. Each server now runs as a long-lived asyncio Task with `async with stdio_client(...)`, ensuring anyio cancel-scope cleanup happens in the same Task that opened the connection.	2026-03-02 21:22:00 +03:00
0xbyt4	3c252ae44b	feat: add MCP (Model Context Protocol) client support Connect to external MCP servers via stdio transport, discover their tools at startup, and register them into the hermes-agent tool registry. - New tools/mcp_tool.py: config loading, server connection via background event loop, tool handler factories, discovery, and graceful shutdown - model_tools.py: trigger MCP discovery after built-in tool imports - cli.py: call shutdown_mcp_servers in _run_cleanup - pyproject.toml: add mcp>=1.2.0 as optional dependency - 27 unit tests covering config, schema conversion, handlers, registration, SDK interaction, toolset injection, graceful fallback, and shutdown Config format (in ~/.hermes/config.yaml): mcp_servers: filesystem: command: "npx" args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]	2026-03-02 21:03:14 +03:00
BP602	6789084ec0	Fix ClawHub Skills Hub adapter for updated API	2026-03-02 16:11:49 +01:00
teknium1	7652afb8de	Merge PR #243 : fix(honcho): auto-enable when API key is present Authored by Bartok9. Fixes #241.	2026-03-02 05:13:33 -08:00
teknium1	7862e7010c	test: add additional multiline bypass tests for find patterns Extra test coverage for newline bypass detection (DOTALL fix). Inspired by Bartok9's PR #245.	2026-03-02 04:46:27 -08:00
teknium1	4faf2a6cf4	Merge PR #233 : fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection Authored by Farukest. Fixes #232.	2026-03-02 04:44:06 -08:00
teknium1	6d2481ee5c	Merge PR #231 : fix: use task-specific glob pattern in disk usage calculation Authored by Farukest. Fixes #230.	2026-03-02 04:38:58 -08:00
teknium1	ca5525bcd7	fix(tests): isolate HERMES_HOME in tests and adjust log directory for debug session Added a fixture to redirect HERMES_HOME to a temporary directory during tests, preventing writes to the user's home directory. Updated the test for DebugSession to create a dedicated log directory for saving logs, ensuring test isolation and accuracy in assertions.	2026-03-02 04:34:21 -08:00
teknium1	56b53bff6e	Merge PR #229 : fix(agent): copy conversation_history to avoid mutating caller's list Authored by Farukest. Fixes #228. # Conflicts: # tests/test_run_agent.py	2026-03-02 04:21:39 -08:00
teknium1	c4ea996612	fix: repair flush sentinel test — mock auxiliary client and add guard The TestFlushSentinelNotLeaked test from PR #227 had two issues: 1. flush_memories() uses get_text_auxiliary_client() which could bypass agent.client entirely — mock it to return (None, None) 2. No assertion that the API was actually called — added guard assert Without these fixes the test passed vacuously (API never called).	2026-03-02 03:21:08 -08:00
teknium1	39bfd226b8	Merge PR #225 : fix: preserve empty content in ReadResult.to_dict() Authored by Farukest. Fixes #224.	2026-03-02 03:13:31 -08:00
teknium1	234b67f5fd	fix: mock time in retry exhaustion tests to prevent backoff sleep The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time, causing the retry backoff loops (275s+ total) to run in real time. Tests would time out instead of running quickly. Added _make_fast_time_mock() helper that creates a mock time module where time.time() advances 500s per call (so sleep_end is always in the past) and time.sleep() is a no-op. Both tests now complete in <1s.	2026-03-02 02:59:41 -08:00
teknium1	e27e3a4f8a	Merge PR #223 : fix: correct off-by-one in retry exhaustion checks Authored by Farukest. Fixes #222.	2026-03-02 02:54:10 -08:00
teknium1	1cb2311bad	fix(security): block path traversal in skill_view file_path (fixes #220 ) skill_view accepted arbitrary file_path values like '../../.env' and would read files outside the skill directory, exposing API keys and other sensitive data. Added two layers of defense: 1. Reject paths with '..' components (fast, catches obvious traversal) 2. resolve() containment check with trailing '/' to prevent prefix collisions (catches symlinks and edge cases) Fix approach from PR #242 (@Bartok9). Vulnerability reported by @Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR. Closes #220	2026-03-02 02:00:09 -08:00
teknium1	25c65bc99e	fix(agent): handle None content in context compressor (fixes #211 ) The OpenAI API returns content: null on assistant messages that only contain tool calls. msg.get('content', '') returns None (not '') when the key exists with value None, causing TypeError on len() and string concatenation in _generate_summary and compress. Fix: msg.get('content') or '' — handles both missing keys and None. Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss). Both PRs had stale branches and couldn't be merged directly. Closes #211	2026-03-02 01:35:52 -08:00
teknium1	afb680b50d	fix(cli): fix max_turns comment and test for correct priority order Priority is: CLI arg > config file > env var > default (not env var > config file as the old comment stated) The test failed because config.yaml had max_turns at both root level and inside agent section. The test cleared agent.max_turns but the root-level value still took precedence over the env var. Fixed the test to clear both, and corrected the comment to match the intended priority order.	2026-03-02 01:18:52 -08:00
teknium1	e265006fd6	test: add coverage for chat_topic in SessionSource and session context prompt Tests added: - Roundtrip serialization of chat_topic via to_dict/from_dict - chat_topic defaults to None when missing from dict - Channel Topic line appears in session context prompt when set - Channel Topic line is omitted when chat_topic is None Follow-up to PR #248 (feat: Discord channel topic in session context).	2026-03-02 00:53:21 -08:00
teknium1	719f2eef32	Merge branch 'pr-217' # Conflicts: # gateway/session.py	2026-03-02 00:18:41 -08:00
Robin Fernandes	5e5e0efc60	Fix nous refresh token rotation failure in case where api key mint/retrieval fails	2026-03-02 17:18:15 +11:00
teknium1	e5893075f9	feat(agent): add summary handling for reasoning items Enhanced the AIAgent class to capture and normalize summary information for reasoning items. Implemented logic to handle summaries as lists, ensuring proper formatting for API interactions. Updated tests to validate the inclusion of summaries in reasoning items, both for existing and default cases.	2026-03-01 20:03:03 -08:00
teknium1	5e598a588f	refactor(auth): transition Codex OAuth tokens to Hermes auth store Updated the authentication mechanism to store Codex OAuth tokens in the Hermes auth store located at ~/.hermes/auth.json instead of the previous ~/.codex/auth.json. This change includes refactoring related functions for reading and saving tokens, ensuring better management of authentication states and preventing conflicts between different applications. Adjusted tests to reflect the new storage structure and improved error handling for missing or malformed tokens.	2026-03-01 19:59:24 -08:00
teknium1	8bc2de4ab6	feat(provider-routing): add OpenRouter provider routing configuration Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.	2026-03-01 18:24:27 -08:00
teknium1	11f5c1ecf0	fix(tests): use bare @pytest.mark.asyncio for hook emit tests Remove loop_scope="function" parameter from async test decorators in test_hooks.py. This matches the existing convention in the repo (test_telegram_documents.py) and avoids requiring pytest-asyncio 0.23+. All 144 new tests from PR #191 now pass.	2026-03-01 05:28:55 -08:00
0xbyt4	3b745633e4	test: add unit tests for 8 untested modules (batch 3) (#191 ) * test: add unit tests for 8 untested modules (batch 3) New test files (143 tests total): - tools/debug_helpers.py: DebugSession enable/disable, log, save, session info - tools/skills_guard.py: scan_file, scan_skill, trust levels, install policy, structural checks - tools/skills_sync.py: manifest read/write, skill discovery, sync logic - gateway/sticker_cache.py: cache CRUD, sticker injection text builders - gateway/channel_directory.py: channel resolution, display formatting, session building - gateway/hooks.py: hook discovery, sync/async emit, wildcard matching - gateway/mirror.py: session lookup, JSONL append, mirror_to_session - honcho_integration/client.py: config from env/file, session name resolution, linked workspaces Also documents a gap in skills_guard: multi-word prompt injection variants like "ignore all prior instructions" bypass the regex scanner. * test: strengthen sticker injection tests with exact format assertions Replace loose "contains" checks with exact output matching for build_sticker_injection and build_animated_sticker_injection. Add edge cases: set_name without emoji, empty description, empty emoji. * test: remove skills_guard gap-documenting test to avoid conflict with fix PR	2026-03-01 05:28:12 -08:00
0xbyt4	900d48714a	Merge remote-tracking branch 'origin/main' into test/expand-coverage-4 # Conflicts: # tests/agent/test_auxiliary_client.py	2026-03-01 12:11:54 +03:00
0xbyt4	3fdf03390e	Merge remote-tracking branch 'origin/main' into feature/homeassistant-integration # Conflicts: # run_agent.py	2026-03-01 11:59:12 +03:00
0xbyt4	25fb9aafcb	fix: add service domain blocklist and entity_id validation to HA tools Block dangerous HA service domains (shell_command, command_line, python_script, pyscript, hassio, rest_command) that allow arbitrary code execution or SSRF. Add regex validation for entity_id to prevent path traversal attacks. 17 new tests covering both security features.	2026-03-01 11:53:50 +03:00
Bartok Moltbot	ed0e860abb	fix(honcho): auto-enable when API key is present Fixes #241 When users set HONCHO_API_KEY via `hermes config set` or environment variable, they expect the integration to activate. Previously, the `enabled` flag defaulted to `false` when reading from global config, requiring users to also explicitly enable Honcho. This change auto-enables Honcho when: - An API key is present (from config file or env var) - AND `enabled` is not explicitly set to `false` in the config Users who want to disable Honcho while keeping the API key can still set `enabled: false` in their config. Also adds unit tests for the auto-enable behavior.	2026-03-01 03:12:37 -05:00
teknium1	41d8a80226	fix(display): fix subagent progress tree-view visual nits Two fixes to the subagent progress display from PR #186: 1. Task index prefix: show 1-indexed prefix ([1], [2], ...) for ALL tasks in batch mode (task_count > 1). Single tasks get no prefix. Previously task 0 had no prefix while others did, making batch output confusing. 2. Completion indicator: use spinner.print_above() instead of raw print() for per-task completion lines (✓ [1/2] ...). Raw print collided with the active spinner, mushing the completion text onto the spinner line. Now prints cleanly above. Added task_count parameter to _build_child_progress_callback and _run_single_child. Updated tests accordingly.	2026-02-28 23:29:49 -08:00
teknium1	4ec386cc72	fix(display): use spaces instead of ANSI \033[K in print_above() for prompt_toolkit compat print_above() used \033[K (erase-to-end-of-line) to clear the spinner line before printing text above it. This causes garbled escape codes when prompt_toolkit's patch_stdout is active in CLI mode. Switched to the same spaces-based clearing approach used by stop() — overwrite with blanks, then carriage return back to start of line. Updated test assertion to match the new clearing method.	2026-02-28 23:19:23 -08:00
lila	dd69f16c3e	feat(gateway): expose subagent tool calls and thinking to user (fixes #169 ) (#186 ) When subagents run via delegate_task, the user now sees real-time progress instead of silence: CLI: tree-view activity lines print above the delegation spinner 🔀 Delegating: research quantum computing ├─ 💭 "I'll search for papers first..." ├─ 🔍 web_search "quantum computing" ├─ 📖 read_file "paper.pdf" └─ ⠹ working... (18.2s) Gateway (Telegram/Discord): batched progress summaries sent every 5 tool calls to avoid message spam. Remaining tools flushed on subagent completion. Changes: - agent/display.py: add KawaiiSpinner.print_above() to print status lines above an active spinner without disrupting animation. Uses captured stdout (self._out) so it works inside the child's redirect_stdout(devnull). - tools/delegate_tool.py: add _build_child_progress_callback() that creates a per-child callback relaying tool calls and thinking events to the parent's spinner (CLI) or progress queue (gateway). Each child gets its own callback instance, so parallel subagents don't share state. Includes _flush() for gateway batch completion. - run_agent.py: fire tool_progress_callback with '_thinking' event when the model produces text content. Guarded by _delegate_depth > 0 so only subagents fire this (prevents gateway spam from main agent). REASONING_SCRATCHPAD/think/ reasoning XML tags are stripped before display. Tests: 21 new tests covering print_above, callback builder, thinking relay, SCRATCHPAD filtering, batching, flush, thread isolation, delegate_depth guard, and prefix handling.	2026-02-28 23:18:00 -08:00
teknium1	1db5598294	feat(tests): add live integration tests for file operations and shell noise filtering - Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment. - Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results. - Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing. - Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.	2026-02-28 22:57:58 -08:00
teknium1	70dfec9638	test(redact): add sensitive text redaction - Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables. - Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged. - Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments. - Implement a redacting formatter for logging to enhance security during log output.	2026-02-28 21:56:27 -08:00
teknium1	500f0eab4a	refactor(cli): Finalize OpenAI Codex Integration with OAuth - Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults. - Updated the context compressor's summary target tokens to 2500 for improved performance. - Added external credential detection for Codex CLI to streamline authentication. - Refactored various components to ensure consistent handling of authentication and model selection across the application.	2026-02-28 21:47:51 -08:00
Teknium	5a79e423fe	Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo	2026-02-28 18:13:38 -08:00
Farukest	7166647ca1	fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection	2026-03-01 03:23:29 +03:00
Farukest	f7300a858e	fix(tools): use task-specific glob pattern in disk usage calculation	2026-03-01 03:17:50 +03:00
Farukest	e87859e82c	fix(agent): copy conversation_history to avoid mutating caller's list	2026-03-01 03:06:13 +03:00
Farukest	de101a8202	fix(agent): strip _flush_sentinel from API messages	2026-03-01 02:51:31 +03:00
Farukest	7f1f4c2248	fix(tools): preserve empty content in ReadResult.to_dict()	2026-03-01 02:42:15 +03:00
Farukest	c33f8d381b	fix: correct off-by-one in retry exhaustion checks The retry exhaustion checks used > instead of >= to compare retry_count against max_retries. Since the while loop condition is retry_count < max_retries, the check retry_count > max_retries can never be true inside the loop. When retries are exhausted, the loop exits and falls through to response.choices[0] on an invalid response, crashing with IndexError instead of returning a proper error.	2026-03-01 02:27:26 +03:00
Farukest	3f58e47c63	fix: guard POSIX-only process functions for Windows compatibility os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise AttributeError on import or first call. This breaks the terminal tool, code execution sandbox, process registry, and WhatsApp bridge on Windows. Added _IS_WINDOWS platform guard in all four affected files, following the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is set to None and process termination falls back to proc.terminate() / proc.kill() instead of process group signals. Files changed: - tools/environments/local.py (3 call sites) - tools/process_registry.py (2 call sites) - tools/code_execution_tool.py (3 call sites) - gateway/platforms/whatsapp.py (3 call sites)	2026-03-01 01:54:27 +03:00
Farukest	b7f8a17c24	fix(gateway): persist transcript changes in /retry, /undo and fix /reset /retry and /undo set session_entry.conversation_history which does not exist on SessionEntry. The truncated history was never written to disk, so the next message reload picked up the full unmodified transcript. Added SessionStore.rewrite_transcript() that persists changes to both the JSONL file and SQLite database, and updated both commands to use it. /reset accessed self.session_store._sessions which does not exist on SessionStore (the correct attribute is _entries). Also replaced the hand-coded session key with _generate_session_key() to fix WhatsApp DM sessions using the wrong key format. Closes #210	2026-03-01 01:40:30 +03:00
0xbyt4	b759602483	fix: prevent italic regex from spanning newlines in Telegram formatter The italic regex \([^]+)\* used [^] which matches newlines, causing bullet lists with markers to be incorrectly converted to italic text. Changed to [^*\n]+ to prevent cross-line matching. Adds 43 tests for _escape_mdv2 and format_message covering code blocks, bold/italic, headers, links, mixed formatting, and the regression case.	2026-02-28 22:01:48 +03:00
0xbyt4	9769e07cd5	test: add 25 unit tests for trajectory_compressor Tests cover CompressionConfig (defaults, from_yaml with full/partial/empty), TrajectoryMetrics and AggregateMetrics (to_dict, aggregation, division-by-zero guards), _find_protected_indices (basic, all-protected, no tail, missing roles, disabled protection), _extract_turn_content_for_summary (basic, truncation, empty range), and token counting (empty, basic, trajectory, fallback on error).	2026-02-28 21:28:28 +03:00
0xbyt4	08250a53a1	fix: skills hub dedup prefers higher trust levels + 43 tests - unified_search and GitHubSource.search dedup: replace naive `trust_level == "trusted"` check with ranked comparison so "builtin" results are never overwritten by "trusted" or "community" - Add 43 unit tests covering _parse_frontmatter_quick, trust_level_for, HubLockFile CRUD, TapsManager ops, LobeHub _convert_to_skill_md, unified_search dedup (with regression test), and append_audit_log	2026-02-28 21:25:55 +03:00
0xbyt4	ff6d62802d	fix: platform base extract_images and truncate_message bugs + tests - extract_images: only remove extracted image tags from content, preserve non-image markdown links (e.g. PDFs) that were previously silently lost - truncate_message: walk only chunk_body (not prepended prefix) so the reopened code fence does not toggle in_code off, leaving continuation chunks with unclosed code blocks - Add 49 unit tests covering MessageEvent command parsing, extract_images, extract_media, truncate_message code block handling, and _get_human_delay	2026-02-28 21:21:03 +03:00
0xbyt4	46506769f1	test: add unit tests for 5 security/logic-critical modules (batch 4) - gateway/pairing.py: rate limiting, lockout, code expiry, approval flow (28 tests) - tools/skill_manager_tool.py: validation, path traversal prevention, CRUD (46 tests) - tools/skills_tool.py: frontmatter/tag parsing, skill discovery, view chain (34 tests) - agent/auxiliary_client.py: auth reading, API key resolution, param branching (16 tests) - honcho_integration/session.py: session dataclass, ID sanitization, transcript format (20 tests)	2026-02-28 20:33:48 +03:00
0xbyt4	dfd50ceccd	fix: preserve Gemini thought_signature in tool call messages Gemini 3 thinking models attach extra_content with thought_signature to function call responses. This must be echoed back on subsequent API calls or the server rejects with a 400 error. The assistant message builder was dropping this field, causing all Gemini 3 Flash/Pro tool-calling flows to fail after the first function call.	2026-02-28 18:10:05 +03:00
0xbyt4	2390728cc3	fix: resolve 4 bugs found in HA integration code review - Auto-authorize HA events in gateway (system-generated, not user messages) - Guard _read_events against None/closed WebSocket after failed reconnect - Use UUID for send() message_id instead of polluting WS sequence counter - entity_id parameter now takes precedence over data["entity_id"]	2026-02-28 15:12:18 +03:00
0xbyt4	b32c642af3	test: add HA integration tests with fake in-process server Fake HA server (aiohttp.web) simulates full API surface over real TCP: - WebSocket auth handshake + event push - REST endpoints (states, services, notifications) 14 integration tests verify end-to-end flows without mocks: - WS connect/auth/subscribe/event-forwarding/disconnect - REST list/get/call-service against fake server - send() notification delivery and auth failure - 401/500 error handling	2026-02-28 14:28:04 +03:00
0xbyt4	c36b256de5	feat: add Home Assistant integration (REST tools + WebSocket gateway) - Add ha_list_entities, ha_get_state, ha_call_service tools via REST API - Add WebSocket gateway adapter for real-time state_changed event monitoring - Support domain/entity filtering, cooldown, and auto-reconnect with backoff - Use REST API for outbound notifications to avoid WS race condition - Gate tool availability on HASS_TOKEN env var - Add 82 unit tests covering real logic (filtering, payload building, event pipeline)	2026-02-28 13:32:48 +03:00
Bartok9	35655298e6	fix(gateway): prevent TTS voice messages from accumulating across turns Fixes #160 The issue was that MEDIA tags were being extracted from ALL messages in the conversation history, not just messages from the current turn. This caused TTS voice messages generated in earlier turns to be re-attached to every subsequent reply. The fix: - Track history_len before calling run_conversation - Only scan messages AFTER history_len for MEDIA tags - Add comprehensive tests to prevent regression This ensures each voice message is sent exactly once, when it's generated, not on every subsequent message in the session.	2026-02-28 03:38:27 -05:00
teknium1	50cb4d5fc7	fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter	2026-02-27 23:23:31 -08:00
Teknium	2bc9508b7c	Merge pull request #173 from adavyas/fix/anthropic-base-url-guard fix(agent): fail fast on Anthropic native base URLs	2026-02-27 23:22:01 -08:00
teknium1	19f28a633a	fix(agent): enhance 413 error handling and improve conversation history management in tests	2026-02-27 23:04:32 -08:00
Teknium	2c817ce4a5	Merge pull request #153 from tekelala/main fix(agent): handle 413 payload-too-large via compression instead of aborting	2026-02-27 22:57:55 -08:00
adavyas	0c0a2eb0a2	fix(agent): fail fast on Anthropic native base URLs	2026-02-27 21:19:29 -08:00
Teknium	0d2ac1c07f	Merge pull request #121 from Bartok9/test-clarify-tool test(tools): add unit tests for clarify_tool.py	2026-02-27 16:27:37 -08:00
tekelala	79bd65034c	fix(agent): handle 413 payload-too-large via compression instead of aborting The 413 "Request Entity Too Large" error from the LLM API was caught by the generic 4xx handler which aborts immediately. This is wrong for 413 — it's a payload-size issue that can be resolved by compressing conversation history. - Intercept 413 before the generic 4xx block and route to _compress_context - Exclude 413 from generic is_client_error detection - Add 'request entity too large' to context-length phrases as safety net - Add tests for 413 compression behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 12:21:27 -05:00
tekelala	fbb1923fad	fix(security): patch path traversal, size bypass, and prompt injection in document processing - Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check) - Reject documents with None file_size instead of silently allowing download - Cap text file injection at 100 KB to prevent oversized prompt payloads - Sanitize display_name in run.py context notes to block prompt injection via filenames - Add 35 unit tests covering document cache utilities and Telegram document handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 11:53:46 -05:00
Teknium	3526fa27fd	Merge pull request #62 from 0xbyt4/test/expand-coverage-2 test: add unit tests for 8 modules (batch 2)	2026-02-27 01:47:30 -08:00
Teknium	64eca85876	Merge pull request #67 from 0xbyt4/test/add-run-agent-unit-tests test: add unit tests for run_agent.py (AIAgent)	2026-02-27 01:36:49 -08:00
Teknium	152271851f	Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass fix: cron prompt injection scanner bypass for multi-word variants	2026-02-27 01:34:14 -08:00
Teknium	0909be3aa8	Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink fix: resolve symlink bypass in write deny list on macOS	2026-02-27 01:32:19 -08:00
Teknium	274e623b50	Merge pull request #60 from 0xbyt4/test/expand-coverage test: add unit tests for 8 untested core modules	2026-02-27 01:30:36 -08:00
Bartok Moltbot	df8a62d018	test(tools): add unit tests for clarify_tool.py Add comprehensive test coverage for the clarify_tool module: - TestClarifyToolBasics: 5 tests for core functionality - Simple questions, questions with choices, error handling - TestClarifyToolChoicesValidation: 5 tests for choices parameter - MAX_CHOICES enforcement, empty/whitespace handling, type conversion - TestClarifyToolCallbackHandling: 3 tests for callback behavior - Exception handling, question/response trimming - TestCheckClarifyRequirements: 1 test verifying always-true behavior - TestClarifySchema: 6 tests verifying OpenAI function schema - Required/optional parameters, maxItems constraint Total: 20 tests covering all public functions and edge cases.	2026-02-27 03:29:26 -05:00
George Pickett	32070e6bc0	Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo # Conflicts: # cron/scheduler.py # gateway/run.py # tools/delegate_tool.py	2026-02-26 10:56:29 -08:00
darya	f5c09a3aba	test: add regression tests for recursive delete false positive fix Add 15 new tests in two classes: - TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r' (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely flagged as 'recursive delete' - TestRmRecursiveFlagVariants (7 tests): verify all recursive delete flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf) are still correctly caught All 29 tests pass (14 existing + 15 new).	2026-02-26 16:40:44 +03:00
0xbyt4	90ca2ae16b	test: add unit tests for run_agent.py (AIAgent) 71 tests covering pure functions, state/structure methods, and conversation loop pieces. OpenAI client and tool loading are mocked.	2026-02-26 16:15:04 +03:00
0xbyt4	feea8332d6	fix: cron prompt injection scanner bypass for multi-word variants The regex `ignore\s+(previous\|all\|above\|prior)\s+instructions` only allowed ONE word between "ignore" and "instructions". Multi-word variants like "Ignore ALL prior instructions" bypassed the scanner because "ALL" matched the alternation but then `\s+instructions` failed to match "prior". Fix: use `(?:\w+\s+)*` groups to allow optional extra words before and after the keyword alternation.	2026-02-26 13:55:54 +03:00
0xbyt4	ffbdd7fcce	test: add unit tests for 8 modules (batch 2) Cover model_tools, toolset_distributions, context_compressor, prompt_caching, cronjob_tools, session_search, process_registry, and cron/scheduler with 127 new test cases.	2026-02-26 13:54:20 +03:00
0xbyt4	b699cf8c48	test: remove /etc platform-conditional tests from file_operations These tests documented the macOS symlink bypass bug with platform-conditional assertions. The fix and proper regression tests are in PR #61 (tests/tools/test_write_deny.py), so remove them here to avoid ordering conflicts between the two PRs.	2026-02-26 13:43:30 +03:00
0xbyt4	2efd9bbac4	fix: resolve symlink bypass in write deny list on macOS On macOS, /etc is a symlink to /private/etc. The _is_write_denied() function resolves the input path with os.path.realpath() but the deny list entries were stored as literal strings ("/etc/shadow"). This meant the resolved path "/private/etc/shadow" never matched, allowing writes to sensitive system files on macOS. Fix: Apply os.path.realpath() to deny list entries at module load time so both sides of the comparison use resolved paths. Adds 19 regression tests in tests/tools/test_write_deny.py.	2026-02-26 13:30:55 +03:00
0xbyt4	0ac3af8776	test: add unit tests for 8 untested modules Add comprehensive test coverage for: - cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests) - tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests) - toolsets.py: resolution, validation, composition, cycle detection (19 tests) - tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests) - agent/prompt_builder.py: context scanning, truncation, skills index (24 tests) - agent/model_metadata.py: token estimation, context lengths (16 tests) - hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests) Total: 210 new tests, all passing (380 total suite).	2026-02-26 13:27:58 +03:00
teknium1	178658bf9f	test: enhance session source tests and add validation for chat types - Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms. - Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts. - Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.	2026-02-26 00:53:57 -08:00
George Pickett	74c662b63a	Harden Codex auth refresh and responses compatibility	2026-02-25 19:27:54 -08:00
George Pickett	91bdb9eb2d	Fix Codex stream fallback for Responses completion gaps	2026-02-25 19:08:11 -08:00
George Pickett	47f16505d2	Omit optional function_call id in Responses replay input	2026-02-25 19:00:11 -08:00
George Pickett	e63986b534	Harden Codex stream handling and ack continuation	2026-02-25 18:56:06 -08:00
George Pickett	ce175d7372	Fix Codex Responses continuation and schema parity	2026-02-25 18:20:41 -08:00
George Pickett	609b19b630	Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md)	2026-02-25 18:20:38 -08:00
0xbyt4	8fc28c34ce	test: reorganize test structure and add missing unit tests Reorganize flat tests/ directory to mirror source code structure (tools/, gateway/, hermes_cli/, integration/). Add 11 new test files covering previously untested modules: registry, patch_parser, fuzzy_match, todo_tool, approval, file_tools, gateway session/config/ delivery, and hermes_cli config/models. Total: 147 unit tests passing, 9 integration tests gated behind pytest marker.	2026-02-26 03:20:08 +03:00
teknium1	8fedbf87d9	feat: add cleanup utility for test artifacts in checkpoint resumption tests - Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution. - Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.	2026-02-23 02:16:10 -08:00
teknium1	d8a369e194	refactor: update API key checks in WebToolsTester - Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class. - Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.	2026-02-23 02:13:33 -08:00
teknium1	90af34bc83	feat: enhance interrupt handling and container resource configuration - Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations. - Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt. - Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments. - Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.	2026-02-23 02:11:33 -08:00
teknium1	cbff1b818c	refactor: remove obsolete Nous API test scripts - Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite. - These scripts were no longer necessary and their removal helps maintain a cleaner codebase.	2026-02-21 03:21:13 -08:00
teknium1	70dd3a16dc	Cleanup time!	2026-02-20 23:23:32 -08:00
teknium1	90e5211128	feat: implement subagent delegation for task management - Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks. - Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities. - Updated configuration options for delegation, including maximum iterations and default toolsets for subagents. - Enhanced documentation to provide clear guidance on using the delegation feature and its configuration. - Added comprehensive tests to ensure the functionality and reliability of the delegation logic.	2026-02-20 03:15:53 -08:00
teknium1	783acd712d	feat: implement code execution sandbox for programmatic tool calling - Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions. - Added configuration options for timeout and maximum tool calls in the sandbox environment. - Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms. - Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling. - Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.	2026-02-19 23:23:43 -08:00
teknium	248acf715e	Add browser automation tools and enhance environment configuration - Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution. - Updated `.env.example` to include new configuration options for Browserbase API keys and session settings. - Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets. - Updated `README.md` with setup instructions for browser tools and their usage examples. - Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality. - Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.	2026-01-29 06:10:24 +00:00
teknium	c82741c3d8	some cleanups	2025-11-05 03:47:17 +00:00
teknium	f6f75cbe2b	update webtools	2025-11-02 06:03:21 +00:00
teknium	0e2e69a71d	Add batch processing capabilities with checkpointing and statistics tracking, along with toolset distribution management. Update README and add test scripts for validation.	2025-10-06 03:17:58 +00:00
teknium	a7ff4d49e9	A bit of restructuring for simplicity and organization	2025-10-01 23:29:25 +00:00

... 46 47 48 49 50 ...

2617 Commits