hermes-agent

Author	SHA1	Message	Date
Teknium	950f69475f	feat(browser): add Camofox local anti-detection browser backend (#4008 ) Camofox-browser is a self-hosted Node.js server wrapping Camoufox (Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set, all 11 browser tools route through the Camofox REST API instead of the agent-browser CLI. Maps 1:1 to the existing browser tool interface: - Navigate, snapshot, click, type, scroll, back, press, close - Get images, vision (screenshot + LLM analysis) - Console (returns empty with note — camofox limitation) Setup: npm start in camofox-browser dir, or docker run -p 9377:9377 Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env Advantages over Browserbase (cloud): - Free (no per-session API costs) - Local (zero network latency for browser ops) - Anti-detection at C++ level (bypasses Cloudflare/Google bot detection) - Works offline, Docker-ready Files: - tools/browser_camofox.py: Full REST backend (~400 lines) - tools/browser_tool.py: Routing at each tool function - hermes_cli/config.py: CAMOFOX_URL env var entry - tests/tools/test_browser_camofox.py: 20 tests	2026-03-30 13:18:42 -07:00
Teknium	37825189dd	fix(skills): validate hub bundle paths before install (#3986 ) Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>	2026-03-30 08:37:19 -07:00
Teknium	1e896b0251	fix: resolve 7 failing CI tests (#3936 ) 1. matrix voice: _on_room_message_media unconditionally overwrote media_urls with the image cache path (always None for non-images), wiping the locally-cached voice path. Now only overrides when cached_path is truthy. 2. cli_tools_command: /tools disable no longer prompts for confirmation (input() removed in earlier commit to fix TUI hang), but tests still expected the old Y/N prompt flow. Updated tests to match current behavior (direct apply + session reset). 3. slack app_mention: connect() was refactored for multi-workspace (creates AsyncWebClient per token), but test only mocked the old self._app.client path. Added AsyncWebClient and acquire_scoped_lock mocks. 4. website_policy: module-level _cached_policy from earlier tests caused fast-path return of None. Added invalidate_cache() before assertion. 5. codex 401 refresh: already passing on current main (fixed by intervening commit).	2026-03-30 08:10:14 -07:00
Teknium	5148682b43	feat: mount skills directory into all remote backends with live sync (#3890 ) Skills with scripts/, templates/, and references/ subdirectories need those files available inside sandboxed execution environments. Previously the skills directory was missing entirely from remote backends. Live sync — files stay current as credentials refresh and skills update: - Docker/Singularity: bind mounts are inherently live (host changes visible immediately) - Modal: _sync_files() runs before each command with mtime+size caching, pushing only changed credential and skill files (~13μs no-op overhead) - SSH: rsync --safe-links before each command (naturally incremental) - Daytona: _upload_if_changed() with mtime+size caching before each command Security — symlink filtering: - Docker/Singularity: sanitized temp copy when symlinks detected - Modal/Daytona: iter_skills_files() skips symlinks - SSH: rsync --safe-links skips symlinks pointing outside source tree - Temp dir cleanup via atexit + reuse across calls Non-root user support: - SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/ - Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/ - Docker/Modal/Singularity: run as root, /root/.hermes/ is correct Also: - credential_files.py: fix name/path key fallback in required_credential_files - Singularity, SSH, Daytona: gained credential file support - 14 tests covering symlink filtering, name/path fallback, iter_skills_files	2026-03-30 02:45:41 -07:00
Teknium	b4ceb541a7	fix(terminal): preserve partial output when command times out (#3868 ) When a command timed out, all captured output was discarded — the agent only saw 'Command timed out after Xs' with zero context. Now returns the buffered output followed by a timeout marker, matching the existing interrupt path behavior. Salvaged from PR #3286 by @binhnt92. Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>	2026-03-29 21:51:44 -07:00
Robin Fernandes	1cbb1b99cc	Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.	2026-03-30 13:28:10 +09:00
Teknium	2d607d36f6	fix(security): catch sensitive path writes in approval checks (#3859 ) Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>	2026-03-29 20:57:57 -07:00
Teknium	5e67fc8c40	fix(vision): reject non-image files and enforce website policy (salvage #1940 ) (#3845 ) Three safety gaps in vision_analyze_tool: 1. Local files accepted without checking if they're actually images — a renamed text file would get base64-encoded and sent to the model. Now validates magic bytes (PNG, JPEG, GIF, BMP, WebP, SVG). 2. No website policy enforcement on image URLs — blocked domains could be fetched via the vision tool. Now checks before download. 3. No redirect check — if an allowed URL redirected to a blocked domain, the download would proceed. Now re-checks the final URL. Fixed one test that needed _validate_image_url mocked to bypass DNS resolution on the fake blocked.test domain (is_safe_url does DNS checks that were added after the original PR). Co-authored-by: GutSlabs <GutSlabs@users.noreply.github.com>	2026-03-29 20:55:04 -07:00
Teknium	3e203de125	fix(skills): block category path traversal in skill manager (#3844 ) Validate category names in _create_skill() before using them as filesystem path segments. Previously, categories like '../escape' or '/tmp/pwned' could write skill files outside ~/.hermes/skills/. Adds _validate_category() that rejects slashes, backslashes, absolute paths, and non-alphanumeric characters (reuses existing VALID_NAME_RE). Tests: 5 new tests for traversal, absolute paths, and valid categories. Salvaged from PR #1939 by Gutslabs.	2026-03-29 20:08:22 -07:00
Teknium	c774833667	fix(banner): show honcho tools as available when configured (#3810 ) The honcho check_fn only checked runtime session state, which isn't set until the agent initializes. At banner time, honcho tools showed as red/disabled even when properly configured. Now checks configuration (enabled + api_key/base_url) as a fallback when the session context isn't active yet. Fast path (session active) unchanged; slow path (config check) only runs at banner time. Adds 4 tests covering: session active, configured but no session, not configured, and import failure graceful fallback. Closes #1843.	2026-03-29 15:55:05 -07:00
Teknium	d5d22fe7ba	feat(mcp): dynamic tool discovery via notifications/tools/list_changed (#3812 ) When a connected MCP server sends a ToolListChangedNotification (per the MCP spec), Hermes now automatically re-fetches the tool list, deregisters removed tools, and registers new ones — without requiring a restart. This enables MCP servers with dynamic toolsets (e.g. GitHub MCP with GITHUB_DYNAMIC_TOOLSETS=1) to add/remove tools at runtime. Changes: - registry.py: add ToolRegistry.deregister() for nuke-and-repave refresh - mcp_tool.py: extract _register_server_tools() from _discover_and_register_server() as a shared helper for both initial discovery and dynamic refresh - mcp_tool.py: add _make_message_handler() and _refresh_tools() on MCPServerTask, wired into all 3 ClientSession sites (stdio, new HTTP, deprecated HTTP) - Graceful degradation: silently falls back to static discovery when the MCP SDK lacks notification types or message_handler support - 8 new tests covering registration, refresh, handler dispatch, and deregister Salvaged from PR #1794 by shivvor2.	2026-03-29 15:52:54 -07:00
Teknium	57481c8ac5	fix(tools): implement send_message routing for Matrix, Mattermost, HomeAssistant, DingTalk (#3796 ) * fix(tools): implement send_message routing for Matrix, Mattermost, HomeAssistant, DingTalk Matrix, Mattermost, HomeAssistant, and DingTalk were present in platform_map but fell through to the "not yet implemented" else branch, causing send_message tool calls to silently fail on these platforms. Add four async sender functions: - _send_mattermost: POST /api/v4/posts via Mattermost REST API - _send_matrix: PUT /_matrix/client/v3/rooms/.../send via Matrix CS API - _send_homeassistant: POST /api/services/notify/notify via HA REST API - _send_dingtalk: POST to session webhook URL Add routing in _send_to_platform() and 17 unit tests covering success, HTTP errors, missing config, env var fallback, and Matrix txn_id uniqueness. * fix: pass platform tokens explicitly to Mattermost/Matrix/HA senders The original PR passed pconfig.extra to sender functions, but tokens live at pconfig.token (not in extra). This caused the senders to always fall through to env var lookup instead of using the gateway-resolved token. Changes: - Mattermost/Matrix/HA: accept token as first arg, matching the Telegram/Discord/Slack sender pattern - DingTalk: add DINGTALK_WEBHOOK_URL env var fallback + docstring explaining the session-webhook vs robot-webhook difference - Tests updated for new signatures + new DingTalk env var test --------- Co-authored-by: sprmn24 <oncuevtv@gmail.com>	2026-03-29 15:17:46 -07:00
Teknium	7a3682ac3f	feat: mount skill credential files + fix env passthrough for remote backends (#3671 ) Two related fixes for remote terminal backends (Modal/Docker): 1. NEW: Credential file mounting system Skills declare required_credential_files in frontmatter. Files are mounted into Docker (read-only bind mounts) and Modal (mounts at creation + sync via exec on each command for mid-session changes). Google Workspace skill updated with the new field. 2. FIX: Docker backend now includes env_passthrough vars Skills that declare required_environment_variables (e.g. Notion with NOTION_API_KEY) register vars in the env_passthrough system. The local backend checked this, but Docker's forward_env was a separate disconnected list. Now Docker exec merges both sources, so skill-declared env vars are forwarded into containers automatically. This fixes the reported issue where NOTION_API_KEY in ~/.hermes/.env wasn't reaching the Docker container despite being registered via the Notion skill's prerequisites. Closes #3665	2026-03-28 23:53:40 -07:00
Teknium	4764e06fde	fix(acp): complete session management surface for editor clients (salvage #3501 ) (#3675 ) * fix acp adapter session methods * test: stub local command in transcription provider cases --------- Co-authored-by: David Zhang <david.d.zhang@gmail.com>	2026-03-28 23:45:53 -07:00
Teknium	1a032ccf79	fix(skills): stop marking persisted env vars missing on remote backends (#3650 ) Salvage of PR #3452 (kentimsit). Fixes skill readiness checks on remote backends — persisted env vars are no longer incorrectly marked as missing. Co-Authored-By: kentimsit <kentimsit@users.noreply.github.com>	2026-03-28 17:52:32 -07:00
Teknium	973deb4f76	fix(browser): guard LLM response content against None in snapshot and vision (#3642 ) Salvage of PR #3532 (binhnt92). Guards browser_tool.py against None content from reasoning-only models (DeepSeek-R1, QwQ). Follow-up to #3449. Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>	2026-03-28 17:25:04 -07:00
Teknium	f803f66339	fix(terminal): avoid merging heredoc EOF with fence wrapper (#3598 ) One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`, which is not a valid delimiter. Bash then treated the rest of the wrapper as heredoc body, leaking it into tool output (e.g. gh issue/PR flows). Use newline-separated wrapper lines so the delimiter stays alone and the trailer runs after the heredoc completes. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-28 14:43:41 -07:00
Teknium	404a0b823e	fix: add self-termination guard for pkill/killall targeting hermes/gateway (#3593 ) Prevent the agent from accidentally killing its own process with pkill -f gateway, killall hermes, etc. Adds a dangerous command pattern that triggers the approval flow. Co-authored-by: arasovic <arasovic@users.noreply.github.com>	2026-03-28 14:33:48 -07:00
Teknium	735ca9dfb2	refactor: replace swe-rex with native Modal SDK for Modal backend (#3538 ) Drop the swe-rex dependency for Modal terminal backend and use the Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes: - AsyncUsageWarning from synchronous App.lookup() in async context - DeprecationError from unencrypted_ports / .url on unencrypted tunnels (deprecated 2026-03-05) The new implementation: - Uses modal.App.lookup.aio() for async-safe app creation - Uses Sandbox.create.aio() with 'sleep infinity' entrypoint - Uses Sandbox.exec.aio() for direct command execution (no HTTP server or tunnel needed) - Keeps all existing features: persistent filesystem snapshots, configurable resources (CPU/memory/disk), sudo support, interrupt handling, _AsyncWorker for event loop safety Consistent with the Docker backend precedent (PR #2804) where we removed mini-swe-agent in favor of direct docker run. Files changed: - tools/environments/modal.py - core rewrite - tools/terminal_tool.py - health check: modal instead of swerex - hermes_cli/setup.py - install modal instead of swe-rex[modal] - pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal] - scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex - tests/ - updated for new implementation - environments/README.md - updated patches section - website/docs - updated install command	2026-03-28 11:21:44 -07:00
Teknium	658692799d	fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389 ) (#3449 ) Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top. All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty. Closes #3389.	2026-03-27 15:28:19 -07:00
Teknium	e4e04c2005	fix: make tirith block verdicts approvable instead of hard-blocking (#3428 ) Previously, tirith exit code 1 (block) immediately rejected the command with no approval prompt — users saw 'BLOCKED: Command blocked by security scan' and the agent moved on. This prevented gateway/CLI users from approving pipe-to-shell installs like 'curl ... \| sh' even when they understood the risk. Changes: - Tirith 'block' and 'warn' now both go through the approval flow. Users see the full tirith findings (severity, title, description, safer alternatives) and can choose to approve or deny. - New _format_tirith_description() builds rich descriptions from tirith findings JSON so the approval prompt is informative. - CLI startup now warns when tirith is enabled but not available, so users know command scanning is degraded to pattern matching only. The default approval choice is still deny, so the security posture is unchanged for unattended/timeout scenarios. Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh \| sh' was hard-blocked with no way to approve.	2026-03-27 13:22:01 -07:00
Teknium	5127567d5d	perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366 ) (#3421 ) Two-layer caching for build_skills_system_prompt(): 1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms 2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms Key improvements over original PR #3366: - Extract shared logic into agent/skill_utils.py (parse_frontmatter, skill_matches_platform, get_disabled_skill_names, extract_skill_conditions, extract_skill_description, iter_skill_index_files) - tools/skills_tool.py delegates to shared module — zero code duplication - Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False) - Cache invalidation on all skill mutation paths: - skill_manage tool (in-conversation writes) - hermes skills install (CLI hub) - hermes skills uninstall (CLI hub) - Automatic via mtime/size manifest on cold start prompt_builder.py no longer imports tools.skills_tool (avoids pulling in the entire tool registry chain at prompt build time). 6301 tests pass, 0 failures. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 10:54:02 -07:00
Teknium	5a1e2a307a	perf(ttft): salvage easy-win startup optimizations from #3346 (#3395 ) * perf(ttft): dedupe shared tool availability checks * perf(ttft): short-circuit vision auto-resolution * perf(ttft): make Claude Code version detection lazy * perf(ttft): reuse loaded toolsets for skills prompt --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-27 07:49:44 -07:00
Teknium	be416cdfa9	fix: guard config.get() against YAML null values to prevent AttributeError (#3377 ) dict.get(key, default) returns None — not the default — when the key IS present but explicitly set to null/~ in YAML. Calling .lower() on that raises AttributeError. Use (config.get(key) or fallback) so both missing keys and explicit nulls coalesce to the intended default. Files fixed: - tools/tts_tool.py — _get_provider() - tools/web_tools.py — _get_backend() - tools/mcp_tool.py — MCPServerTask auth config - trajectory_compressor.py — _detect_provider() and config loading Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-27 04:03:00 -07:00
Teknium	b8b1f24fd7	fix: handle addition-only hunks in V4A patch parser (#3325 ) V4A patches with only + lines (no context or - lines) were silently dropped because search_lines was empty and the 'if search_lines:' block was the only code path. Addition-only hunks are common when the model generates patches for new functions or blocks. Adds an else branch that inserts at the context_hint position when available, or appends at end of file. Includes 2 regression tests for addition-only hunks with and without context hints. Salvaged from PR #3092 by thakoreh. Co-authored-by: Hiren <hiren.thakore58@gmail.com>	2026-03-26 19:38:04 -07:00
Robin Fernandes	e95965d76a	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-26 16:18:28 -07:00
Robin Fernandes	95dc9aaa75	feat: add managed tool gateway and Nous subscription support - add managed modal and gateway-backed tool integrations\n- improve CLI setup, auth, and configuration for subscriber flows\n- expand tests and docs for managed tool support	2026-03-26 16:17:58 -07:00
Teknium	e5d14445ef	fix(security): restrict subagent toolsets to parent's enabled set (#3269 ) The delegate_task tool accepts a toolsets parameter directly from the LLM's function call arguments. When provided, these toolsets are passed through _strip_blocked_tools but never intersected with the parent agent's enabled_toolsets. A model can request toolsets the parent does not have (e.g., web, browser, rl), granting the subagent tools that were explicitly disabled for the parent. Intersect LLM-requested toolsets with the parent's enabled set before applying the blocked-tool filter, so subagents can only receive a subset of the parent's tools. Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-26 14:50:26 -07:00
Teknium	db241ae6ce	feat(sessions): add --source flag for third-party session isolation (#3255 ) When third-party tools (Paperclip orchestrator, etc.) spawn hermes chat as a subprocess, their sessions pollute user session history and search. - hermes chat --source <tag> (also HERMES_SESSION_SOURCE env var) - exclude_sources parameter on list_sessions_rich() and search_messages() - Sessions with source=tool hidden from sessions list/browse/search - Third-party adapters pass --source tool to isolate agent sessions Cherry-picked from PR #3208 by HenkDz. Co-authored-by: Henkey <noonou7@gmail.com>	2026-03-26 14:35:31 -07:00
Teknium	76ed15dd4d	fix(security): normalize input before dangerous command detection (#3260 ) detect_dangerous_command() ran regex patterns against raw command strings without normalization, allowing bypass via Unicode fullwidth chars, ANSI escape codes, null bytes, and 8-bit C1 controls. Adds _normalize_command_for_detection() that: - Strips ANSI escapes using the full ECMA-48 strip_ansi() from tools/ansi_strip (CSI, OSC, DCS, 8-bit C1, nF sequences) - Removes null bytes - Normalizes Unicode via NFKC (fullwidth Latin → ASCII, etc.) Includes 12 regression tests covering fullwidth, ANSI, C1, null byte, and combined obfuscation bypasses. Salvaged from PR #3089 by thakoreh — improved ANSI stripping to use existing comprehensive strip_ansi() instead of a weaker hand-rolled regex, and added test coverage. Co-authored-by: Hiren <hiren.thakore58@gmail.com>	2026-03-26 14:33:18 -07:00
Teknium	b7b3294c4a	fix(skills): preserve trust for skills-sh identifiers + reduce resolution churn (#3251 ) * fix(skills): reduce skills.sh resolution churn and preserve trust for wrapped identifiers - Accept common skills.sh prefix typos (skils-sh/, skils.sh/) - Strip skills-sh/ prefix in _resolve_trust_level() so trusted repos stay trusted when installed through skills.sh - Use resolved identifier (from bundle/meta) for scan_skill source - Prefer tree search before root scan in _discover_identifier() - Add _resolve_github_meta() consolidation for inspect flow Cherry-picked from PR #3001 by kshitijk4poor. * fix: restore candidate loop in SkillsShSource.fetch() for consistency The cherry-picked PR only tried the first candidate identifier in fetch() while inspect() (via _resolve_github_meta) tried all four. This meant skills at repo/skills/path would be found by inspect but missed by fetch, forcing it through the heavier _discover_identifier flow. Restore the candidate loop so both paths behave identically. Updated the test assertion to match. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-26 13:40:21 -07:00
Teknium	0cfc1f88a3	fix: add MCP tool name collision protection (#3077 ) - Registry now warns when a tool name is overwritten by a different toolset (silent dict overwrite was the previous behavior) - MCP tool registration checks for collisions with non-MCP (built-in) tools before registering. If an MCP tool's prefixed name matches an existing built-in, the MCP tool is skipped and a warning is logged. MCP-to-MCP collisions are allowed (last server wins). - Both regular MCP tools and utility tools (resources/prompts) are guarded. - Adds 5 tests covering: registry overwrite warning, same-toolset re-registration silence, built-in collision skip, normal registration, and MCP-to-MCP collision pass-through. Reported by k_sze (KONG) — MiniMax MCP server's web_search tool could theoretically shadow Hermes's built-in web_search if prefixing failed.	2026-03-25 16:52:04 -07:00
Teknium	ab548a9b5e	fix(security): add SSRF protection to browser_navigate (#3058 ) * fix(security): add SSRF protection to browser_navigate browser_navigate() only checked the website blocklist policy but did not call is_safe_url() to block private/internal addresses. This allowed the agent to navigate to localhost, cloud metadata endpoints (169.254.169.254), and private network IPs via the browser. web_tools and vision_tools already had this check. Added the same is_safe_url() pre-flight validation before the blocklist check in browser_navigate(). * fix: move SSRF import to module level, fix policy test mock Move is_safe_url import to module level so it can be monkeypatched in tests. Update test_browser_navigate_returns_policy_block to mock _is_safe_url so the SSRF check passes and the policy check is reached. * fix(security): harden browser SSRF protection Follow-up to cherry-picked PR #3041: 1. Fail-closed fallback: if url_safety module can't import, block all URLs instead of allowing all. Security guards should never fail-open. 2. Post-redirect SSRF check: after navigation, verify the final URL isn't a private/internal address. If a public URL redirected to 169.254.169.254 or localhost, navigate to about:blank and return an error — prevents the model from reading internal content via subsequent browser_snapshot calls. --------- Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-25 15:16:57 -07:00
Siddharth Balyan	b6461903ff	feat: nix flake — uv2nix build, NixOS module, persistent container mode (#20 ) * feat: nix flake, uv2nix build, dev shell and home manager * fixed nix run, updated docs for setup * feat(nix): NixOS module with persistent container mode, managed guards, checks - Replace homeModules.nix with nixosModules.nix (two deployment modes) - Mode A (native): hardened systemd service with ProtectSystem=strict - Mode B (container): persistent Ubuntu container with /nix/store bind-mount, identity-hash-based recreation, GC root protection, symlink-based updates - Add HERMES_MANAGED guards blocking CLI config mutation (config set, setup, gateway install/uninstall) when running under NixOS module - Add nix/checks.nix with build-time verification (binary, CLI, managed guard) - Remove container.nix (no Nix-built OCI image; pulls ubuntu:24.04 at runtime) - Simplify packages.nix (drop fetchFromGitHub submodules, PYTHONPATH wrappers) - Rewrite docs/nixos-setup.md with full options reference, container architecture, secrets management, and troubleshooting guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update config.py * feat(nix): add CI workflow and enhanced build checks - GitHub Actions workflow for nix flake check + build on linux/macOS - Entry point sync check to catch pyproject.toml drift - Expanded managed-guard check to cover config edit - Wrap hermes-acp binary in Nix package - Fix Path type mismatch in is_managed() * Update MCP server package name; bundled skills support * fix reading .env. instead have container user a common mounted .env file * feat(nix): container entrypoint with privilege drop and sudo provisioning Container was running as non-root via --user, which broke apt/pip installs and caused crashes when $HOME didn't exist. Replace --user with a Nix-built entrypoint script that provisions the hermes user, sudo (NOPASSWD), and /home/hermes inside the container on first boot, then drops privileges via setpriv. Writable layer persists so setup only runs once. Also expands MCP server options to support HTTP transport and sampling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix group and user creation in container mode * feat(nix): persistent /home/hermes and MESSAGING_CWD in container mode Container mode now bind-mounts ${stateDir}/home to /home/hermes so the agent's home directory survives container recreation. Previously it lived in the writable layer and was lost on image/volume/options changes. Also passes MESSAGING_CWD to the container so the agent finds its workspace and documents, matching native mode behavior. Other changes: - Extract containerDataDir/containerHomeDir bindings (no more magic strings) - Fix entrypoint chown to run unconditionally (volume mounts always exist) - Add schema field to container identity hash for auto-recreation - Add idempotency test (Scenario G) to config-roundtrip check * docs: add Nix & NixOS setup guide to docs site Add comprehensive Nix documentation to the Docusaurus site at website/docs/getting-started/nix-setup.md, covering nix run/profile install, NixOS module (native + container modes), declarative settings, secrets management, MCP servers, managed mode, container architecture, dev shell, flake checks, and full options reference. - Register nix-setup in sidebar after installation page - Add Nix callout tip to installation.md linking to new guide - Add canonical version pointer in docs/nixos-setup.md * docs: remove docs/nixos-setup.md, consolidate into website docs Backfill missing details (restart/restartSec in full example, gateway.pid, 0750 permissions, docker inspect commands) into the canonical website/docs/getting-started/nix-setup.md and delete the old standalone file. * fix(nix): add compression.protect_last_n and target_ratio to config-keys.json New keys were added to DEFAULT_CONFIG on main, causing the config-drift check to fail in CI. * fix(nix): skip checks on aarch64-darwin (onnxruntime wheel missing) The full Python venv includes onnxruntime (via faster-whisper/STT) which lacks a compatible uv2nix wheel on aarch64-darwin. Gate all checks behind stdenv.hostPlatform.isLinux. The package and devShell still evaluate on macOS. * fix(nix): skip flake check and build on macOS CI onnxruntime (transitive dep via faster-whisper) lacks a compatible uv2nix wheel on aarch64-darwin. Run full checks and build on Linux only; macOS CI verifies the flake evaluates without building. * fix(nix): preserve container writable layer across nixos-rebuild The container identity hash included the entrypoint's Nix store path, which changes on every nixpkgs update (due to runtimeShell/stdenv input-addressing). This caused false-positive identity mismatches, triggering container recreation and losing the persistent writable layer. - Use stable symlink (current-entrypoint) like current-package already does - Remove entrypoint from identity hash (only image/volumes/options matter) - Add GC root for entrypoint so nix-collect-garbage doesn't break it - Remove global HERMES_HOME env var from addToSystemPackages (conflicted with interactive CLI use, service already sets its own) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 01:08:02 +05:30
Teknium	fba73a60e3	fix(skills): use Git Trees API to prevent silent subdirectory loss during install (#2995 ) * fix(skills): use Git Trees API to prevent silent subdirectory loss during install Refactors _download_directory() to use the Git Trees API (single call for the entire repo tree) as the primary path, falling back to the recursive Contents API when the tree endpoint is unavailable or truncated. Prevents silent subdirectory loss caused by per-directory rate limiting or transient failures. Cherry-picked from PR #2981 by tugrulguner. Fixes #2940. * fix: simplify tree API — use branch name directly as tree-ish Eliminates an extra git/ref/heads API call by passing the branch name directly to git/trees/{branch}?recursive=1, matching the pattern already used by _find_skill_in_repo_tree. --------- Co-authored-by: tugrulguner <tugrulguner@users.noreply.github.com>	2026-03-25 10:48:18 -07:00
Teknium	5dbe2d9d73	fix: skills-sh install fails for deeply nested repo structures (#2980 ) * fix(run_agent): ensure _fire_first_delta() is called for tool generation events Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage. * fix(run_agent): improve timeout handling for chat completions Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations. * fix(run_agent): reduce default stream read timeout for chat completions Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations. * fix(run_agent): enhance streaming error handling and retry logic Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions. * fix: skills-sh install fails for deeply nested repo structures Skills in repos with deep directory nesting (e.g. cli-tool/components/skills/development/senior-backend/) could not be installed because the candidate path generation and shallow root-dir scan never reached them. Added GitHubSource._find_skill_in_repo_tree() which uses the GitHub Trees API to recursively search the entire repo tree in a single API call. This is used as a final fallback in SkillsShSource._discover_identifier() when the standard candidate paths and shallow scan both fail. Fixes installation of skills from repos like davila7/claude-code-templates where skills are nested 4+ levels deep. Reported by user Samuraixheart.	2026-03-25 09:31:05 -07:00
Teknium	745859babb	feat: env var passthrough for skills and user config (#2807 ) * feat: env var passthrough for skills and user config Skills that declare required_environment_variables now have those vars passed through to sandboxed execution environments (execute_code and terminal). Previously, execute_code stripped all vars containing KEY, TOKEN, SECRET, etc. and the terminal blocklist removed Hermes infrastructure vars — both blocked skill-declared env vars. Two passthrough sources: 1. Skill-scoped (automatic): when a skill is loaded via skill_view and declares required_environment_variables, vars that are present in the environment are registered in a session-scoped passthrough set. 2. Config-based (manual): terminal.env_passthrough in config.yaml lets users explicitly allowlist vars for non-skill use cases. Changes: - New module: tools/env_passthrough.py — shared passthrough registry - hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG - tools/skills_tool.py: register available skill env vars on load - tools/code_execution_tool.py: check passthrough before filtering - tools/environments/local.py: check passthrough in _sanitize_subprocess_env and _make_run_env - 19 new tests covering all layers * docs: add environment variable passthrough documentation Document the env var passthrough feature across four docs pages: - security.md: new 'Environment Variable Passthrough' section with full explanation, comparison table, and security considerations - code-execution.md: update security section, add passthrough subsection, fix comparison table - creating-skills.md: add tip about automatic sandbox passthrough - skills.md: add note about passthrough after secure setup docs Live-tested: launched interactive CLI, loaded a skill with required_environment_variables, verified TEST_SKILL_SECRET_KEY was accessible inside execute_code sandbox (value: passthrough-test-value-42).	2026-03-24 08:19:34 -07:00
Teknium	02b38b93cb	refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804 ) Drop the mini-swe-agent git submodule. All terminal backends now use hermes-agent's own environment implementations directly. Docker backend: - Inline the `docker run -d` container startup (was 15 lines in minisweagent's DockerEnvironment). Our wrapper already handled execute(), cleanup(), security hardening, volumes, and resource limits. Modal backend: - Import swe-rex's ModalDeployment directly instead of going through minisweagent's 90-line passthrough wrapper. - Bake the _AsyncWorker pattern (from environments/patches.py) directly into ModalEnvironment for Atropos compatibility without monkey-patching. Cleanup: - Remove minisweagent_path.py (submodule path resolution helper) - Remove submodule init/install from install.sh and setup-hermes.sh - Remove mini-swe-agent from .gitmodules - environments/patches.py is now a no-op (kept for backward compat) - terminal_tool.py no longer does sys.path hacking for minisweagent - mini_swe_runner.py guards imports (optional, for RL training only) - Update all affected tests to mock the new direct subprocess calls - Update README.md, CONTRIBUTING.md No functionality change — all Docker, Modal, local, SSH, Singularity, and Daytona backends behave identically. 6093 tests pass.	2026-03-24 07:30:25 -07:00
Teknium	1345e93393	fix: add macOS Homebrew paths to browser and terminal PATH resolution On macOS with Homebrew (Apple Silicon), Node.js and agent-browser binaries live under /opt/homebrew/bin/ which is not included in the _SANE_PATH fallback used by browser_tool.py and environments/local.py. When Hermes runs with a filtered PATH (e.g. as a systemd service), these binaries are invisible, causing 'env: node: No such file or directory' errors when using browser tools. Changes: - Add /opt/homebrew/bin and /opt/homebrew/sbin to _SANE_PATH in both browser_tool.py and environments/local.py - Add _discover_homebrew_node_dirs() to find versioned Node installs (e.g. brew install node@24) that aren't linked into /opt/homebrew/bin - Extend _find_agent_browser() to search Homebrew and Hermes-managed dirs when agent-browser isn't on the current PATH - Include discovered Homebrew node dirs in subprocess PATH when launching agent-browser - Add 11 new tests covering all Homebrew path discovery logic	2026-03-23 22:45:55 -07:00
Teknium	0791efe2c3	fix(security): add SSRF protection to vision_tools and web_tools (hardened) * fix(security): add SSRF protection to vision_tools and web_tools Both vision_analyze and web_extract/web_crawl accept arbitrary URLs without checking if they target private/internal network addresses. A prompt-injected or malicious skill could use this to access cloud metadata endpoints (169.254.169.254), localhost services, or private network hosts. Adds a shared url_safety.is_safe_url() that resolves hostnames and blocks private, loopback, link-local, and reserved IP ranges. Also blocks known internal hostnames (metadata.google.internal). Integrated at the URL validation layer in vision_tools and before each website_policy check in web_tools (extract, crawl). * test(vision): update localhost test to reflect SSRF protection The existing test_valid_url_with_port asserted localhost URLs pass validation. With SSRF protection, localhost is now correctly blocked. Update the test to verify the block, and add a separate test for valid URLs with ports using a public hostname. * fix(security): harden SSRF protection — fail-closed, CGNAT, multicast, redirect guard Follow-up hardening on top of dieutx's SSRF protection (PR #2630): - Change fail-open to fail-closed: DNS errors and unexpected exceptions now block the request instead of allowing it (OWASP best practice) - Block CGNAT range (100.64.0.0/10): Python's ipaddress.is_private does NOT cover this range (returns False for both is_private and is_global). Used by Tailscale/WireGuard and carrier infrastructure. - Add is_multicast and is_unspecified checks: multicast (224.0.0.0/4) and unspecified (0.0.0.0) addresses were not caught by the original four-check chain - Add redirect guard for vision_tools: httpx event hook re-validates each redirect target against SSRF checks, preventing the classic redirect-based SSRF bypass (302 to internal IP) - Move SSRF filtering before backend dispatch in web_extract: now covers Parallel and Tavily backends, not just Firecrawl - Extract _is_blocked_ip() helper for cleaner IP range checking - Add 24 new tests (CGNAT, multicast, IPv4-mapped IPv6, fail-closed behavior, parametrized blocked/allowed IP lists) - Fix existing tests to mock DNS resolution for test hostnames --------- Co-authored-by: dieutx <dangtc94@gmail.com>	2026-03-23 15:40:42 -07:00
Teknium	934fbe3c06	fix: strip ANSI at the source — clean terminal output before it reaches the model Root cause: terminal_tool, execute_code, and process_registry returned raw subprocess output with ANSI escape sequences intact. The model saw these in tool results and copied them into file writes. Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py, but this was a band-aid — regex on file content risks corrupting legitimate content, and doesn't prevent ANSI from wasting tokens in the model context. Source-level fix: - New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI (incl. private-mode, colon-separated, intermediate bytes), OSC (both terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1 - terminal_tool.py: strip output before returning to model - code_execution_tool.py: strip stdout/stderr before returning - process_registry.py: strip output in poll/read_log/wait - file_tools.py: remove _strip_ansi band-aid (no longer needed) Verified: `ls --color=always` output returned as clean text to model, file written from that output contains zero ESC bytes.	2026-03-23 07:43:12 -07:00
Teknium	7da0822456	fix(approval): honor bare YAML approvals.mode: off (#2620 ) Cherry-picked from PR #2563 by tumf. YAML 1.1 parses unquoted 'off' as boolean False. Added _normalize_approval_mode() to map False -> 'off', True -> 'manual', and normalize string values. Includes regression tests.	2026-03-23 06:56:09 -07:00
Teknium	93dc5dee6f	fix: prevent agents from starting gateway outside systemd management (#2617 ) An agent session killed the systemd-managed gateway (PID 1605) and restarted it with '&disown', taking it outside systemd's Restart= management. When the orphaned process later received SIGTERM, nothing restarted it. Add dangerous command patterns to detect: - 'gateway run' with & (background), disown, nohup, or setsid - These should use 'systemctl --user restart hermes-gateway' instead Also applied directly to main repo and fixed the systemd service: - Changed Restart=on-failure to Restart=always (clean SIGTERM = exit 0 = not a 'failure', so on-failure never triggered) - RestartSec=10 for reasonable restart delay	2026-03-23 06:45:17 -07:00
Teknium	b072737193	fix: expand tilde (~) in vision_analyze local file paths (#2585 ) Path('~/.hermes/image.png').is_file() returns False because Path doesn't expand tilde. This caused the tool to fall through to URL validation, which also failed, producing a confusing error: 'Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path.' Fix: use os.path.expanduser() before constructing the Path object. Added two tests for tilde expansion (success and nonexistent file).	2026-03-22 23:48:32 -07:00
Teknium	ed805f57ff	fix(mcp-oauth): port mismatch, path traversal, and shared handler state (salvage #2521 ) (#2552 ) * fix(mcp-oauth): port mismatch, path traversal, and shared state in OAuth flow Three bugs in the new MCP OAuth 2.1 PKCE implementation: 1. CRITICAL: OAuth redirect port mismatch — build_oauth_auth() calls _find_free_port() to register the redirect_uri, but _wait_for_callback() calls _find_free_port() again getting a DIFFERENT port. Browser redirects to port A, server listens on port B — callback never arrives, 120s timeout. Fix: share the port via module-level _oauth_port variable. 2. MEDIUM: Path traversal via unsanitized server_name — HermesTokenStorage uses server_name directly in filenames. A name like "../../.ssh/config" writes token files outside ~/.hermes/mcp-tokens/. Fix: sanitize server_name with the same regex pattern used elsewhere. 3. MEDIUM: Class-level auth_code/state on _CallbackHandler causes data races if concurrent OAuth flows run. Second callback overwrites first. Fix: factory function _make_callback_handler() returns a handler class with a closure-scoped result dict, isolating each flow. * test: add tests for MCP OAuth path traversal, handler isolation, and port sharing 7 new tests covering: - Path traversal blocked (../../.ssh/config stays in mcp-tokens/) - Dots/slashes sanitized and resolved within base dir - Normal server names preserved - Special characters sanitized (@, :, /) - Concurrent handler result dicts are independent - Handler writes to its own result dict, not class-level - build_oauth_auth stores port in module-level _oauth_port --------- Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-22 15:02:26 -07:00
Teknium	c275aa4732	Merge pull request #2465 from NousResearch/hermes/hermes-31d7db3b feat(cli): MCP server management CLI + OAuth 2.1 PKCE auth	2026-03-22 04:56:48 -07:00
Teknium	b7091f93b1	feat(cli): MCP server management CLI + OAuth 2.1 PKCE auth Add hermes mcp add/remove/list/test/configure CLI for managing MCP server connections interactively. Discovery-first 'add' flow connects, discovers tools, and lets users select which to enable via curses checklist. Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636). Supports browser-based and manual (headless) authorization, token caching with 0600 permissions, automatic refresh. Zero external deps. Add ${ENV_VAR} interpolation in MCP server config values, resolved from os.environ + ~/.hermes/.env at load time. Core OAuth module from PR #2021 by @imnotdev25. CLI and mcp_tool wiring rewritten against current main. Closes #497, #690.	2026-03-22 04:52:52 -07:00
Teknium	0b370f2dd9	fix(skills_guard): agent-created dangerous skills ask instead of block Changes the policy for agent-created skills with critical security findings from 'block' (silently rejected) to 'ask' (allowed with warning logged). The agent created the skill, so blocking it entirely is too aggressive — let it through but log the findings. - Policy: agent-created dangerous changed from block to ask - should_allow_install returns None for 'ask' (vs True/False) - format_scan_report shows 'NEEDS CONFIRMATION' for ask - skill_manager_tool.py caller handles None (allows with warning) - force=True still overrides as before Based on PR #2271 by redhelix (closed — 3200 lines of unrelated Mission Control code excluded).	2026-03-22 03:56:02 -07:00
Teknium	0ea7d0ec80	fix(terminal): log disk warning check failures at debug level (salvage #2372 ) (#2394 ) * fix(terminal): log disk warning check failures at debug level * fix(terminal): guard _check_disk_usage_warning by moving scratch_dir into try --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-21 17:10:17 -07:00
Test	1870069f80	fix(session_search): exclude current session lineage Cherry-picked from PR #2201 by @Gutslabs. session_search resolved hits to parent/root sessions but only excluded the exact current_session_id. If the active session was a child continuation (compression/delegation), its parent could still appear as a 'past' conversation result. Fix: resolve current_session_id to its lineage root before filtering, so the entire active lineage (parent and children) is excluded.	2026-03-20 21:07:48 -07:00
Teknium	4263350c5b	fix: remove post-compression file-read history injection (#2226 ) Remove the [Files already read — do NOT re-read these] user message that was injected into the conversation after context compression. This message used role='user' for system-generated content, creating a fake user turn that confused models about conversation state and could contribute to task-redo behavior. The file_tools.py read tracker (warn on 3rd consecutive read, block on 4th+) already handles re-read prevention inline without injecting synthetic messages. Closes #2224. Co-authored-by: Test <test@test.com>	2026-03-20 14:54:25 -07:00
hermes	4d2c93a04f	fix: normalize MCP object schemas without properties	2026-03-19 16:23:45 -07:00
cmcleay	bb59057d5d	fix: normalize live Chrome CDP endpoints for browser tools	2026-03-19 10:17:03 -07:00
rovle	18862145e4	fix(daytona): migrate sandbox lookup from find_one to get/list find_one is being deprecated. Primary lookup now uses get() with a deterministic sandbox name (hermes-{task_id}). A legacy fallback via list(labels=...) ensures sandboxes created before this migration are still resumable.	2026-03-19 17:54:46 +01:00
Test	ae8059ca24	fix(delegate): move _saved_tool_names assignment to correct scope The merge at `e7844e9c` re-introduced a line in _build_child_agent() that references _saved_tool_names — a variable only defined in _run_single_child(). This caused NameError on every delegate_task call, completely breaking subagent delegation. Moves the child._delegate_saved_tool_names assignment to _run_single_child() where _saved_tool_names is actually defined, keeping the save/restore in the same scope as the try/finally block. Adds two regression tests from PR #2038 (YanSte). Also fixes the same issue reported in PR #2048 (Gutslabs). Co-authored-by: Yannick Stephan <yannick.stephan@gmail.com> Co-authored-by: Guts <gutslabs@users.noreply.github.com>	2026-03-19 09:26:05 -07:00
Teknium	b70dd51cfa	fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897 ) * fix: banner skill count now respects disabled skills and platform filtering The banner's get_available_skills() was doing a raw rglob scan of ~/.hermes/skills/ without checking: - Whether skills are disabled (skills.disabled config) - Whether skills match the current platform (platforms: frontmatter) This caused the banner to show inflated skill counts (e.g. '100 skills' when many are disabled) and list macOS-only skills on Linux. Fix: delegate to _find_all_skills() from tools/skills_tool which already handles both platform gating and disabled-skill filtering. * fix: system prompt and slash commands now respect disabled skills Two more places where disabled skills were still surfaced: 1. build_skills_system_prompt() in prompt_builder.py — disabled skills appeared in the <available_skills> system prompt section, causing the agent to suggest/load them despite being disabled. 2. scan_skill_commands() in skill_commands.py — disabled skills still registered as /skill-name slash commands in CLI help and could be invoked. Both now load _get_disabled_skill_names() and filter accordingly. * fix: skill_view blocks disabled skills skill_view() checked platform compatibility but not disabled state, so the agent could still load and read disabled skills directly. Now returns a clear error when a disabled skill is requested, telling the user to enable it via hermes skills or inspect the files manually. --------- Co-authored-by: Test <test@test.com>	2026-03-18 03:17:37 -07:00
Test	4b53b89f09	feat(mcp): expose MCP servers as standalone toolsets Each configured MCP server now registers as its own toolset in TOOLSETS (e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}), making raw server names resolvable in platform_toolsets overrides. Previously MCP tools were only injected into hermes-* umbrella toolsets, so gateway sessions using raw toolset names like ['terminal', 'github'] in platform_toolsets couldn't resolve MCP tools. Skips server names that collide with built-in toolsets. Also handles idempotent reloads (syncs toolsets even when no new servers connect). Inspired by PR #1876 by @kshitijk4poor. Adds 2 tests (standalone toolset creation + built-in collision guard).	2026-03-18 03:04:17 -07:00
Test	0fab46f65c	fix: allow agent-created skills with caution-level findings Agent-created skills were using the same policy as community hub installs, blocking any skill with medium/high severity findings (e.g. docker pull, pip install, git clone). This meant the agent couldn't create skills that reference Docker or other common tools. Changed agent-created policy from (allow, block, block) to (allow, allow, block) — matching the trusted policy. Caution-level findings (medium/high severity) are now allowed through, while dangerous findings (critical severity like exfiltration, prompt injection, reverse shells) remain blocked. Added 4 tests covering the agent-created policy: safe allowed, caution allowed, dangerous blocked, force override.	2026-03-17 16:32:25 -07:00
darya	a654bc04f7	fix(file_tools): include pagination args in repeated search key	2026-03-18 01:19:05 +03:00
Teknium	b5cf0f0aef	fix: preserve parent agent's tool list after subagent delegation (#1778 ) Save and restore the process-global _last_resolved_tool_names in _run_single_child() so the parent's execute_code sandbox generates correct tool imports after delegation completes. The global was already mostly mitigated (run_agent.py passes enabled_tools via self.valid_tool_names), but the global itself remained corrupted — a footgun for any code that reads it directly. Co-authored-by: shane9coy <shane9coy@users.noreply.github.com>	2026-03-17 10:31:38 -07:00
Teknium	9a1e971126	fix(stt): respect explicit provider config instead of env-var fallback (#1775 ) * fix(session): skip corrupt lines in load_transcript instead of crashing Wrap json.loads() in load_transcript() with try/except JSONDecodeError so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL) are skipped with a warning instead of crashing the entire transcript load. The rest of the history loads fine. Adds a logger.warning with the session ID and truncated corrupt line content for debugging visibility. Salvaged from PR #1193 by alireza78a. Closes #1193 * fix(stt): respect explicit provider config instead of env-var fallback Rework _get_provider() to separate explicit config from auto-detect. When stt.provider is explicitly set in config.yaml, that choice is authoritative — no silent cross-provider fallback based on which env vars happen to be set. When no provider is configured, auto-detect still tries: local > groq > openai. This fixes the reported scenario where provider: local + a placeholder OPENAI_API_KEY caused the system to silently select OpenAI and fail with a 401. Closes #1774	2026-03-17 10:30:58 -07:00
sai-samarth	a3de843fdb	test: replace real-looking WhatsApp jid in regression test	2026-03-17 15:38:37 +00:00
sai-samarth	dc15bc508f	fix(tools): add outbound WhatsApp send_message routing	2026-03-17 15:31:13 +00:00
Teknium	d1d17f4f0a	feat(compression): add summary_base_url + move compression config to YAML-only - Add summary_base_url config option to compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama) - Remove compression env var bridges from cli.py and gateway/run.py (CONTEXT_COMPRESSION_* env vars no longer set from config) - Switch run_agent.py to read compression config directly from config.yaml instead of env vars - Fix backwards-compat block in _resolve_task_provider_model to also fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG sets this, which was silently preventing the compression section's summary_* keys from being read) - Add test for summary_base_url config-to-client flow - Update docs to show compression as config.yaml-only Closes #1591 Based on PR #1702 by @uzaylisak	2026-03-17 04:46:15 -07:00
teknium1	0897e4350e	merge: resolve conflicts with origin/main	2026-03-17 04:30:37 -07:00
Teknium	d2b10545db	feat(web): add Tavily as web search/extract/crawl backend (#1731 ) Salvage of PR #1707 by @kshitijk4poor (cherry-picked with authorship preserved). Adds Tavily as a third web backend alongside Firecrawl and Parallel, using the Tavily REST API via httpx. - Backend selection via hermes tools → saved as web.backend in config.yaml - All three tools supported: search, extract, crawl - TAVILY_API_KEY in config registry, doctor, status, setup wizard - 15 new Tavily tests + 9 backend selection tests + 5 config tests - Backward compatible Closes #1707	2026-03-17 04:28:03 -07:00
Teknium	4433b83378	feat(web): add Parallel as alternative web search/extract backend (#1696 ) * feat(web): add Parallel as alternative web search/extract backend Adds Parallel (parallel.ai) as a drop-in alternative to Firecrawl for web_search and web_extract tools using the official parallel-web SDK. - Backend selection via WEB_SEARCH_BACKEND env var (auto/parallel/firecrawl) - Auto mode prefers Firecrawl when both keys present; Parallel when sole backend - web_crawl remains Firecrawl-only with clear error when unavailable - Lazy SDK imports, interrupt support, singleton clients - 16 new unit tests for backend selection and client config Co-authored-by: s-jag <s-jag@users.noreply.github.com> * fix: add PARALLEL_API_KEY to config registry and fix web_crawl policy tests Follow-up for Parallel backend integration: - Add PARALLEL_API_KEY to OPTIONAL_ENV_VARS (hermes doctor, env blocklist) - Add to set_config_value api_keys list (hermes config set) - Add to doctor keys display - Fix 2 web_crawl policy tests that didn't set FIRECRAWL_API_KEY (needed now that web_crawl has a Firecrawl availability guard) * refactor: explicit backend selection via hermes tools, not auto-detect Replace the auto-detect backend selection with explicit user choice: - hermes tools saves WEB_SEARCH_BACKEND to .env when user picks a provider - _get_backend() reads the explicit choice first - Fallback only for manual/legacy config (uses whichever key is present) - _is_provider_active() shows [active] for the selected web backend - Updated tests, docs, and .env.example to remove 'auto' mode language * refactor: use config.yaml for web backend, not env var Match the TTS/browser pattern — web.backend is stored in config.yaml (set by hermes tools), not as a WEB_SEARCH_BACKEND env var. - _load_web_config() reads web: section from config.yaml - _get_backend() reads web.backend from config, falls back to key detection - _configure_provider() saves to config dict (saved to config.yaml) - _is_provider_active() reads from config dict - Removed WEB_SEARCH_BACKEND from .env.example, set_config_value, docs - Updated all tests to mock _load_web_config instead of env vars --------- Co-authored-by: s-jag <s-jag@users.noreply.github.com>	2026-03-17 04:02:02 -07:00
crazywriter1	7049dba778	fix(docker): remove container on cleanup when container_persistent=false When container_persistent=false, the inner mini-swe-agent cleanup only runs 'docker stop' in the background, leaving containers in Exited state. Now cleanup() also runs 'docker rm -f' to fully remove the container. Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context), setup tests (TTS provider step), and adds MockInnerDocker.cleanup(). Original fix by crazywriter1. Cherry-picked and adapted for current main. Fixes #1679	2026-03-17 04:02:01 -07:00
Teknium	6405d389aa	test: align Hermes setup and full-suite expectations (#1710 ) Salvaged from PR #1708 by @kartikkabadi. Cherry-picked with authorship preserved. Fixes pre-existing test failures from setup TTS prompt flow changes and environment-sensitive assumptions. Co-authored-by: Kartik <user2@RentKars-MacBook-Air.local>	2026-03-17 04:01:37 -07:00
Teknium	b16186a32a	feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message (#1709 ) * feat: interactive MCP tool configuration in hermes tools Add the ability to selectively enable/disable individual MCP server tools through the interactive 'hermes tools' TUI. Changes: - tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function that temporarily connects to configured MCP servers, discovers their tools (names + descriptions), and disconnects. No registry side effects. - hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the interactive menu. When selected: 1. Probes all enabled MCP servers for their available tools 2. Shows a per-server curses checklist with tool descriptions 3. Pre-selects tools based on existing include/exclude config 4. Writes changes back as tools.exclude entries in config.yaml 5. Reports which servers failed to connect The existing CLI commands (hermes tools enable/disable server:tool) continue to work unchanged. This adds the interactive TUI counterpart so users can browse and toggle MCP tools visually. Tests: 22 new tests covering probe function edge cases and interactive flow (pre-selection, exclude/include modes, description truncation, multi-server handling, error paths). * feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message When _send_telegram detects HTML tags in the message body, it now sends with parse_mode='HTML' instead of converting to MarkdownV2. This allows cron jobs and agents to send rich HTML-formatted Telegram messages with bold, italic, code blocks, etc. that render correctly. Detection uses the same regex from PR #1568 by @ashaney: re.search(r'<[a-zA-Z/][^>]*>', message) Plain-text and markdown messages continue through the existing MarkdownV2 pipeline. The HTML fallback path also catches HTML parse errors and falls back to plain text, matching the existing MarkdownV2 error handling. Inspired by: github.com/ashaney — PR #1568	2026-03-17 03:56:06 -07:00
Teknium	d87655afff	fix(gateway): persist watcher metadata in checkpoint for crash recovery (#1706 ) Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved. Fixes #1143 — background process notifications resume after gateway restart. Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>	2026-03-17 03:52:15 -07:00
Teknium	ce7418e274	feat: interactive MCP tool configuration in hermes tools (#1694 ) Add the ability to selectively enable/disable individual MCP server tools through the interactive 'hermes tools' TUI. Changes: - tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function that temporarily connects to configured MCP servers, discovers their tools (names + descriptions), and disconnects. No registry side effects. - hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the interactive menu. When selected: 1. Probes all enabled MCP servers for their available tools 2. Shows a per-server curses checklist with tool descriptions 3. Pre-selects tools based on existing include/exclude config 4. Writes changes back as tools.exclude entries in config.yaml 5. Reports which servers failed to connect The existing CLI commands (hermes tools enable/disable server:tool) continue to work unchanged. This adds the interactive TUI counterpart so users can browse and toggle MCP tools visually. Tests: 22 new tests covering probe function edge cases and interactive flow (pre-selection, exclude/include modes, description truncation, multi-server handling, error paths).	2026-03-17 03:48:44 -07:00
teknium1	6fc76ef954	fix: harden website blocklist — default off, TTL cache, fail-open, guarded imports - Default enabled: false (zero overhead when not configured) - Fast path: cached disabled state skips all work immediately - TTL cache (30s) for parsed policy — avoids re-reading config.yaml on every URL check - Missing shared files warn + skip instead of crashing all web tools - Lazy yaml import — missing PyYAML doesn't break browser toolset - Guarded browser_tool import — fail-open lambda fallback - check_website_access never raises for default path (fail-open with warning log); only raises with explicit config_path (test mode) - Simplified enforcement code in web_tools/browser_tool — no more try/except wrappers since errors are handled internally	2026-03-17 03:11:26 -07:00
Teknium	c3d626eb07	Revert "feat: add inference.sh integration (infsh tool + skill) (#1682 )" (#1684 ) This reverts commit `6020db0243`.	2026-03-17 03:01:30 -07:00
teknium1	30c417fe70	feat: add website blocklist enforcement for web/browser tools (#1064 ) Adds security.website_blocklist config for user-managed domain blocking across URL-capable tools. Enforced at the tool level (not monkey-patching) so it's safe and predictable. - tools/website_policy.py: shared policy loader with domain normalization, wildcard support (.tracking.example), shared file imports, and structured block metadata - web_extract: pre-fetch URL check + post-redirect recheck - web_crawl: pre-crawl URL check + per-page URL recheck - browser_navigate: pre-navigation URL check - Blocked responses include blocked_by_policy metadata so the agent can explain exactly what was denied Config: security: website_blocklist: enabled: true domains: ["evil.com", ".tracking.example"] shared_files: ["team-blocklist.txt"] Salvaged from PR #1086 by @kshitijk4poor. Browser post-redirect checks deferred (browser_tool was fully rewritten since the PR branched). Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-17 02:59:39 -07:00
Teknium	6020db0243	feat: add inference.sh integration (infsh tool + skill) (#1682 ) Add inference.sh CLI (infsh) as a tool integration, giving agents access to 150+ AI apps through a single CLI — image gen (FLUX, Reve, Seedream), video (Veo, Wan, Seedance), LLMs, search (Tavily, Exa), 3D, avatar/lipsync, and more. One API key manages all services. Tools: - infsh: run any infsh CLI command (app list, app run, etc.) - infsh_install: install the CLI if not present Registered as an 'inference' toolset (opt-in, not in core tools). Includes comprehensive skill docs with examples for all app categories. Changes from original PR: - NOT added to _HERMES_CORE_TOOLS (available via --toolsets inference) - Added 12 tests covering tool registration, command execution, error handling, timeout, JSON parsing, and install flow Inspired by PR #1021 by @okaris. Co-authored-by: okaris <okaris@users.noreply.github.com>	2026-03-17 02:59:21 -07:00
Teknium	1d5a39e002	fix: thread safety for concurrent subagent delegation (#1672 ) * fix: thread safety for concurrent subagent delegation Four thread-safety fixes that prevent crashes and data races when running multiple subagents concurrently via delegate_task: 1. Remove redirect_stdout/stderr from delegate_tool — mutating global sys.stdout races with the spinner thread when multiple children start concurrently, causing segfaults. Children already run with quiet_mode=True so the redirect was redundant. 2. Split _run_single_child into _build_child_agent (main thread) + _run_single_child (worker thread). AIAgent construction creates httpx/SSL clients which are not thread-safe to initialize concurrently. 3. Add threading.Lock to SessionDB — subagents share the parent's SessionDB and call create_session/append_message from worker threads with no synchronization. 4. Add _active_children_lock to AIAgent — interrupt() iterates _active_children while worker threads append/remove children. 5. Add _client_cache_lock to auxiliary_client — multiple subagent threads may resolve clients concurrently via call_llm(). Based on PR #1471 by peteromallet. * feat: Honcho base_url override via config.yaml + quick command alias type Two features salvaged from PR #1576: 1. Honcho base_url override: allows pointing Hermes at a remote self-hosted Honcho deployment via config.yaml: honcho: base_url: "http://192.168.x.x:8000" When set, this overrides the Honcho SDK's environment mapping (production/local), enabling LAN/VPN Honcho deployments without requiring the server to live on localhost. Uses config.yaml instead of env var (HONCHO_URL) per project convention. 2. Quick command alias type: adds a new 'alias' quick command type that rewrites to another slash command before normal dispatch: quick_commands: sc: type: alias target: /context Supports both CLI and gateway. Arguments are forwarded to the target command. Based on PR #1576 by redhelix. --------- Co-authored-by: peteromallet <peteromallet@users.noreply.github.com> Co-authored-by: redhelix <redhelix@users.noreply.github.com>	2026-03-17 02:53:33 -07:00
Teknium	556e0f4b43	fix(docker): add explicit env allowlist for container credentials (#1436 ) Docker terminal sessions are secret-dark by default. This adds terminal.docker_forward_env as an explicit allowlist for env vars that may be forwarded into Docker containers. Values resolve from the current shell first, then fall back to ~/.hermes/.env. Only variables the user explicitly lists are forwarded — nothing is auto-exposed. Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto current main. Fixes #1436 Supersedes #1439	2026-03-17 02:34:35 -07:00
Teknium	2c7c30be69	fix(security): harden terminal safety and sandbox file writes (#1653 ) * fix(security): harden terminal safety and sandbox file writes Two security improvements: 1. Dangerous command detection: expand shell -c pattern to catch combined flags (bash -lc, bash -ic, ksh -c) that were previously undetected. Pattern changed from matching only 'bash -c' to matching any shell invocation with -c anywhere in the flags. 2. File write sandboxing: add HERMES_WRITE_SAFE_ROOT env var that constrains all write_file/patch operations to a configured directory tree. Opt-in — when unset, behavior is unchanged. Useful for gateway/messaging deployments that should only touch a workspace. Based on PR #1085 by ismoilh. * fix: correct "POSIDEON" typo to "POSEIDON" in banner ASCII art The poseidon skin's banner_logo had the E and I letters swapped, spelling "POSIDEON-AGENT" instead of "POSEIDON-AGENT". --------- Co-authored-by: ismoilh <ismoilh@users.noreply.github.com> Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>	2026-03-17 02:22:12 -07:00
Teknium	6a320e8bfe	fix(security): block sandbox backend creds from subprocess env (#1264 ) * fix: prevent infinite 400 failure loop on context overflow (#1630) When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt — creating an infinite loop. Three-layer fix: 1. run_agent.py — Fallback heuristic: when a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting. 2. run_agent.py + gateway/run.py — Don't persist failed messages: when the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure. 3. gateway/run.py — Smarter error messages: detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically. * fix(skills): detect prompt injection patterns and block cache file reads Adds two security layers to prevent prompt injection via skills hub cache files (#1558): 1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector — untrusted skill descriptions in the catalog contained adversarial text that the model executed. 2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.). Cherry-picked from PR #1562 by ygd58. * fix(tools): chunk long messages in send_message_tool before dispatch (#1552) Long messages sent via send_message tool or cron delivery silently failed when exceeding platform limits. Gateway adapters handle this via truncate_message(), but the standalone senders in send_message_tool bypassed that entirely. - Apply truncate_message() chunking in _send_to_platform() before dispatching to individual platform senders - Remove naive message[i:i+2000] character split in _send_discord() in favor of centralized smart splitting - Attach media files to last chunk only for Telegram - Add regression tests for chunking and media placement Cherry-picked from PR #1557 by llbn. * fix(approval): show full command in dangerous command approval (#1553) Previously the command was truncated to 80 chars in CLI (with a [v]iew full option), 500 chars in Discord embeds, and missing entirely in Telegram/Slack approval messages. Now the full command is always displayed everywhere: - CLI: removed 80-char truncation and [v]iew full menu option - Gateway (TG/Slack): approval_required message includes full command in a code block - Discord: embed shows full command up to 4096-char limit - Windows: skip SIGALRM-based test timeout (Unix-only) - Updated tests: replaced view-flow tests with direct approval tests Cherry-picked from PR #1566 by crazywriter1. * fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624) The interrupt polling loop in chat() waited on the queue without invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy buffer only flushed on input events, causing the CLI to appear frozen during tool execution until the user typed a key. Fix: call _invalidate() on each queue timeout (every ~100ms, throttled to 150ms) to force the renderer to flush buffered agent output. * fix(claw): warn when API keys are skipped during OpenClaw migration (#1580) When --migrate-secrets is not passed (the default), API keys like OPENROUTER_API_KEY are silently skipped with no warning. Users don't realize their keys weren't migrated until the agent fails to connect. Add a post-migration warning with actionable instructions: either re-run with --migrate-secrets or add the key manually via hermes config set. Cherry-picked from PR #1593 by ygd58. * fix(security): block sandbox backend creds from subprocess env (#1264) Add Modal and Daytona sandbox credentials to the subprocess env blocklist so they're not leaked to agent terminal sessions via printenv/env. Cherry-picked from PR #1571 by ygd58. --------- Co-authored-by: buray <ygd58@users.noreply.github.com> Co-authored-by: lbn <llbn@users.noreply.github.com> Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>	2026-03-17 02:20:42 -07:00
Teknium	cb0deb5f9d	feat: add NeuTTS optional skill + local TTS provider backend * feat(skills): add bundled neutts optional skill Add NeuTTS optional skill with CLI scaffold, bootstrap helper, and sample voice profile. Also fixes skills_hub.py to handle binary assets (WAV files) during skill installation. Changes: - optional-skills/mlops/models/neutts/ — skill + CLI scaffold - tools/skills_hub.py — binary asset support (read_bytes, write_bytes) - tests/tools/test_skills_hub.py — regression tests for binary assets * feat(tts): add NeuTTS as local TTS provider backend Add NeuTTS as a fourth TTS provider option alongside Edge, ElevenLabs, and OpenAI. NeuTTS runs fully on-device via neutts_cli — no API key needed. Provider behavior: - Explicit: set tts.provider to 'neutts' in config.yaml - Fallback: when Edge TTS is unavailable and neutts_cli is installed, automatically falls back to NeuTTS instead of failing - check_tts_requirements() now includes NeuTTS in availability checks NeuTTS outputs WAV natively. For Telegram voice bubbles, ffmpeg converts to Opus (same pattern as Edge TTS). Changes: - tools/tts_tool.py — _generate_neutts(), _check_neutts_available(), provider dispatch, fallback logic, Opus conversion - hermes_cli/config.py — tts.neutts config defaults --------- Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>	2026-03-17 02:13:34 -07:00
0xbyt4	68fbcdaa06	fix: add browser_console to browser toolset and core tools list (#1084 ) browser_console was registered in the tool registry but missing from all toolset definitions (TOOLSETS, _HERMES_CORE_TOOLS, _LEGACY_TOOLSET_MAP), so the agent could never discover or use it. Added to all 4 locations + 4 wiring tests. Cherry-picked from PR #1084 by @0xbyt4 (authorship preserved in tests).	2026-03-17 02:02:57 -07:00
teknium1	7d91b436e4	fix: exclude hidden directories from find/grep search backends (#1558 ) The primary injection vector in #1558 was search_files discovering catalog cache files in .hub/index-cache/ via find or grep, which don't skip hidden directories like ripgrep does by default. Three-layer fix: 1. _search_files (find): add -not -path '/.' to exclude hidden directories, matching ripgrep's default behavior. 2. _search_with_grep: add --exclude-dir='.*' to skip hidden directories in the grep fallback path. 3. _write_index_cache: write a .ignore file to .hub/ so ripgrep also skips it even when invoked with --hidden (belt-and-suspenders). This makes all three search backends (rg, grep, find) consistently exclude hidden directories, preventing the agent from discovering and reading unvetted community content in hub cache files.	2026-03-17 02:02:57 -07:00
Teknium	4cb6735541	fix(approval): show full command in dangerous command approval (#1553 ) * fix: prevent infinite 400 failure loop on context overflow (#1630) When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt — creating an infinite loop. Three-layer fix: 1. run_agent.py — Fallback heuristic: when a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting. 2. run_agent.py + gateway/run.py — Don't persist failed messages: when the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure. 3. gateway/run.py — Smarter error messages: detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically. * fix(skills): detect prompt injection patterns and block cache file reads Adds two security layers to prevent prompt injection via skills hub cache files (#1558): 1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector — untrusted skill descriptions in the catalog contained adversarial text that the model executed. 2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.). Cherry-picked from PR #1562 by ygd58. * fix(tools): chunk long messages in send_message_tool before dispatch (#1552) Long messages sent via send_message tool or cron delivery silently failed when exceeding platform limits. Gateway adapters handle this via truncate_message(), but the standalone senders in send_message_tool bypassed that entirely. - Apply truncate_message() chunking in _send_to_platform() before dispatching to individual platform senders - Remove naive message[i:i+2000] character split in _send_discord() in favor of centralized smart splitting - Attach media files to last chunk only for Telegram - Add regression tests for chunking and media placement Cherry-picked from PR #1557 by llbn. * fix(approval): show full command in dangerous command approval (#1553) Previously the command was truncated to 80 chars in CLI (with a [v]iew full option), 500 chars in Discord embeds, and missing entirely in Telegram/Slack approval messages. Now the full command is always displayed everywhere: - CLI: removed 80-char truncation and [v]iew full menu option - Gateway (TG/Slack): approval_required message includes full command in a code block - Discord: embed shows full command up to 4096-char limit - Windows: skip SIGALRM-based test timeout (Unix-only) - Updated tests: replaced view-flow tests with direct approval tests Cherry-picked from PR #1566 by crazywriter1. --------- Co-authored-by: buray <ygd58@users.noreply.github.com> Co-authored-by: lbn <llbn@users.noreply.github.com> Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>	2026-03-17 02:02:33 -07:00
Teknium	12afccd9ca	fix(tools): chunk long messages in send_message_tool before dispatch (#1552 ) * fix: prevent infinite 400 failure loop on context overflow (#1630) When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt — creating an infinite loop. Three-layer fix: 1. run_agent.py — Fallback heuristic: when a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting. 2. run_agent.py + gateway/run.py — Don't persist failed messages: when the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure. 3. gateway/run.py — Smarter error messages: detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically. * fix(skills): detect prompt injection patterns and block cache file reads Adds two security layers to prevent prompt injection via skills hub cache files (#1558): 1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector — untrusted skill descriptions in the catalog contained adversarial text that the model executed. 2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.). Cherry-picked from PR #1562 by ygd58. * fix(tools): chunk long messages in send_message_tool before dispatch (#1552) Long messages sent via send_message tool or cron delivery silently failed when exceeding platform limits. Gateway adapters handle this via truncate_message(), but the standalone senders in send_message_tool bypassed that entirely. - Apply truncate_message() chunking in _send_to_platform() before dispatching to individual platform senders - Remove naive message[i:i+2000] character split in _send_discord() in favor of centralized smart splitting - Attach media files to last chunk only for Telegram - Add regression tests for chunking and media placement Cherry-picked from PR #1557 by llbn. --------- Co-authored-by: buray <ygd58@users.noreply.github.com> Co-authored-by: lbn <llbn@users.noreply.github.com>	2026-03-17 01:52:43 -07:00
Teknium	81f76111b0	Merge pull request #1560 from eren-karakus0/fix/singularity-preflight-check fix(terminal): add Singularity/Apptainer preflight availability check	2026-03-17 01:52:03 -07:00
teknium1	19eaf5d956	test: fix telegram mock to include ParseMode constant The MarkdownV2 formatting change imports telegram.constants.ParseMode, which the test mock didn't provide. Add ParseMode to the mock so existing tests continue working.	2026-03-17 01:44:11 -07:00
Teknium	949fac192f	fix(tools): remove unnecessary crontab requirement from cronjob tool (#1638 ) * fix(tools): remove unnecessary crontab requirement from cronjob tool The hermes cron system is internal — it uses a JSON-based scheduler ticked by the gateway (cron/scheduler.py), not system crontab. The check for shutil.which('crontab') was preventing the cronjob tool from being available in environments without crontab installed (e.g. minimal Ubuntu containers). Changes: - Remove shutil.which('crontab') check from check_cronjob_requirements() - Remove unused shutil import - Update docstring to clarify internal scheduler is used - Update tests to reflect new behavior and add coverage for all session modes (interactive, gateway, exec_ask) Fixes #1589 * test: add HERMES_EXEC_ASK coverage for cronjob requirements Adds missing test for the exec_ask session mode, complementing the cherry-picked fix from PR #1633. --------- Co-authored-by: Bartok9 <bartokmagic@proton.me>	2026-03-17 01:40:02 -07:00
Teknium	e3f9894caf	fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626 ) * fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting Anthropic routes OAuth/subscription requests based on Claude Code's identity markers. Without them, requests get intermittent 500 errors (~25% failure rate observed). This matches what pi-ai (clawdbot) and OpenCode both implement for OAuth compatibility. Changes (OAuth tokens only — API key users unaffected): 1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli' 2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI' 3. System prompt sanitization: replace Hermes/Nous references 4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools) 5. Tool name stripping: remove 'mcp_' prefix from response tool calls Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate) After: 16/16 OK, 0 failures, 0 retries (0% error rate) * fix: three gateway issues from user error logs 1. send_animation missing metadata kwarg (base.py) - Base class send_animation lacked the metadata parameter that the call site in base.py line 917 passes. Telegram's override accepted it, but any platform without an override (Discord, Slack, etc.) hit TypeError. Added metadata to base class signature. 2. MarkdownV2 split-inside-inline-code (base.py truncate_message) - truncate_message could split at a space inside an inline code span (e.g. `function(arg1, arg2)`), leaving an unpaired backtick and unescaped parentheses in the chunk. Telegram rejects with 'character ( is reserved'. Added inline code awareness to the split-point finder — detects odd backtick counts and moves the split before the code span. 3. tirith auto-install without cosign (tirith_security.py) - Previously required cosign on PATH for auto-install, blocking install entirely with a warning if missing. Now proceeds with SHA-256 checksum verification only when cosign is unavailable. Cosign is still used for full supply chain verification when present. If cosign IS present but verification explicitly fails, install is still aborted (tampered release).	2026-03-16 23:39:41 -07:00
Muhammet Eren Karakuş	43b8ecd172	fix(tests): use case-insensitive regex in singularity preflight tests pytest.raises(match=...) is case-sensitive by default. The error message starts with "Neither" (capital N) but the regex used lowercase "neither", causing CI failures on Linux.	2026-03-16 19:01:39 +03:00
Muhammet Eren Karakuş	606f57a3ab	fix(terminal): add Singularity/Apptainer preflight availability check When neither apptainer nor singularity is installed, the Singularity backend silently defaults to "singularity" and fails with a cryptic FileNotFoundError inside _start_instance(). Add a preflight check that resolves the executable and verifies it responds, raising a clear RuntimeError with install instructions on failure. Closes #1511	2026-03-16 18:25:20 +03:00
teknium1	b72f522e30	test: fake minisweagent for docker cwd mount regressions Make the new Docker cwd-mount tests pass in CI environments that do not have the minisweagent package installed by injecting a fake module instead of monkeypatching an import path that may not exist.	2026-03-16 05:40:05 -07:00
teknium1	780ddd102b	fix(docker): gate cwd workspace mount behind config Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.	2026-03-16 05:20:56 -07:00
Bartok9	8cdbbcaaa2	fix(docker): auto-mount host CWD to /workspace Fixes #1445 — When using Docker backend, the user's current working directory is now automatically bind-mounted to /workspace inside the container. This allows users to run `cd my-project && hermes` and have their project files accessible to the agent without manual volume config. Changes: - Add host_cwd and auto_mount_cwd parameters to DockerEnvironment - Capture original host CWD in _get_env_config() before container fallback - Pass host_cwd through _create_environment() to Docker backend - Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed - Skip auto-mount when /workspace is already explicitly mounted - Add tests for auto-mount behavior - Add documentation for the new feature The auto-mount is skipped when: 1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set 2. User configured docker_volumes with :/workspace 3. persistent_filesystem=true (persistent sandbox mode) This makes the Docker backend behave more intuitively — the agent operates on the user's actual project directory by default.	2026-03-16 05:20:21 -07:00
Teknium	c1da1fdcd5	feat: auto-detect provider when switching models via /model (#1506 ) When typing /model deepseek-chat while on a different provider, the model name now auto-resolves to the correct provider instead of silently staying on the wrong one and causing API errors. Detection priority: 1. Direct provider with credentials (e.g. DEEPSEEK_API_KEY set) 2. OpenRouter catalog match with proper slug remapping 3. Direct provider without creds (clear error beats silent failure) Also adds DeepSeek as a first-class API-key provider — just set DEEPSEEK_API_KEY and /model deepseek-chat routes directly. Bare model names get remapped to proper OpenRouter slugs: /model gpt-5.4 → openai/gpt-5.4 /model claude-opus-4.6 → anthropic/claude-opus-4.6 Salvages the concept from PR #1177 by @virtaava with credential awareness and OpenRouter slug mapping added. Co-authored-by: virtaava <virtaava@users.noreply.github.com>	2026-03-16 04:34:45 -07:00
Teknium	dd7921d514	fix(honcho): isolate session routing for multi-user gateway (#1500 ) Salvaged from PR #1470 by adavyas. Core fix: Honcho tool calls in a multi-session gateway could route to the wrong session because honcho_tools.py relied on process-global state. Now threads session context through the call chain: AIAgent._invoke_tool() → handle_function_call() → registry.dispatch() → handler **kw → _resolve_session_context() Changes: - Add _resolve_session_context() to prefer per-call context over globals - Plumb honcho_manager + honcho_session_key through handle_function_call - Add sync_honcho=False to run_conversation() for synthetic flush turns - Pass honcho_session_key through gateway memory flush lifecycle - Harden gateway PID detection when /proc cmdline is unreadable - Make interrupt test scripts import-safe for pytest-xdist - Wrap BibTeX examples in Jekyll raw blocks for docs build - Fix thread-order-dependent assertion in client lifecycle test - Expand Honcho docs: session isolation, lifecycle, routing internals Dropped from original PR: - Indentation change in _create_request_openai_client that would move client creation inside the lock (causes unnecessary contention) Co-authored-by: adavyas <adavyas@users.noreply.github.com>	2026-03-16 00:23:47 -07:00
teknium1	1f72ce71b7	fix: restore local STT fallback for gateway voice notes Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.	2026-03-15 21:51:40 -07:00
teknium1	01e62c067b	merge: resolve conflicts with origin/main (SSH preflight check)	2026-03-15 21:13:40 -07:00
Teknium	ceb970c559	fix(terminal): add SSH preflight check (#1486 )	2026-03-15 21:09:07 -07:00
teknium1	210d5ade1e	feat(tools): centralize tool emoji metadata in registry + skin integration - Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry - Add emoji= to all 50+ registry.register() calls across tool files - Add get_tool_emoji() helper in agent/display.py with 3-tier resolution: skin override → registry default → hardcoded fallback - Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and gateway/run.py with centralized get_tool_emoji() calls - Add 'tool_emojis' field to SkinConfig so skins can override per-tool emojis (e.g. ares skin could use swords instead of wrenches) - Add 11 tests (5 registry emoji, 6 display/skin integration) - Update AGENTS.md skin docs table Based on the approach from PR #1061 by ForgingAlex (emoji centralization in registry). This salvage fixes several issues from the original: - Does NOT split the cronjob tool (which would crash on missing schemas) - Does NOT change image_generate toolset/requires_env/is_async - Does NOT delete existing tests - Completes the centralization (gateway/run.py was missed) - Hooks into the skin system for full customizability	2026-03-15 20:21:21 -07:00
teknium1	33ebedc76d	feat: enable persistent shell by default for SSH, add config option SSH persistent shell now defaults to true — non-local backends benefit most from state persistence across execute() calls. Local backend remains opt-in via TERMINAL_LOCAL_PERSISTENT env var. New config.yaml option: terminal.persistent_shell (default: true) Controls the default for non-local backends. Users can disable with: hermes config set terminal.persistent_shell false Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default. Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the config.yaml value reaches terminal_tool via env var bridge.	2026-03-15 20:17:13 -07:00
teknium1	5b80654198	feat(tools): add persistent shell mode to local and SSH backends Cherry-picked from PR #1067 by alt-glitch. Adds PersistentShellMixin with file-based IPC protocol for long-lived bash shells. LocalEnvironment and SSHEnvironment gain persistent=True option. Controlled via TERMINAL_LOCAL_PERSISTENT / TERMINAL_SSH_PERSISTENT env vars. Fixes latent stderr pipe buffer deadlock. Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-03-15 20:13:02 -07:00
Teknium	471c663fdf	fix(cli): silence tirith prefetch install warnings at startup (#1452 )	2026-03-15 18:07:03 -07:00
Teknium	64d333204b	Merge pull request #1242 from NousResearch/fix/file-tool-log-noise fix: reduce file tool log noise	2026-03-15 11:11:18 -07:00
alt-glitch	4511322f56	Merge origin/main into sid/persistent-backend Resolve conflict in local.py: keep refactored _make_run_env helper over inline _sanitize_subprocess_env logic.	2026-03-15 21:08:11 +05:30
teyrebaz33	20f381cfb6	fix: preserve thread context for cronjob deliver=origin When a cronjob is created from within a Telegram or Slack thread, deliver=origin was posting to the parent channel instead of the thread. Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the session environment, so cronjob_tools.py could not capture thread_id into the job's origin metadata — even though the scheduler already reads origin.get('thread_id'). Fix: - gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is present on the session context, and clear it in _clear_session_env - tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin Closes #1219	2026-03-15 06:57:00 -07:00
teknium1	b177b4abad	fix(security): block gateway and tool env vars in subprocesses Extend subprocess env sanitization beyond provider credentials by blocking Hermes-managed tool, messaging, and related gateway runtime vars. Reuse a shared sanitizer in LocalEnvironment and ProcessRegistry so background and PTY processes honor the same blocklist and _HERMES_FORCE_ escape hatch. Add regression coverage for local env execution and process_registry spawning.	2026-03-15 02:51:04 -07:00
Teknium	fd0e1aac72	Merge pull request #1400 from NousResearch/hermes/hermes-45b79a59-clawhub-search fix: harden ClawHub skill search exact matches	2026-03-14 23:17:24 -07:00
teknium1	8ccd14a0d4	fix: improve clawhub skill search matching	2026-03-14 23:15:04 -07:00
teknium1	df9020dfa3	fix: harden clawhub skill search exact matches	2026-03-14 22:31:09 -07:00
Teknium	c6fb7f6463	Merge pull request #1399 from NousResearch/hermes/hermes-629f8bde fix(#1002): expand environment blocklist for terminal isolation	2026-03-14 22:30:05 -07:00
teknium1	672dc1666f	test: cover extra provider env blocklist vars	2026-03-14 22:29:35 -07:00
Teknium	5b11570517	Merge pull request #1398 from NousResearch/hermes/hermes-1b6f4583 fix(cron): support per-job runtime overrides	2026-03-14 22:29:30 -07:00
Synergy	28b3764d1e	fix(cron): support per-job runtime overrides Salvaged from PR #1292 onto current main. Preserve per-job model, provider, and base_url overrides in cron execution, persist them in job records, expose them through the cronjob tool create/update paths, and add regression coverage. Deliberately does not persist per-job api_key values.	2026-03-14 22:22:31 -07:00
Teknium	62f1c2b622	Merge pull request #1397 from NousResearch/hermes/hermes-629f8bde fix: escape parens and braces in fork bomb regex pattern	2026-03-14 22:17:16 -07:00
Teknium	84d99f7754	Merge pull request #1394 from NousResearch/hermes/hermes-eca4a640 fix: honor stt.enabled false across gateway transcription	2026-03-14 22:11:47 -07:00
teknium1	d5b64ebdb3	fix: preserve legacy approval keys after pattern key migration	2026-03-14 22:10:39 -07:00
teknium1	f8ceadbad0	fix: propagate STT disable through shared transcription config - add stt.enabled to the default user config - make transcription_tools respect the disabled flag globally - surface disabled state cleanly in voice mode diagnostics - add regression coverage for disabled STT provider selection	2026-03-14 22:09:59 -07:00
0xbyt4	4a93cfd889	fix: use description as pattern_key to prevent approval collisions pattern_key was derived by splitting the regex on \b and taking [1], so patterns starting with the same word (e.g. find -exec rm and find -delete) produced the same key "find". Approving one silently approved the other. Using the unique description string as the key eliminates all collisions.	2026-03-14 22:07:58 -07:00
0xbyt4	e6417cb7bc	fix: escape parens and braces in fork bomb regex pattern The fork bomb regex used `()` (empty capture group) and unescaped `{}` instead of literal `` and `\{\}`. This meant the classic fork bomb `:(){ :\|:& };:` was never detected. Also added `\s*` between `:` and `&` and between `;` and trailing `:` to catch whitespace variants.	2026-03-14 22:06:44 -07:00
Teknium	f9a61a0d9e	Merge pull request #1383 from NousResearch/hermes/hermes-7ef7cb6a fix: add project root to PYTHONPATH in execute_code sandbox	2026-03-14 21:41:50 -07:00
teknium1	0614969f7b	test: cover repo-root imports in execute_code sandbox	2026-03-14 21:41:12 -07:00
teknium1	f6ff6639e8	fix: complete salvaged cronjob dependency check Add regression coverage for cronjob availability and import shutil for the crontab PATH check added from PR #1380.	2026-03-14 21:39:59 -07:00
teknium1	9f6bccd76a	feat: add direct endpoint overrides for auxiliary and delegation Add base_url/api_key overrides for auxiliary tasks and delegation so users can route those flows straight to a custom OpenAI-compatible endpoint without having to rely on provider=main or named custom providers. Also clear gateway session env vars in test isolation so the full suite stays deterministic when run from a messaging-backed agent session.	2026-03-14 21:11:37 -07:00
Teknium	88a48037d1	Merge pull request #1367 from NousResearch/hermes/hermes-aa701810 refactor: unify vision backend gating	2026-03-14 20:31:58 -07:00
teknium1	dc11b86e4b	refactor: unify vision backend gating	2026-03-14 20:22:13 -07:00
teknium1	3229e434b8	Merge origin/main into hermes/hermes-5d160594	2026-03-14 19:34:05 -07:00
teknium1	2536ff328b	fix: prefer prompt names for multi-skill cron jobs	2026-03-14 19:28:52 -07:00
teknium1	c3ea620796	feat: add multi-skill cron editing and docs	2026-03-14 19:18:10 -07:00
teknium1	7b140b31e6	fix: suppress duplicate cron sends to auto-delivery targets Allow cron runs to keep using send_message for additional destinations, but skip same-target sends when the scheduler will already auto-deliver the final response there. Add prompt/tool guidance, docs, and regression coverage for origin/home-channel resolution and thread-aware comparisons.	2026-03-14 19:07:50 -07:00
alt-glitch	879b7d3fbf	fix(tests): update mock stdout in env blocklist tests The fake_popen mock used iter([]) for proc.stdout which doesn't support .close(). Use MagicMock with __iter__ instead, since _drain_stdout now calls proc.stdout.close() in its finally block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 02:48:05 +05:30
balyan.sid@gmail.com	9001b34146	simplify docstrings, fix some bugs	2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com	861202b56c	wip: add persistent shell to ssh and local terminal backends	2026-03-15 01:20:42 +05:30
teknium1	df5c61b37c	feat: compress cron management into one tool	2026-03-14 12:21:50 -07:00
Teknium	1114841a2c	Merge pull request #1329 from NousResearch/hermes/hermes-2f2b4807 fix: tighten memory and session recall guidance	2026-03-14 11:38:54 -07:00
teknium1	5319bb6ac4	fix: tighten memory and session recall guidance Remove diary-style memory framing from the system prompt and memory tool schema, explicitly steer task/session logs to session_search, and clarify that session_search is for cross-session recall after checking the current conversation first. Add regression tests for the updated guidance text.	2026-03-14 11:36:47 -07:00
Teknium	80a243efe6	Merge pull request #1333 from NousResearch/hermes/hermes-1fc28d17 fix: improve browser cleanup, local browser PATH setup, and screenshot recovery	2026-03-14 11:36:09 -07:00
Dave Tist	895fe5a5d3	Fix browser cleanup consistency and screenshot recovery Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic. This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit. The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON. Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing. Verified with: - python -m pytest tests/tools/test_browser_cleanup.py -q - python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q	2026-03-14 11:28:26 -07:00
Stable Genius	3325e51e53	fix(skills): honor policy table for dangerous verdicts Salvaged from PR #1007 by stablegenius49. - let INSTALL_POLICY decide dangerous verdict handling for builtin skills - allow --force to override blocked dangerous decisions for trusted and community sources - accept --yes / -y as aliases for --force in /skills install - update regression tests to match the intended policy precedence	2026-03-14 11:27:02 -07:00
Teknium	681f1068ea	Merge pull request #1303 from NousResearch/hermes/hermes-aa653753 feat(skills): integrate skills.sh as a hub source	2026-03-14 09:48:18 -07:00
teknium1	05770520af	test(skills): isolate well-known cache in adapter tests Prevent the mocked well-known adapter tests from sharing index-cache state across runs or xdist workers.	2026-03-14 08:24:59 -07:00
teknium1	43d25af964	feat(skills): add update checks and well-known support Round out the skills hub integration with: - richer skills.sh metadata and security surfacing during inspect/install - generic check/update flows for hub-installed skills - support for well-known Agent Skills endpoints via /.well-known/skills/index.json Also persist upstream bundle metadata in the lock file and add regression coverage plus live-compatible path handling for both skills.sh aliases and well-known endpoints.	2026-03-14 08:21:16 -07:00
Teknium	707f3ff41f	refactor: tighten MoA traceback logging scope (#1307 ) * improve: add exc_info to MoA error logging * refactor: tighten MoA traceback logging scope Follow up on salvaged PR #998 by limiting exc_info logging to terminal failure paths, avoiding duplicate aggregator errors, and refreshing the MoA default OpenRouter model lineup to current frontier options. --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-14 07:53:56 -07:00
teknium1	02c307b004	fix(skills): resolve skills.sh alias installs Harden the skills.sh hub adapter by parsing skill detail pages when search slugs do not map cleanly onto GitHub skill folder names. This adds detail-page resolution for alias-style skills, improves inspect metadata from the page itself, and covers the behavior with regression tests plus live smoke validation for json-render-react.	2026-03-14 06:50:25 -07:00
Teknium	95c0bee7f8	Merge pull request #1299 from NousResearch/hermes/hermes-f5fb1d3b fix: salvage PR #327 voice mode onto current main	2026-03-14 06:45:20 -07:00
teknium1	483a0b5233	feat(skills): integrate skills.sh as a hub source Add a skills.sh-backed source adapter for the Hermes Skills Hub. The new adapter uses skills.sh search results for discovery, falls back to featured homepage links for browse-style queries, and resolves installs / inspects through the underlying GitHub repo using common Agent Skills layout conventions. Also expose skills-sh in CLI source filters and add regression coverage for search, alias resolution, and source routing.	2026-03-14 06:23:36 -07:00
teknium1	04e151714f	feat(mcp): make selective tool loading capability-aware Extend the salvaged MCP filtering work so utility tools are also governed by policy and server capabilities. Store the registered tool subset per server so rediscovery and status reporting stay accurate after filtering.	2026-03-14 06:22:02 -07:00
teyrebaz33	3198cc8fd9	feat(mcp): per-server tool filtering via include/exclude and enabled flag Add optional config keys under each mcp_servers entry: - tools.include: whitelist, only listed tools are registered - tools.exclude: blacklist, all tools except listed are registered - enabled: false: skip server entirely, no connection attempt Backward-compatible: no config keys = all tools registered as before. Tests: TestMCPSelectiveToolLoading (4 tests), 134 passed total.	2026-03-14 06:12:17 -07:00
Oktay Aydin	00a0f18544	fix: clearer terminal backend requirement errors Salvaged from PR #979 onto current main. Preserve the current terminal backend checks while surfacing actionable preflight errors for unknown TERMINAL_ENV values, missing SSH host/user configuration, and missing Modal credentials/config. Tighten the modal regression test so it deterministically exercises the config-missing path.	2026-03-14 06:04:39 -07:00
teknium1	523a1b6faf	merge: salvage PR #327 voice mode branch Merge contributor branch feature/voice-mode onto current main for follow-up fixes.	2026-03-14 06:03:07 -07:00
Teknium	b646440ca0	fix(mcp): resolve npx stdio connection failures (#1291 ) Salvaged from PR #977 onto current main. Preserves the MCP stdio command resolution and improved error diagnostics, with deterministic regression tests for the npx/node PATH cases. Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-14 05:44:00 -07:00
0xbyt4	eb34c0b09a	fix: voice pipeline hardening — 7 bug fixes with tests 1. Anthropic + ElevenLabs TTS silence: forward full response to TTS callback for non-streaming providers (choices first, then native content blocks fallback). 2. Subprocess timeout kill: play_audio_file now kills the process on TimeoutExpired instead of leaving zombie processes. 3. Discord disconnect cleanup: leave all voice channels before closing the client to prevent leaked state. 4. Audio stream leak: close InputStream if stream.start() fails. 5. Race condition: read/write _on_silence_stop under lock in audio callback thread. 6. _vprint force=True: show API error, retry, and truncation messages even during streaming TTS. 7. _refresh_level lock: read _voice_recording under _voice_lock.	2026-03-14 14:27:21 +03:00
0xbyt4	35748a2fb0	fix: address PR review round 4 — remove web UI, fix audio/import/interface issues Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB enum) per maintainer request — Nous is building their own official chat UI. Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent indefinite hang when audio device stalls (consistent with play_beep()). Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability checks instead of module-level imports that trigger heavy native library loading (CUDA/cuDNN) at import time. Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs to Telegram send_voice() so all adapters accept metadata uniformly. Fix 4: Make session loading resilient to removed platform enum values — skip entries with unknown platforms instead of crashing the entire gateway.	2026-03-14 14:27:21 +03:00
0xbyt4	69cb373864	fix: update /voice status to show correct STT provider Voice status was hardcoded to check API keys only. Now uses the actual provider resolution (local/groq/openai) so it correctly shows "local faster-whisper" when installed instead of "Groq" or "MISSING".	2026-03-14 14:27:21 +03:00
0xbyt4	b8f8d3ef9e	feat: integrate faster-whisper local STT with three-provider fallback Merge main's faster-whisper (local, free) with our Groq support into a unified three-provider STT pipeline: local > groq > openai. Provider priority ensures free options are tried first. Each provider has its own transcriber function with model auto-correction, env- overridable endpoints, and proper error handling. 74 tests cover the full provider matrix, fallback chains, model correction, config loading, validation edge cases, and dispatch.	2026-03-14 14:27:21 +03:00
0xbyt4	2c84979d77	refactor: extract get_stt_model_from_config helper to eliminate DRY violation Duplicated YAML config parsing for stt.model existed in gateway/run.py and gateway/platforms/discord.py. Moved to a single helper in transcription_tools.py and added 5 tests covering all edge cases.	2026-03-14 14:27:21 +03:00
0xbyt4	34c324ff59	fix(test): use real _strip_markdown_for_tts instead of duplicated copy - Import from tools.tts_tool instead of reimplementing the logic - Fix test_truncates_long_text: truncation is the caller's job, not the function's - Remove unused re import	2026-03-14 14:27:20 +03:00
0xbyt4	eb79dda04b	fix: persistent audio stream and silence detection improvements - Keep InputStream alive across recordings to avoid CoreAudio hang on repeated open/close cycles on macOS. New _ensure_stream() creates the stream once; start()/stop()/cancel() only toggle frame collection. - Add _close_stream_with_timeout() with daemon thread to prevent stream.stop()/close() from blocking indefinitely. - Add generation counter to detect stale stream-open completions after cancel or restart. - Run recorder.cancel() in background thread from Ctrl+C handler to keep the event loop responsive. - Add shutdown() method called on /voice off to release audio resources. - Fix silence timer reset during active speech: use dip tolerance for _resume_start tracker so natural speech pauses (< 0.3s) don't prevent the silence timer from being reset. - Update tests to match persistent stream behavior.	2026-03-14 14:27:20 +03:00
0xbyt4	eec04d180a	fix(test): update play_beep test to match polling-based implementation play_beep was changed from sd.wait() to a poll loop + sd.stop() in 302e1fe but the test was not updated. Now asserts sd.stop() instead of sd.wait().	2026-03-14 14:27:20 +03:00
0xbyt4	9d58cafec9	fix: move process_loop voice restart to daemon thread, use _cprint consistently - process_loop's continuous mode restart called _voice_start_recording() directly, blocking the loop if play_beep/sd.wait hangs — queued user input would stall silently. Dispatch to daemon thread like Ctrl+B handler. - Replace print() with _cprint() in _handle_voice_command for consistency with the rest of the voice mode code.	2026-03-14 14:27:20 +03:00
0xbyt4	d0e3b39e69	fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop The handle_voice_record key binding runs in prompt_toolkit's event-loop thread. When silence auto-stopped recording, _voice_recording was False but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B press entered the START path and blocked on that lock, freezing all keyboard input. Three changes: - Set _voice_processing atomically with _voice_recording=False in _voice_stop_and_transcribe to close the race window - Add _voice_processing guard in the START path to prevent starting while stop/transcribe is still running - Dispatch _voice_start_recording to a daemon thread so play_beep (sd.wait) and AudioRecorder.start (lock acquire) never block the event loop	2026-03-14 14:27:20 +03:00
0xbyt4	ecc3dd7c63	test: add comprehensive voice mode test coverage (86 tests) - Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py - Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup, browser_tool signal handler removal) - Add real behavior tests for CLI voice methods via _make_voice_cli() fixture: TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7), TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7), TestVoiceStopAndTranscribeReal (12)	2026-03-14 14:27:20 +03:00
0xbyt4	6e51729c4c	fix: remove browser_tool signal handlers that cause voice mode deadlock browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit() at module import time. When a signal arrived during a lock acquisition (e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside prompt_toolkit's async event loop, corrupting coroutine state and making the process unkillable (required SIGKILL). atexit handler already ensures browser sessions are cleaned up on any normal exit path, so the signal handlers were redundant and harmful.	2026-03-14 14:27:20 +03:00
0xbyt4	ddfd6e0c59	fix: resolve 6 voice mode bugs found during audit - edge_tts NameError: _generate_edge_tts now calls _import_edge_tts() instead of referencing bare module name (tts_tool.py) - TTS thread leak: chat() finally block sends sentinel to text_queue, sets stop_event, and joins tts_thread on exception paths (cli.py) - output_stream leak: moved close() into finally block so audio device is released even on exception (tts_tool.py) - Ctrl+C continuous mode: cancel handler now resets _voice_continuous to prevent auto-restart after user cancels recording (cli.py) - _disable_voice_mode: now calls stop_playback() and sets _voice_tts_done so TTS stops when voice mode is turned off (cli.py) - _show_voice_status: reads record key from config instead of hardcoding Ctrl+B (cli.py)	2026-03-14 14:27:20 +03:00
0xbyt4	a78249230c	fix: address voice mode PR review (streaming TTS, prompt cache, _vprint) Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with lazy import function calls (_import_elevenlabs, _import_sounddevice). The old constants no longer exist in tts_tool -- the try/except silently swallowed the ImportError, leaving streaming TTS dead. Bug B: Use user message prefix instead of modifying system prompt for voice mode instruction. Changing ephemeral_system_prompt mid-session invalidates the prompt cache. Now the concise-response hint is prepended to the user_message passed to run_conversation while conversation_history keeps the original text. Minor: Add force parameter to _vprint so critical error messages (max retries, non-retryable errors, API failures) are always shown even during streaming TTS playback. Tests: 15 new tests in test_voice_cli_integration.py covering all three fixes -- lazy import activation, message prefix behavior, history cleanliness, system prompt stability, and AST verification that all critical _vprint calls use force=True.	2026-03-14 14:27:20 +03:00
0xbyt4	b859dfab16	fix: address voice mode review feedback 1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and openai are never imported at module level. Each is imported only when the feature is explicitly activated, preventing crashes in headless environments (SSH, Docker, WSL, no PortAudio). 2. No core agent loop changes: streaming TTS path extracted from _interruptible_api_call() into separate _streaming_api_call() method. The original method is restored to its upstream form. 3. Configurable key binding: push-to-talk key changed from Ctrl+R (conflicts with readline reverse-search) to Ctrl+B by default. Configurable via voice.push_to_talk_key in config.yaml. 4. Environment detection: new detect_audio_environment() function checks for SSH, Docker, WSL, and missing audio devices before enabling voice mode. Auto-disables with clear warnings in incompatible environments. 5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream, sd.OutputStream) wrapped in try/except with ImportError/OSError handling. Failures produce warnings, not crashes.	2026-03-14 14:27:20 +03:00
0xbyt4	dad865e920	fix: fix silence detection bugs and add Phase 4 voice mode features Fix 3 critical bugs in silence detection: - Micro-pause tolerance now tracks dip duration (not time since speech start) - Peak RMS check in stop() prevents discarding recordings with real speech - Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation Phase 4 features: configurable silence params, visual audio level indicator, voice system prompt, tool call audio cues, TTS interrupt, continuous mode auto-restart, interruptable playback via Popen tracking.	2026-03-14 14:26:30 +03:00
0xbyt4	32b033c11c	feat: add silence filter, hallucination guard, and continuous mode control - Skip silent recordings before STT call (RMS check in AudioRecorder.stop) - Filter known Whisper hallucinations ("Thank you.", "Bye." etc.) - Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it - Wait for TTS to finish before auto-restart to avoid recording speaker - Silence timeout increased to 3s for natural pauses - Tests: hallucination filter, silent recording skip, real speech passthrough	2026-03-14 14:25:28 +03:00
0xbyt4	bfd9c97705	feat: add Phase 4 low-latency features for voice mode - Audio cues: beep on record start (880Hz), double beep on stop (660Hz) - Silence detection: auto-stop recording after 3s of silence (RMS-based) - Continuous mode: auto-restart recording after agent responds - Ctrl+R starts continuous mode, Ctrl+R during recording exits it - Waits for TTS to finish before restarting to avoid recording speaker - Tests: 7 new tests for beep generation and silence detection	2026-03-14 14:25:28 +03:00
0xbyt4	a69bd55b5a	fix: isolate GROQ_API_KEY in test_missing_stt_key test The test was failing because GROQ_API_KEY leaked from the environment. Now both VOICE_TOOLS_OPENAI_KEY and GROQ_API_KEY are removed to properly test the "no STT key" scenario.	2026-03-14 14:25:28 +03:00
0xbyt4	c23928d089	fix: improve voice mode robustness and add integration tests - Show TTS errors to user instead of silently logging - Improve markdown stripping: code blocks, URLs, links, horizontal rules - Fix stripping order: process markdown links before removing URLs - Add threading.Lock for voice state variables (cross-thread safety) - Add 14 CLI integration tests (markdown stripping, command parsing, thread safety) - Total: 47 voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	37b01ab964	test: add transcription_tools tests for multi-provider STT - Provider resolution: OpenAI priority, Groq fallback, no keys - Model auto-correction: Groq corrects OpenAI models and vice versa - Success path: transcription, API errors, whitespace stripping - 12 new tests, 33 total voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	1a6fbef8a9	feat: add voice mode with push-to-talk and TTS output for CLI Implements Issue #314 Phase 2 & 3: - /voice command to toggle voice mode (on/off/tts/status) - Ctrl+Space push-to-talk recording via sounddevice - Whisper STT transcription via existing transcription_tools - Optional TTS response playback via existing tts_tool - Visual indicators in prompt (recording/transcribing/voice) - 21 unit tests, all mocked (no real mic/API) - Optional deps: sounddevice, numpy (pip install hermes-agent[voice])	2026-03-14 14:25:28 +03:00
teknium1	5c9a84219d	fix: complete send_message MEDIA delivery salvage - prevent raw MEDIA tag leakage outside the gateway pipeline - make extract_media handle quoted/backticked paths and optional whitespace - send Telegram media natively with explicit error/warning handling - add regression tests for Telegram media dispatch and MEDIA parsing	2026-03-14 04:02:03 -07:00
teknium1	96c250e538	test: cover pipe characters in v4a patch apply Add a regression test for apply_v4a_operations when read content contains a literal pipe character outside a line-number prefix.	2026-03-14 03:54:46 -07:00
Teknium	6036793f60	fix: clearer docker backend preflight errors (#1276 ) * feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-14 02:53:02 -07:00
teknium1	6f1889b0fa	fix: preserve current approval semantics for tirith guard Restore gateway/run.py to current main behavior while keeping tirith startup and pattern_keys replay, preserve yolo and non-interactive bypass semantics in the combined guard, and add regression tests for yolo and view-full flows.	2026-03-14 00:17:04 -07:00
sheeki003	375ce8a881	feat(security): add tirith pre-exec command scanning Integrate tirith as a pre-execution security scanner that detects homograph URLs, pipe-to-interpreter patterns, terminal injection, zero-width Unicode, and environment variable manipulation — threats the existing 50-pattern dangerous command detector doesn't cover. Architecture: gather-then-decide — both tirith and the dangerous command detector run before any approval prompt, preventing gateway force=True replay from bypassing one check when only the other was shown to the user. New files: - tools/tirith_security.py: subprocess wrapper with auto-installer, mandatory cosign provenance verification, non-blocking background download, disk-persistent failure markers with retryable-cause tracking (cosign_missing auto-clears when cosign appears on PATH) - tests/tools/test_tirith_security.py: 62 tests covering exit code mapping, fail_open, cosign verification, background install, HERMES_HOME isolation, and failure recovery - tests/tools/test_command_guards.py: 21 integration tests for the combined guard orchestration Modified files: - tools/approval.py: add check_all_command_guards() orchestrator, add allow_permanent parameter to prompt_dangerous_approval() - tools/terminal_tool.py: replace _check_dangerous_command with consolidated check_all_command_guards - cli.py: update _approval_callback for allow_permanent kwarg, call ensure_installed() at startup - gateway/run.py: iterate pattern_keys list on replay approval, call ensure_installed() at startup - hermes_cli/config.py: add security config defaults, split commented sections for independent fallback - cli-config.yaml.example: document tirith security config	2026-03-14 00:11:27 -07:00
Teknium	a20d373945	fix: worktree-aware minisweagent path discovery + clean up requirements check (#1248 ) Salvage of PR #1246 by ChatGPT (teknium1 session), resolved against current main which already includes #1239. Changes: - Add minisweagent_path.py: worktree-aware helper that finds mini-swe-agent/src from either the current checkout or the main checkout behind a git worktree - Use the helper in tools/terminal_tool.py and mini_swe_runner.py instead of naive path-relative lookup that fails in worktrees - Clean up check_terminal_requirements(): - local: return True (no minisweagent dep, per #1239) - singularity/ssh: remove unnecessary minisweagent imports - docker/modal: use importlib.util.find_spec with clear error - Add regression tests for worktree path discovery and tool resolution	2026-03-13 23:39:51 -07:00
Teknium	21422dba44	Merge pull request #1239 from NousResearch/hermes/hermes-07d947aa fix: stop local terminal warning without minisweagent	2026-03-13 22:14:44 -07:00
teknium1	b59da08730	fix: reduce file tool log noise - treat git diff --cached --quiet rc=1 as an expected checkpoint state instead of logging it as an error - downgrade expected write PermissionError/EROFS/EACCES failures out of error logging while keeping unexpected exceptions at error level - add regression tests for both logging behaviors	2026-03-13 22:14:00 -07:00
teknium1	329f83ff2d	fix: stop local terminal warning without minisweagent	2026-03-13 22:00:36 -07:00
0xIbra	437ec17125	fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths Several files resolved paths via Path.home() / ".hermes" or os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME environment variable. This broke isolation when running multiple Hermes instances with distinct HERMES_HOME directories. Replace all hardcoded paths with calls to get_hermes_home() from hermes_cli.config, consistent with the rest of the codebase. Files fixed: - tools/process_registry.py (processes.json) - gateway/pairing.py (pairing/) - gateway/sticker_cache.py (sticker_cache.json) - gateway/channel_directory.py (channel_directory.json, sessions.json) - gateway/config.py (gateway.json, config.yaml, sessions_dir) - gateway/mirror.py (sessions/) - gateway/hooks.py (hooks/) - gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/) - gateway/platforms/whatsapp.py (whatsapp/session) - gateway/delivery.py (cron/output) - agent/auxiliary_client.py (auth.json) - agent/prompt_builder.py (SOUL.md) - cli.py (config.yaml, images/, pastes/, history) - run_agent.py (logs/) - tools/environments/base.py (sandboxes/) - tools/environments/modal.py (modal_snapshots.json) - tools/environments/singularity.py (singularity_snapshots.json) - tools/tts_tool.py (audio_cache) - hermes_cli/status.py (cron/jobs.json, sessions.json) - hermes_cli/gateway.py (logs/, whatsapp session) - hermes_cli/main.py (whatsapp/session) Tests updated to use HERMES_HOME env var instead of patching Path.home(). Closes #892 (cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)	2026-03-13 21:32:53 -07:00
Teknium	07927f6bf2	feat(stt): add free local whisper transcription via faster-whisper (#1185 ) * fix: Home Assistant event filtering now closed by default Previously, when no watch_domains or watch_entities were configured, ALL state_changed events passed through to the agent, causing users to be flooded with notifications for every HA entity change. Now events are dropped by default unless the user explicitly configures: - watch_domains: list of domains to monitor (e.g. climate, light) - watch_entities: list of specific entity IDs to monitor - watch_all: true (new option — opt-in to receive all events) A warning is logged at connect time if no filters are configured, guiding users to set up their HA platform config. All 49 gateway HA tests + 52 HA tool tests pass. * docs: update Home Assistant integration documentation - homeassistant.md: Fix event filtering docs to reflect closed-by-default behavior. Add watch_all option. Replace Python dict config example with YAML. Fix defaults table (was incorrectly showing 'all'). Add required configuration warning admonition. - environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section. - messaging/index.md: Add Home Assistant to description, architecture diagram, platform toolsets table, and Next Steps links. * fix(terminal): strip provider env vars from background and PTY subprocesses Extends the env var blocklist from #1157 to also cover the two remaining leaky paths in process_registry.py: - spawn_local() PTY path (line 156) - spawn_local() background Popen path (line 197) Both were still using raw os.environ, leaking provider vars to background processes and interactive PTY sessions. Now uses the same dynamic _HERMES_PROVIDER_ENV_BLOCKLIST from local.py. Explicit env_vars passed to spawn_local() still override the blocklist, matching the existing behavior for callers that intentionally need these. Gap identified by PR #1004 (@PeterFile). * feat(delegate): add observability metadata to subagent results Enrich delegate_task results with metadata from the child AIAgent: - model: which model the child used - exit_reason: completed \| interrupted \| max_iterations - tokens.input / tokens.output: token counts - tool_trace: per-tool-call trace with byte sizes and ok/error status Tool trace uses tool_call_id matching to correctly pair parallel tool calls with their results, with a fallback for messages without IDs. Cherry-picked from PR #872 by @omerkaz, with fixes: - Fixed parallel tool call trace pairing (was always updating last entry) - Removed redundant 'iterations' field (identical to existing 'api_calls') - Added test for parallel tool call trace correctness Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> * feat(stt): add free local whisper transcription via faster-whisper Replace OpenAI-only STT with a dual-provider system mirroring the TTS architecture (Edge TTS free / ElevenLabs paid): STT: faster-whisper local (free, default) / OpenAI Whisper API (paid) Changes: - tools/transcription_tools.py: Full rewrite with provider dispatch, config loading, local faster-whisper backend, and OpenAI API backend. Auto-downloads model (~150MB for 'base') on first voice message. Singleton model instance reused across calls. - pyproject.toml: Add faster-whisper>=1.0.0 as core dependency - hermes_cli/config.py: Expand stt config to match TTS pattern with provider selection and per-provider model settings - agent/context_compressor.py: Fix .strip() crash when LLM returns non-string content (dict from llama.cpp, None). Fixes #1100 partially. - tests/: 23 new tests for STT providers + 2 for compressor fix - docs/: Updated Voice & TTS page with STT provider table, model sizes, config examples, and fallback behavior Fallback behavior: - Local not installed → OpenAI API (if key set) - OpenAI key not set → local whisper (if installed) - Neither → graceful error message to user Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com> --------- Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>	2026-03-13 11:11:05 -07:00
Teknium	02a819b16e	feat(delegate): add observability metadata to subagent results (#1175 ) * fix: Home Assistant event filtering now closed by default Previously, when no watch_domains or watch_entities were configured, ALL state_changed events passed through to the agent, causing users to be flooded with notifications for every HA entity change. Now events are dropped by default unless the user explicitly configures: - watch_domains: list of domains to monitor (e.g. climate, light) - watch_entities: list of specific entity IDs to monitor - watch_all: true (new option — opt-in to receive all events) A warning is logged at connect time if no filters are configured, guiding users to set up their HA platform config. All 49 gateway HA tests + 52 HA tool tests pass. * docs: update Home Assistant integration documentation - homeassistant.md: Fix event filtering docs to reflect closed-by-default behavior. Add watch_all option. Replace Python dict config example with YAML. Fix defaults table (was incorrectly showing 'all'). Add required configuration warning admonition. - environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section. - messaging/index.md: Add Home Assistant to description, architecture diagram, platform toolsets table, and Next Steps links. * fix(terminal): strip provider env vars from background and PTY subprocesses Extends the env var blocklist from #1157 to also cover the two remaining leaky paths in process_registry.py: - spawn_local() PTY path (line 156) - spawn_local() background Popen path (line 197) Both were still using raw os.environ, leaking provider vars to background processes and interactive PTY sessions. Now uses the same dynamic _HERMES_PROVIDER_ENV_BLOCKLIST from local.py. Explicit env_vars passed to spawn_local() still override the blocklist, matching the existing behavior for callers that intentionally need these. Gap identified by PR #1004 (@PeterFile). * feat(delegate): add observability metadata to subagent results Enrich delegate_task results with metadata from the child AIAgent: - model: which model the child used - exit_reason: completed \| interrupted \| max_iterations - tokens.input / tokens.output: token counts - tool_trace: per-tool-call trace with byte sizes and ok/error status Tool trace uses tool_call_id matching to correctly pair parallel tool calls with their results, with a fallback for messages without IDs. Cherry-picked from PR #872 by @omerkaz, with fixes: - Fixed parallel tool call trace pairing (was always updating last entry) - Removed redundant 'iterations' field (identical to existing 'api_calls') - Added test for parallel tool call trace correctness Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> --------- Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>	2026-03-13 08:07:12 -07:00
Muhammet Eren Karakuş	c92507e53d	fix(terminal): strip Hermes provider env vars from subprocess environment (#1157 ) Terminal subprocesses inherit OPENAI_BASE_URL and other provider env vars loaded from ~/.hermes/.env, silently misrouting external CLIs like codex. Build a blocklist dynamically from the provider registry so new providers are automatically covered. Callers that truly need a blocked var can opt in via the _HERMES_FORCE_ prefix. Closes #1002 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 07:52:03 -07:00
teknium1	06a5cc484c	fix: improve gateway secret capture guidance message The old message referenced 'hermes setup' which doesn't handle skill-specific env vars. Updated to direct users to load the skill in the local CLI (which triggers the secure prompt) or add the key to ~/.hermes/.env manually.	2026-03-13 04:10:22 -07:00
Teknium	0157253145	Merge pull request #1152 from NousResearch/hermes/hermes-f47f71c0 feat: concurrent tool execution with ThreadPoolExecutor	2026-03-13 03:20:38 -07:00
kshitijk4poor	ccfbf42844	feat: secure skill env setup on load (core #688 ) When a skill declares required_environment_variables in its YAML frontmatter, missing env vars trigger a secure TUI prompt (identical to the sudo password widget) when the skill is loaded. Secrets flow directly to ~/.hermes/.env, never entering LLM context. Key changes: - New required_environment_variables frontmatter field for skills - Secure TUI widget (masked input, 120s timeout) - Gateway safety: messaging platforms show local setup guidance - Legacy prerequisites.env_vars normalized into new format - Remote backend handling: conservative setup_needed=True - Env var name validation, file permissions hardened to 0o600 - Redact patterns extended for secret-related JSON fields - 12 existing skills updated with prerequisites declarations - ~48 new tests covering skip, timeout, gateway, remote backends - Dynamic panel widget sizing (fixes hardcoded width from original PR) Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main with conflict resolution. Fixes #688 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-13 03:14:04 -07:00
teknium1	5d0d5b191c	feat: concurrent tool execution with ThreadPoolExecutor When the model returns multiple tool calls in a single response, they are now executed concurrently using a thread pool instead of sequentially. This significantly reduces wall-clock time when multiple independent tools are batched (e.g. parallel web_search, read_file, terminal calls). Architecture: - _execute_tool_calls() dispatches to sequential or concurrent path - Single tool calls and batches containing 'clarify' use sequential path - Multiple non-interactive tools use ThreadPoolExecutor (max 8 workers) - Results are collected and appended to messages in original order - _invoke_tool() extracted as shared tool invocation helper Safety: - Pre-flight interrupt check skips all tools if interrupted - Per-tool exception handling: one failure doesn't crash the batch - Result truncation (100k char limit) applied per tool - Budget pressure injection after all tools complete - Checkpoints taken before file-mutating tools - CLI spinner shows batch progress, then per-tool completion messages Tests: 10 new tests covering dispatch logic, ordering, error handling, interrupt behavior, truncation, and _invoke_tool routing.	2026-03-13 02:51:51 -07:00
Teknium	2a62514d17	feat: add 'View full command' option to dangerous command approval (#887 ) When a dangerous command is detected and the user is prompted for approval, long commands are truncated (80 chars in fallback, 70 chars in the TUI). Users had no way to see the full command before deciding. This adds a 'View full command' option across all approval interfaces: - CLI fallback (tools/approval.py): [v]iew option in the prompt menu. Shows the full command and re-prompts for approval decision. - CLI TUI (cli.py): 'Show full command' choice in the arrow-key selection panel. Expands the command display in-place and removes the view option after use. - CLI callbacks (callbacks.py): 'view' choice added to the list when the command exceeds 70 characters. - Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal the complete command while keeping the approval pending. Includes 7 new tests covering view-then-approve, view-then-deny, short command fallthrough, and double-view behavior. Closes community feedback about the 80-char cap on dangerous commands.	2026-03-12 06:27:21 -07:00
teknium1	a37fc05171	fix: skip hanging tests + add global test timeout 4 test files spawn real processes or make live API calls that hang indefinitely in batch/CI runs. Skip them with pytestmark: - tests/tools/test_code_execution.py (subprocess spawns) - tests/tools/test_file_tools_live.py (live LocalEnvironment) - tests/test_413_compression.py (blocks on process) - tests/test_agent_loop_tool_calling.py (live OpenRouter API calls) Also added global 30s signal.alarm timeout in conftest.py as a safety net, and removed stale nous-api test that hung on OAuth browser login. Suite now runs in ~55s with no hangs.	2026-03-12 01:23:28 -07:00
teknium1	2192b17670	merge: resolve conflicts with origin/main - gateway/run.py: Take main's _resolve_gateway_model() helper - hermes_cli/setup.py: Re-apply nous-api removal after merge brought it back. Fix provider_idx offset (Custom is now index 3, not 4). - tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)	2026-03-12 00:29:04 -07:00
teknium1	29ef69c703	fix: update all test mocks for call_llm migration Update 14 test files to use the new call_llm/async_call_llm mock patterns instead of the old get_text_auxiliary_client/ get_vision_auxiliary_client tuple returns. - vision_tools tests: mock async_call_llm instead of _aux_async_client - browser tests: mock call_llm instead of _aux_vision_client - flush_memories tests: mock call_llm instead of get_text_auxiliary_client - session_search tests: mock async_call_llm with RuntimeError - mcp_tool tests: fix whitelist model config, use side_effect for multi-response tests - auxiliary_config_bridge: update for model=None (resolved in router) 3251 passed, 2 pre-existing unrelated failures.	2026-03-11 21:06:54 -07:00
teknium1	0aa31cd3cb	feat: call_llm/async_call_llm + config slots + migrate all consumers Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).	2026-03-11 20:52:19 -07:00
0xbyt4	4a8f23eddf	fix: correctly track failed MCP server connections in discovery _discover_one() caught all exceptions and returned [], making asyncio.gather(return_exceptions=True) redundant. The isinstance(result, Exception) branch in _discover_all() was dead code, so failed_count was always 0. This caused: - No summary printed when all servers fail (silent failure) - ok_servers always equaling total_servers (misleading count) - Unused variables transport_desc and transport_type Fix: let exceptions propagate to gather() so failed_count increments correctly. Move per-server failure logging to _discover_all(). Remove dead variables.	2026-03-11 18:24:45 +03:00
dmahan93	59b53f0a23	fix: skip tests when atroposlib/minisweagent unavailable in CI - test_agent_loop_tool_calling.py: import atroposlib at module level to trigger skip (environments.agent_loop is now importable without atroposlib due to __init__.py graceful fallback) - test_modal_sandbox_fixes.py: skip TestToolResolution tests when minisweagent not installed	2026-03-11 06:52:55 -07:00
dmahan93	0f53275169	test: skip atropos-dependent tests when atroposlib not installed Guard all test files that import from environments/ or atroposlib with try/except + pytest.skip(allow_module_level=True) so they gracefully skip instead of crashing when deps aren't available.	2026-03-11 06:52:55 -07:00
dmahan93	b03aefaf20	test: 13 tests for Modal sandbox infra fixes	2026-03-11 06:51:42 -07:00
teknium1	184aa5b2b3	fix: tighten exc_info assertion in vision test (from PR #803 ) The weaker assertion (r.exc_info is not None) passes even when exc_info is (None, None, None). Check r.exc_info[0] is not None to verify actual exception info is present. The _aux_async_client mock was already applied on main. Co-authored-by: OutThisLife <nickolasgustafsson@gmail.com>	2026-03-11 06:32:01 -07:00
teknium1	9423fda5cb	feat: configurable subagent provider:model with full credential resolution Adds delegation.model and delegation.provider config fields so subagents can run on a completely different provider:model pair than the parent agent. When delegation.provider is set, the system resolves the full credential bundle (base_url, api_key, api_mode) via resolve_runtime_provider() — the same path used by CLI/gateway startup. This means all configured providers work out of the box: openrouter, nous, zai, kimi-coding, minimax, minimax-cn. Key design decisions: - Provider resolution uses hermes_cli.runtime_provider (single source of truth for credential resolution across CLI, gateway, cron, and now delegation) - When only delegation.model is set (no provider), the model name changes but parent credentials are inherited (for switching models within the same provider like OpenRouter) - When delegation.provider is set, full credentials are resolved independently — enabling cross-provider delegation (e.g. parent on Nous Portal, subagents on OpenRouter) - Clear error messages if provider resolution fails (missing API key, unknown provider name) - _load_config() now falls back to hermes_cli.config.load_config() for gateway/cron contexts where CLI_CONFIG is unavailable Based on PR #791 by 0xbyt4 (closes #609), reworked to use proper provider credential resolution instead of passing provider as metadata. Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>	2026-03-11 06:12:21 -07:00
teknium1	09336a6710	Merge PR #795 : fix: handle empty choices in MCP sampling callback Adds defensive guard against empty/None/missing choices in SamplingHandler.__call__ before accessing response.choices[0]. Returns proper ErrorData instead of crashing with IndexError/TypeError on content filtering, provider errors, or rate limits. Authored by 0xbyt4. Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>	2026-03-11 05:47:51 -07:00
Teknium	fe9da5280f	Merge pull request #766 from spanishflu-est1918/codex/telegram-topic-session-pr Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.	2026-03-11 03:14:43 -07:00
alireza78a	f1510ec33e	test(terminal): add tests for env var validation in _get_env_config	2026-03-11 02:59:12 -07:00
SPANISH FLU	0d6b25274c	fix(gateway): isolate telegram forum topic sessions	2026-03-11 09:15:34 +01:00
teknium1	a9241f3e3e	fix: head+tail truncation for execute_code stdout Replaces head-only stdout capture with a two-buffer approach (40% head, 60% tail rolling window) so scripts that print() their final results at the end never lose them. Adds truncation notice between sections. Cherry-picked from PR #755, conflict resolved (test file additions). 3 new tests for short output, head+tail preservation, and notice format.	2026-03-11 00:26:13 -07:00
teknium1	586fe5d62d	Merge PR #724 : feat: --yolo flag to bypass all approval prompts Authored by dmahan93. Adds HERMES_YOLO_MODE env var and --yolo CLI flag to auto-approve all dangerous command prompts. Post-merge: renamed --fuck-it-ship-it to --yolo for brevity, resolved conflict with --checkpoints flag.	2026-03-10 20:56:30 -07:00
Teknium	b76cae94d4	Merge pull request #889 from NousResearch/hermes/hermes-b0162f8d fix: Docker backend fails when docker is not in PATH (macOS gateway)	2026-03-10 20:45:34 -07:00
teknium1	24479625a2	fix: Docker backend fails when docker is not in PATH (macOS gateway) On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but when Hermes runs as a gateway service (launchd) or in other non-login contexts, /usr/local/bin is often not in PATH. This causes the Docker requirements check to fail with 'No such file or directory: docker' even though docker works fine from the user's terminal. Add find_docker() helper that uses shutil.which() first, then probes common Docker Desktop install paths on macOS (/usr/local/bin, /opt/homebrew/bin, Docker.app bundle). The resolved path is cached and passed to mini-swe-agent via its 'executable' parameter. - tools/environments/docker.py: add find_docker(), use it in _storage_opt_supported() and pass to _Docker(executable=...) - tools/terminal_tool.py: use find_docker() in requirements check - tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache) 2877 tests pass.	2026-03-10 20:45:13 -07:00
teknium1	03a4f184e6	fix: call _stop_training_run on early-return failure paths The 4 early-return paths in _spawn_training_run (API exit, trainer exit, env not found, env exit) were doing manual process.terminate() or returning without cleanup, leaking open log file handles. Now all paths call _stop_training_run() which handles both process termination and file handle closure. Also adds 12 tests for _stop_training_run covering file handle cleanup, process termination, status transitions, and edge cases. Inspired by PR #715 (0xbyt4) which identified the early-return issue. Core file handle fix was already on main via `e28dc13` (memosr.eth).	2026-03-10 17:09:51 -07:00
teknium1	a458b535c9	fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues: 1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only warn/block on truly consecutive identical calls. Any other tool call in between (write, patch, terminal, etc.) resets the counter via notify_other_tool_call(), called from handle_function_call() in model_tools.py. This prevents false blocks in read→edit→verify flows. 2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on 4th+ consecutive (was 3rd+). Gives the model more room before intervening. 3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a separate read_history set that only tracks file reads. 4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from web_extract return docs in code_execution_tool.py — the field IS returned by web_tools.py. 5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover consecutive-only behavior, notify_other_tool_call, interleaved read/search, and summary-unaffected-by-searches.	2026-03-10 16:25:41 -07:00
teknium1	b53d5dad67	Merge PR #705 : fix: detect, warn, and block file re-read/search loops after context compression Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.	2026-03-10 16:17:03 -07:00
teknium1	c1171fe666	fix: eliminate 3x SQLite message duplication in gateway sessions (#860 ) Three separate code paths all wrote to the same SQLite state.db with no deduplication, inflating session transcripts by 3-4x: 1. _log_msg_to_db() — wrote each message individually after append 2. _flush_messages_to_session_db() — re-wrote ALL new messages at every _persist_session() call (~18 exit points), with no tracking of what was already written 3. gateway append_to_transcript() — wrote everything a third time after the agent returned Since load_transcript() prefers SQLite over JSONL, the inflated data was loaded on every session resume, causing proportional token waste. Fix: - Remove _log_msg_to_db() and all 16 call sites (redundant with flush) - Add _last_flushed_db_idx tracking in _flush_messages_to_session_db() so repeated _persist_session() calls only write truly new messages - Reset flush cursor on compression (new session ID) - Add skip_db parameter to SessionStore.append_to_transcript() so the gateway skips SQLite writes when the agent already persisted them - Gateway now passes skip_db=True for agent-managed messages, still writes to JSONL as backup Verified: a 12-message CLI session with tool calls produces exactly 12 SQLite rows with zero duplicates (previously would be 36-48). Tests: 9 new tests covering flush deduplication, skip_db behavior, compression reset, and initialization. Full suite passes (2869 tests).	2026-03-10 15:22:44 -07:00
SHL0MS	0229e6b407	Fix test_analysis_error_logs_exc_info: mock _aux_async_client so download path is reached	2026-03-10 16:03:19 -04:00
0xbyt4	52e3580cd4	refactor: merge new tests into test_code_execution.py Move all new tests (schema, env filtering, edge cases, interrupt) into the existing test_code_execution.py instead of a separate file. Delete the now-redundant test_code_execution_schema.py.	2026-03-10 06:18:27 -07:00
0xbyt4	694a3ebdd5	fix(code_execution): handle empty enabled_sandbox_tools in schema description build_execute_code_schema(set()) produced "from hermes_tools import , ..." in the code property description — invalid Python syntax shown to the model. This triggers when a user enables only the code_execution toolset without any of the sandbox-allowed tools (e.g. `hermes tools code_execution`), because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set. Also adds 29 unit tests covering build_execute_code_schema, environment variable filtering, execute_code edge cases, and interrupt handling.	2026-03-10 06:18:27 -07:00
teknium1	5e6c7bc205	Merge PR #602 : fix: prevent data loss in clipboard PNG conversion when ImageMagick fails Authored by 0xbyt4. Only deletes temp .bmp after confirmed successful conversion, restores original on failure. Adds 3 tests.	2026-03-10 04:15:05 -07:00
teknium1	c1775de56f	feat: filesystem checkpoints and /rollback command Automatic filesystem snapshots before destructive file operations, with user-facing rollback. Inspired by PR #559 (by @alireza78a). Architecture: - Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR - CheckpointManager: take/list/restore, turn-scoped dedup, pruning - Transparent — the LLM never sees it, no tool schema, no tokens - Once per turn — only first write_file/patch triggers a snapshot Integration: - Config: checkpoints.enabled + checkpoints.max_snapshots - CLI flag: hermes --checkpoints - Trigger: run_agent.py _execute_tool_calls() before write_file/patch - /rollback slash command in CLI + gateway (list, restore by number) - Pre-rollback snapshot auto-created on restore (undo the undo) Safety: - Never blocks file operations — all errors silently logged - Skips root dir, home dir, dirs >50K files - Disables gracefully when git not installed - Shadow repo completely isolated from project git Tests: 35 new tests, all passing (2798 total suite) Docs: feature page, config reference, CLI commands reference	2026-03-10 00:49:15 -07:00
teknium1	5212644861	fix(security): prevent shell injection in tilde-username path expansion Validate that the username portion of ~username paths contains only valid characters (alphanumeric, dot, hyphen, underscore) before passing to shell echo for expansion. Previously, paths like '~; rm -rf /' would be passed unquoted to self._exec(f'echo {path}'), allowing arbitrary command execution. The approach validates the username rather than using shlex.quote(), which would prevent tilde expansion from working at all since echo '~user' outputs the literal string instead of expanding it. Added tests for injection blocking and valid ~username/path expansion. Credit to @alireza78a for reporting (PR #442, issue #442).	2026-03-09 17:33:19 -07:00
0xbyt4	4e3a8a0637	fix: handle empty choices in MCP sampling callback SamplingHandler.__call__ accessed response.choices[0] without checking if the list was non-empty. LLM APIs can return empty choices on content filtering, provider errors, or rate limits, causing an unhandled IndexError that propagates to the MCP SDK and may crash the connection. Add a defensive guard that returns a proper ErrorData when choices is empty, None, or missing. Includes three test cases covering all variants.	2026-03-10 02:24:53 +03:00
teknium1	2d44ed1c5b	test: add comprehensive tests for vision_tools (42 tests) Covers PR #428 changes and existing vision_tools functionality: - _validate_image_url: 20 tests for urlparse-based validation - _determine_mime_type: 6 tests for MIME type detection - _image_to_base64_data_url: 3 tests for base64 conversion - _handle_vision_analyze: 5 tests for type hints, prompt building, AUXILIARY_VISION_MODEL env var override - Error logging exc_info: 3 async tests verifying stack traces are logged on download failure, analysis error, and cleanup error - check_vision_requirements & get_debug_session_info: 2 basic tests - Registry integration: 3 tests for tool registration	2026-03-09 15:32:02 -07:00
Teknium	654e16187e	feat(mcp): add sampling support — server-initiated LLM requests (#753 ) Add MCP sampling/createMessage capability via SamplingHandler class. Text-only sampling + tool use in sampling with governance (rate limits, model whitelist, token caps, tool loop limits). Per-server audit metrics. Based on concept from PR #366 by eren-karakus0. Restructured as class-based design with bug fixes and tests using real MCP SDK types. 50 new tests, 2600 total passing.	2026-03-09 03:37:38 -07:00
0xbyt4	912efe11b5	fix(tests): add content attribute to fake result objects _FakeReadResult and _FakeSearchResult now expose the attributes that read_file_tool/search_tool access after the redact_sensitive_text integration from main.	2026-03-09 13:25:52 +03:00
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	7af33accf1	fix: apply secret redaction to file tool outputs Terminal output was already redacted via redact_sensitive_text() but read_file and search_files returned raw content. Now both tools redact secrets before returning results to the LLM. Based on PR #372 by @teyrebaz33 (closes #363) — applied manually due to branch conflicts with the current codebase.	2026-03-09 00:49:46 -07:00
teknium1	a8bf414f4a	feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill New browser capabilities and a built-in skill for agent-driven web QA. ## New tool: browser_console Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes. ## Enhanced tool: browser_vision(annotate=True) New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements — each [N] maps to ref @eN. Annotation data (element name, role, bounding box) returned alongside the vision analysis. Useful for QA reports and spatial reasoning. ## Config: browser.record_sessions Auto-record browser sessions as WebM video files when enabled: - Starts recording on first browser_navigate - Stops and saves on browser_close - Saves to ~/.hermes/browser_recordings/ - Works in both local and cloud modes (verified) - Disabled by default ## Built-in skill: dogfood Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow: 1. Plan — accept URL, create output dirs, set scope 2. Explore — systematic crawl with annotated screenshots 3. Collect Evidence — screenshots, console errors, JS exceptions 4. Categorize — severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content) 5. Report — structured markdown with per-issue evidence Includes: - skills/dogfood/SKILL.md — full workflow instructions - skills/dogfood/references/issue-taxonomy.md — severity/category defs - skills/dogfood/templates/dogfood-report-template.md — report template ## Tests 21 new tests covering: - browser_console message/error parsing, clear flag, empty/failed states - browser_console schema registration - browser_vision annotate schema and flag passing - record_sessions config defaults and recording lifecycle - Dogfood skill file existence and content validation Addresses #315.	2026-03-08 21:28:12 -07:00
0xbyt4	d8df91dfa8	fix: resolve merge conflict with main in clipboard.py	2026-03-09 03:50:29 +03:00
teknium1	491605cfea	feat: add high-value tool result hints for patch and search_files (#722 ) Add contextual [Hint: ...] suffixes to tool results where they save real iterations: - patch (no match): suggests read_file/search_files to verify content before retrying — addresses the common pattern where the agent retries with stale old_string instead of re-reading the file. - search_files (truncated): provides explicit next offset and suggests narrowing the search — clearer than relying on total_count inference. Other hints proposed in #722 (terminal, web_search, web_extract, browser_snapshot, search zero-results, search content-matches) were evaluated and found to be low-value: either already covered by existing mechanisms (read_file pagination, similar-files, schema descriptions) or guidance the agent already follows from its own reasoning. 5 new tests covering hint presence/absence for both tools.	2026-03-08 17:46:28 -07:00
teknium1	c0520223fd	fix: clipboard BMP conversion file loss and broken test Source code (hermes_cli/clipboard.py): - _convert_to_png() lost the file when both Pillow and ImageMagick were unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run raised FileNotFoundError, but the file was never renamed back. The final fallback 'return path.exists()' returned False. - Fix: restore the original file in both except handlers by renaming tmp back to path when the original is missing. Test (tests/tools/test_clipboard.py): - test_file_still_usable_when_no_converter expected 'from PIL import Image' to raise an Exception, but Pillow is installed so pytest.raises fired 'DID NOT RAISE'. The test also never called _convert_to_png(). - Fix: properly mock PIL unavailability via patch.dict(sys.modules), actually call _convert_to_png(), and assert the correct result.	2026-03-08 17:22:27 -07:00
teknium1	3fb8938cd3	fix: search_files now reports error for non-existent paths instead of silent empty results Previously, search_files would silently return 0 results when the search path didn't exist (e.g., /root/.hermes/... when HOME is /home/user). The path was passed to rg/grep/find which would fail silently, and the empty stdout was parsed as 'no matches found'. Changes: - Add path existence check at the top of search() using test -e. Returns SearchResult with a clear error message when path doesn't exist. - Add exit code 2 checks in _search_with_rg() and _search_with_grep() as secondary safety net for other error types (bad regex, permissions). - Add 4 new tests covering: nonexistent path (content mode), nonexistent path (files mode), existing path proceeds normally, rg error exit code. Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.	2026-03-08 16:47:20 -07:00
dmahan93	7791174ced	feat: add --fuck-it-ship-it flag to bypass dangerous command approvals Adds a fun alias for skipping all dangerous command approval prompts. When passed, sets HERMES_YOLO_MODE=1 which causes check_dangerous_command() to auto-approve everything. Available on both top-level and chat subcommand: hermes --fuck-it-ship-it hermes chat --fuck-it-ship-it Includes 5 tests covering normal blocking, yolo bypass, all patterns, and edge cases (empty string env var).	2026-03-08 18:36:37 -05:00
0xbyt4	67421ed74f	fix: update test_non_empty_has_markers to match todo filtering behavior Completed/cancelled items are now filtered from format_for_injection() output. Update the existing test to verify active items appear and completed items are excluded.	2026-03-08 23:07:38 +03:00
0xbyt4	e2fe1373f3	fix: escalate read/search blocking, track search loops, filter completed todos - Block file reads after 3+ re-reads of same region (no content returned) - Track search_files calls and block repeated identical searches - Filter completed/cancelled todos from post-compression injection to prevent agent from re-doing finished work - Add 10 new tests covering all three fixes	2026-03-08 23:01:21 +03:00
0xbyt4	9eee529a7f	fix: detect and warn on file re-read loops after context compression When context compression summarizes conversation history, the agent loses track of which files it already read and re-reads them in a loop. Users report the agent reading the same files endlessly without writing. Root cause: context compression is lossy — file contents and read history are lost in the summary. After compression, the model thinks it hasn't examined the files yet and reads them again. Fix (two-part): 1. Track file reads per task in file_tools.py. When the same file region is read again, include a _warning in the response telling the model to stop re-reading and use existing information. 2. After context compression, inject a structured message listing all files already read in the session with explicit "do NOT re-read" instruction, preserving read history across compression boundaries. Adds 16 tests covering warning detection, task isolation, summary accuracy, tracker cleanup, and compression history injection.	2026-03-08 20:44:42 +03:00
teknium1	cf810c2950	fix: pre-process CLI clipboard images through vision tool instead of raw embedding Images pasted in the CLI were embedded as raw base64 image_url content parts in the conversation history, which only works with vision-capable models. If the main model (e.g. Nous API) doesn't support vision, this breaks the request and poisons all subsequent messages. Now the CLI uses the same approach as the messaging gateway: images are pre-processed through the auxiliary vision model (Gemini Flash via OpenRouter or Nous Portal) and converted to text descriptions. The local file path is included so the agent can re-examine via vision_analyze if needed. Works with any model. Fixes #638.	2026-03-08 06:22:00 -07:00
Teknium	b8120df860	Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies"	2026-03-08 03:58:13 -07:00
kshitij	f210510276	feat: add prerequisites field to skill spec — hide skills with unmet dependencies Skills can now declare runtime prerequisites (env vars, CLI binaries) via YAML frontmatter. Skills with unmet prerequisites are excluded from the system prompt so the agent never claims capabilities it can't deliver, and skill_view() warns the agent about what's missing. Three layers of defense: - build_skills_system_prompt() filters out unavailable skills - _find_all_skills() flags unmet prerequisites in metadata - skill_view() returns prerequisites_warning with actionable details Tagged 12 bundled skills that have hard runtime dependencies: gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage, apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection, blogwatcher, songsee, mcporter. Closes #658 Fixes #630	2026-03-08 13:19:32 +05:30
teknium1	24f6a193e7	fix: remove stale 'model' assertion from delegate_task schema test The 'model' property was removed from DELEGATE_TASK_SCHEMA but the test still asserted its presence, causing CI to fail.	2026-03-07 11:29:55 -08:00
0xbyt4	ee7d8c56c7	fix: prevent data loss in clipboard PNG conversion when ImageMagick fails _convert_to_png() renamed the original file to .bmp before calling ImageMagick convert, then unconditionally deleted the .bmp regardless of whether convert succeeded. If convert failed, both files were gone. - Only delete .bmp after confirmed successful conversion - Restore original file on convert failure, timeout, or missing binary - Add 3 tests covering failure, not-installed, and timeout scenarios	2026-03-07 20:02:12 +03:00
teknium1	f668e9fc75	feat: platform-conditional skill loading + Apple/macOS skills Add a 'platforms' field to SKILL.md frontmatter that restricts skills to specific operating systems. Skills with platforms: [macos] only appear in the system prompt, skills_list(), and slash commands on macOS. Skills without the field load everywhere (backward compatible). Implementation: - skill_matches_platform() in tools/skills_tool.py — core filter - Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py, skill_commands.py - 28 new tests across 3 test files New bundled Apple/macOS skills (all platforms: [macos]): - imessage — Send/receive iMessages via imsg CLI - apple-reminders — Manage Reminders via remindctl CLI - apple-notes — Manage Notes via memo CLI - findmy — Track devices/AirTags via AppleScript + screen capture Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md, skills.md (user guide)	2026-03-07 00:47:54 -08:00
0xbyt4	211b55815e	fix: prevent data loss in skills sync on copy/update failure Two bugs in sync_skills(): 1. Failed copytree poisons manifest: when shutil.copytree fails (disk full, permission error), the skill is still recorded in the manifest. On the next sync, the skill appears as "in manifest but not on disk" which is interpreted as "user deliberately deleted it" — the skill is never retried. Fix: only write to manifest on successful copy. 2. Failed update destroys user copy: rmtree deletes the existing skill directory before copytree runs. If copytree then fails, the user's skill is gone with no way to recover. Fix: move to .bak before copying, restore from backup if copytree fails. Both bugs are proven by new regression tests that fail on the old code and pass on the fix.	2026-03-07 03:58:32 +03:00
teknium1	4f56e31dc7	fix: track origin hashes in skills manifest to preserve user modifications Upgrade skills_sync manifest to v2 format (name:origin_hash). The origin hash records the MD5 of the bundled skill at the time it was last synced. On update, the user's copy is compared against the origin hash: - User copy == origin hash → unmodified → safe to update from bundled - User copy != origin hash → user customized → skip (preserve changes) v1 manifests (plain names) are auto-migrated: the user's current hash becomes the baseline, so future syncs can detect modifications. Output now shows user-modified skills: ~ whisper (user-modified, skipping) 27 tests covering all scenarios including v1→v2 migration, user modification detection, update after migration, and origin hash tracking. 2009 tests pass.	2026-03-06 16:13:58 -08:00
teknium1	ab0f4126cf	fix: restore all removed bundled skills + fix skills sync system - Restored 21 skills removed in commits `757d012` and `740dd92`: accelerate, audiocraft, code-review, faiss, flash-attention, gguf, grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion, tensorrt-llm, torchtitan, trl-fine-tuning, whisper - Rewrote sync_skills() with proper update semantics: * New skills (not in manifest): copied to user dir * Existing skills (in manifest + on disk): updated via hash comparison * User-deleted skills (in manifest, not on disk): respected, not re-added * Stale manifest entries (removed from bundled): cleaned from manifest - Added sync_skills() to CLI startup (cmd_chat) and gateway startup (start_gateway) — previously only ran during 'hermes update' - Updated cmd_update output to show new/updated/cleaned counts - Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh install, user deletion respect, update detection, stale cleanup, and name collision handling 75 bundled skills total. 2002 tests pass.	2026-03-06 15:57:30 -08:00
teknium1	b89eb29174	fix: correct mock tool name 'search' → 'search_files' in test_code_execution The mock handler checked for function_name == 'search' but the RPC sends 'search_files'. Any test exercising search_files through the mock would get 'Unknown tool' instead of the canned response.	2026-03-06 03:53:43 -08:00
teknium1	3982fcf095	fix: sync execute_code sandbox stubs with real tool schemas The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the actual tool schemas, causing TypeErrors when the LLM used parameters it sees in its system prompt but the sandbox stubs didn't accept: search_files: - Added missing params: context, offset, output_mode - Fixed target default: 'grep' → 'content' (old value was obsolete) patch: - Added missing params: mode, patch (V4A multi-file patch support) Also added 4 drift-detection tests (TestStubSchemaDrift) that will catch future divergence between stubs and real schemas: - test_stubs_cover_all_schema_params: every schema param in stub - test_stubs_pass_all_params_to_rpc: every stub param sent over RPC - test_search_files_target_uses_current_values: no obsolete values - test_generated_module_accepts_all_params: generated code compiles All 28 tests pass.	2026-03-06 03:40:06 -08:00
teknium1	39299e2de4	Merge PR #451 : feat: Add Daytona environment backend Authored by rovle. Adds Daytona as the sixth terminal execution backend with cloud sandboxes, persistent workspaces, and full CLI/gateway integration. Includes 24 unit tests and 8 integration tests.	2026-03-06 03:32:40 -08:00
teknium1	efec4fcaab	feat(execute_code): add json_parse, shell_quote, retry helpers to sandbox The execute_code sandbox generates a hermes_tools.py stub module for LLM scripts. Three common failure modes keep tripping up scripts: 1. json.loads(strict=True) rejects control chars in terminal() output (e.g., GitHub issue bodies with literal tabs/newlines) 2. Shell backtick/quote interpretation when interpolating dynamic content into terminal() commands (markdown with backticks gets eaten by bash) 3. No retry logic for transient network failures (API timeouts, rate limits) Adds three convenience helpers to the generated hermes_tools module: - json_parse(text) — json.loads with strict=False for tolerant parsing - shell_quote(s) — shlex.quote() for safe shell interpolation - retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper Also updates the EXECUTE_CODE_SCHEMA description to document these helpers so LLMs know they're available without importing anything extra. Includes 7 new tests (unit + integration) covering all three helpers.	2026-03-06 01:52:46 -08:00
teknium1	2317d115cd	fix: clipboard image paste on WSL2, Wayland, and VSCode terminal The original implementation only supported xclip (X11), which silently fails on WSL2 (can't access Windows clipboard for images), Wayland desktops (xclip is X11-only), and VSCode terminal on WSL2. Clipboard backend changes (hermes_cli/clipboard.py): - WSL2: detect via /proc/version, use powershell.exe with .NET System.Windows.Forms.Clipboard to extract images as base64 PNG - Wayland: use wl-paste with MIME type detection, auto-convert BMP to PNG for WSLg environments (via Pillow or ImageMagick) - Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough - New has_clipboard_image() for lightweight clipboard checks - Cache WSL detection result per-process CLI changes (cli.py): - /paste command: explicit clipboard image check for terminals where BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm) - Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends raw byte instead of triggering bracketed paste Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch, BMP conversion, has_clipboard_image, and /paste command.	2026-03-05 20:22:44 -08:00
teknium1	8253b54be9	test: strengthen assertions in skill_manager + memory_tool (batch 3) test_skill_manager_tool.py (20 weak → 0): - Validation error messages verified against exact strings - Name validation: checks specific invalid name echoed in error - Frontmatter validation: exact error text for missing fields, unclosed markers, empty content, invalid YAML - File path validation: traversal, disallowed dirs, root-level test_memory_tool.py (13 weak → 0): - Security scan tests verify both 'Blocked' prefix AND specific threat pattern ID (prompt_injection, exfil_curl, etc.) - Invisible unicode tests verify exact codepoint strings - Snapshot test verifies type, header, content, and isolation	2026-03-05 18:51:43 -08:00
teknium1	5c867fd79f	test: strengthen assertions across 3 more test files (batch 2) test_run_agent.py (2 weak → 0, +13 assertions): - Session ID validated against actual YYYYMMDD_HHMMSS_hex format - API failure verifies error message propagation - Invalid JSON args verifies empty dict fallback + message structure - Context compression verifies final_response + completed flag - Invalid tool name retry verifies api_calls count - Invalid response verifies completed/failed/error structure test_model_tools.py (3 weak → 0): - Unknown tool error includes tool name in message - Exception returns dict with 'error' key + non-empty message - get_all_tool_names verifies both web_search AND terminal present test_approval.py (1 weak → 0, assert ratio 1.1 → 2.2): - Dangerous commands verify description content (delete, shell, drop, etc.) - Safe commands explicitly assert key AND desc are None - Pre/post condition checks for state management	2026-03-05 18:46:30 -08:00
teknium1	e9f05b3524	test: comprehensive tests for model metadata + firecrawl config model_metadata tests (61 tests, was 39): - Token estimation: concrete value assertions, unicode, tool_call messages, vision multimodal content, additive verification - Context length resolution: cache-over-API priority, no-base_url skips cache, missing context_length key in API response - API metadata fetch: canonical_slug aliasing, TTL expiry with time mock, stale cache fallback on API failure, malformed JSON resilience - Probe tiers: above-max returns 2M, zero returns None - Error parsing: Anthropic format ('X > Y maximum'), LM Studio, empty string, unreasonably large numbers — also fixed parser to handle Anthropic format - Cache: corruption resilience (garbage YAML, wrong structure), value updates, special chars in model names Firecrawl config tests (8 tests, was 4): - Singleton caching (core purpose — verified constructor called once) - Constructor failure recovery (retry after exception) - Return value actually asserted (not just constructor args) - Empty string env vars treated as absent - Proper setup/teardown for env var isolation	2026-03-05 18:22:39 -08:00
teknium1	e2a834578d	refactor: extract clipboard methods + comprehensive tests (37 tests) Refactored image paste internals for testability: - Extracted _try_attach_clipboard_image() method (clipboard → state) - Extracted _build_multimodal_content() method (images → OpenAI format) - chat() now delegates to these instead of inline logic Tests organized in 4 levels: Level 1 (19 tests): Clipboard module — every platform path with realistic subprocess simulation (tools writing files, timeouts, empty files, cleanup on failure) Level 2 (8 tests): _build_multimodal_content — base64 encoding, MIME types (png/jpg/webp/unknown), missing files, multiple images, default question for empty text Level 3 (5 tests): _try_attach_clipboard_image — state management, counter increment/rollback, naming convention, mixed success/failure Level 4 (5 tests): Queue routing — tuple unpacking, command detection, images-only payloads, text-only payloads	2026-03-05 18:07:53 -08:00
teknium1	ffc752a79e	test: improve clipboard tests with realistic scenarios and multimodal coverage Rewrote clipboard tests from 11 shallow mocks to 21 realistic tests: - Success paths now simulate tools actually writing files (not pre-created) - osascript: success with PNG, success with TIFF, extraction-fail cases - pngpaste: empty file rejection edge case - Linux: extraction failure cleanup verification - New TestMultimodalConversion class: base64 encoding, MIME types, multiple images, missing file handling, default question fallback	2026-03-05 17:58:06 -08:00
teknium1	399562a7d1	feat: clipboard image paste in CLI (Cmd+V / Ctrl+V) Copy an image to clipboard (screenshot, browser, etc.) and paste into the Hermes CLI. The image is saved to ~/.hermes/images/, shown as a badge above the input ([📎 Image #1]), and sent to the model as a base64-encoded OpenAI vision multimodal content block. Implementation: - hermes_cli/clipboard.py: clean module with platform-specific extraction - macOS: pngpaste (if installed) → osascript fallback (always available) - Linux: xclip (apt install xclip) - cli.py: BracketedPaste key handler checks clipboard on every paste, image bar widget shows attached images, chat() converts to multimodal content format, Ctrl+C clears attachments Inspired by @m0at's fork (https://github.com/m0at/hermes-agent) which implemented image paste support for local vision models. Reimplemented cleanly as a separate module with tests.	2026-03-05 17:55:41 -08:00
teknium1	363633e2ba	fix: allow self-hosted Firecrawl without API key + add self-hosting docs On top of PR #460: self-hosted Firecrawl instances don't require an API key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper self-hosting section to the configuration docs explaining what you get, what you lose, and how to set it up (Docker stack, tradeoffs vs cloud). Added 2 more tests (URL-only without key, neither-set raises).	2026-03-05 16:44:21 -08:00
caentzminger	d7d10b14cd	feat(tools): add support for self-hosted firecrawl Adds optional FIRECRAWL_API_URL environment variable to support self-hosted Firecrawl deployments alongside the cloud service. - Add FIRECRAWL_API_URL to optional env vars in hermes_cli/config.py - Update _get_firecrawl_client() in tools/web_tools.py to accept custom API URL - Add tests for client initialization with/without URL - Document new env var in installation and config guides	2026-03-05 16:16:18 -06:00
rovle	a6499b6107	fix(daytona): use shell timeout wrapper instead of broken SDK exec timeout The Daytona SDK's process.exec(timeout=N) parameter is not enforced — the server-side timeout never fires and the SDK has no client-side fallback, causing commands to hang indefinitely. Fix: wrap commands with timeout N sh -c '...' (coreutils) which reliably kills the process and returns exit code 124. Added shlex.quote for proper shell escaping and a secondary deadline (timeout + 10s) that force-stops the sandbox if the shell timeout somehow fails. Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 13:12:41 -08:00
rovle	efc7a7b957	fix(daytona): don't guess /root on cwd probe failure, keep constructor default; update tests to reflect this Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 11:49:35 -08:00
rovle	577da79a47	fix(daytona): make disk cap visible and use SDK enum for sandbox state - Replace logger.warning with warnings.warn for the disk cap so users actually see it (logger was suppressed by CLI's log level config) - Use SandboxState enum instead of string literals in _ensure_sandbox_ready Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 11:03:39 -08:00
rovle	d5efb82c7c	test(daytona): add unit and integration tests for Daytona backend Unit tests cover cwd resolution, sandbox persistence/resume, cleanup, command execution, resource conversion, interrupt handling, retry exhaustion, and sandbox readiness checks. Integration tests verify basic commands, filesystem ops, session persistence, and task isolation against a live Daytona API. Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 10:26:22 -08:00
teknium1	ad9c26afb8	Merge PR #293 : fix: eliminate shell noise from terminal output and fix test failures Authored by 0xbyt4. Wraps commands with unique fence markers to isolate real output from shell init/exit noise (oh-my-zsh, macOS session restore, etc.). Falls back to expanded pattern-based cleaning. Also fixes BSD find fallback and test module shadowing.	2026-03-05 08:48:26 -08:00
teknium1	b4b426c69d	test: add coverage for tee, process substitution, and full-path rm patterns Tests for the three new dangerous command patterns added in PR #280: - TestProcessSubstitutionPattern: 7 tests (bash/sh/zsh/ksh + safe commands) - TestTeePattern: 7 tests (sensitive paths + safe destinations) - TestFindExecFullPathRm: 4 tests (/bin/rm, /usr/bin/rm, bare rm, safe find)	2026-03-05 01:58:33 -08:00
teknium1	e1baab90f7	Merge PR #201 : fix skills hub dedup to prefer higher trust levels Authored by 0xbyt4. The dedup logic in GitHubSource.search() and unified_search() used 'r.trust_level == "trusted"' which let trusted results overwrite builtin ones. Now uses ranked comparison: builtin (2) > trusted (1) > community (0).	2026-03-04 19:40:41 -08:00
teknium1	b336980229	Merge PR #193 : add unit tests for 5 security/logic-critical modules (batch 4) Authored by 0xbyt4. 144 new tests covering gateway/pairing.py, tools/skill_manager_tool.py, tools/skills_tool.py, honcho_integration/session.py, and agent/auxiliary_client.py.	2026-03-04 19:35:01 -08:00
teknium1	7128f95621	Merge PR #390 : fix hidden directory filter broken on Windows Authored by Farukest. Fixes #389. Replaces hardcoded forward-slash string checks ('/.git/', '/.hub/') with Path.parts membership test in _find_all_skills() and scan_skill_commands(). On Windows, str(Path) uses backslashes so the old filter never matched, causing quarantined skills to appear as installed.	2026-03-04 19:22:43 -08:00
teknium1	ffc6d767ec	Merge PR #388 : fix --force bypassing dangerous verdict in should_allow_install Authored by Farukest. Fixes #387. Removes 'and not force' from the dangerous verdict check so --force can never install skills with critical security findings (reverse shells, data exfiltration, etc). The docstring already documented this behavior but the code didn't enforce it.	2026-03-04 19:19:57 -08:00
teknium1	44a2d0c01f	Merge PR #386 : fix symlink boundary check prefix confusion in skills_guard Authored by Farukest. Fixes #385. Replaces startswith() with Path.is_relative_to() in _check_structure() symlink escape check — same fix pattern as skill_view() (PR #352). Prevents symlinks escaping to sibling directories with shared name prefixes.	2026-03-04 19:13:21 -08:00
teknium1	093acd72dd	fix: catch exceptions from check_fn in is_toolset_available() get_definitions() already wrapped check_fn() calls in try/except, but is_toolset_available() did not. A failing check (network error, missing import, bad config) would propagate uncaught and crash the CLI banner, agent startup, and tools-info display. Now is_toolset_available() catches all exceptions and returns False, matching the existing pattern in get_definitions(). Added 4 tests covering exception handling in is_toolset_available(), check_toolset_requirements(), get_definitions(), and check_tool_availability(). Closes #402	2026-03-04 14:22:30 -08:00
Farukest	f93b48226c	fix: use Path.parts for hidden directory filter in skill listing The hidden directory filter used hardcoded forward-slash strings like '/.git/' and '/.hub/' to exclude internal directories. On Windows, Path returns backslash-separated strings, so the filter never matched. This caused quarantined skills in .hub/quarantine/ to appear as installed skills and available slash commands on Windows. Replaced string-based checks with Path.parts membership test which works on both Windows and Unix.	2026-03-04 18:34:16 +03:00
Farukest	4805be0119	fix: prevent --force from overriding dangerous verdict in should_allow_install The docstring states --force should never override dangerous verdicts, but the condition `if result.verdict == "dangerous" and not force` allowed force=True to skip the early return. Execution then fell through to `if force: return True`, bypassing the policy block. Removed `and not force` so dangerous skills are always blocked regardless of the --force flag.	2026-03-04 18:10:18 +03:00
Farukest	a3ca71fe26	fix: use is_relative_to() for symlink boundary check in skills_guard The symlink escape check in _check_structure() used startswith() without a trailing separator. A symlink resolving to a sibling directory with a shared prefix (e.g. 'axolotl-backdoor') would pass the check for 'axolotl' since the string prefix matched. Replaced with Path.is_relative_to() which correctly handles directory boundaries and is consistent with the skill_view path check.	2026-03-04 17:23:23 +03:00
teknium1	70a0a5ff4a	fix: exclude current session from session_search results session_search was returning the current session if it matched the query, which is redundant — the agent already has the current conversation context. This wasted an LLM summarization call and a result slot. Added current_session_id parameter to session_search(). The agent passes self.session_id and the search filters out any results where either the raw or parent-resolved session ID matches. Both the raw match and the parent-resolved match are checked to handle child sessions from delegation. Two tests added verifying the exclusion works and that other sessions are still returned.	2026-03-04 06:06:40 -08:00
teknium1	79871c2083	refactor: use Path.is_relative_to() for skill_view boundary check Replace the string-based startswith + os.sep approach with Path.is_relative_to() (Python 3.9+, we require 3.10+). This is the idiomatic pathlib way to check path containment — it handles separators, case sensitivity, and the equal-path case natively without string manipulation. Simplified tests to match: removed the now-unnecessary test_separator_is_os_native test since is_relative_to doesn't depend on separator choice.	2026-03-04 05:30:43 -08:00
Farukest	e86f391cac	fix: use os.sep in skill_view path boundary check for Windows compatibility	2026-03-04 06:50:06 +03:00
0xbyt4	aefc330b8f	merge: resolve conflict with main (add mcp + homeassistant extras)	2026-03-03 14:52:22 +03:00
0xbyt4	f967471758	merge: resolve conflict with main (keep fence markers + _find_shell)	2026-03-03 14:50:45 +03:00
teknium1	7df14227a9	feat(mcp): banner integration, /reload-mcp command, resources & prompts Banner integration: - MCP Servers section in CLI startup banner between Tools and Skills - Shows each server with transport type, tool count, connection status - Failed servers shown in red; section hidden when no MCP configured - Summary line includes MCP server count - Removed raw print() calls from discovery (banner handles display) /reload-mcp command: - New slash command in both CLI and gateway - Disconnects all MCP servers, re-reads config.yaml, reconnects - Reports what changed (added/removed/reconnected servers) - Allows adding/removing MCP servers without restarting Resources & Prompts support: - 4 utility tools registered per server: list_resources, read_resource, list_prompts, get_prompt - Exposes MCP Resources (data sources) and Prompts (templates) as tools - Proper parameter schemas (uri for read_resource, name for get_prompt) - Handles text and binary resource content - 23 new tests covering schemas, handlers, and registration Test coverage: 74 MCP tests total, 1186 tests pass overall.	2026-03-02 19:15:59 -08:00
teknium1	60effcfc44	fix(mcp): parallel discovery, user-visible logging, config validation - Discovery is now parallel (asyncio.gather) instead of sequential, fixing the 60s shared timeout issue with multiple servers - Startup messages use print() so users see connection status even with default log levels (the 'tools' logger is set to ERROR) - Summary line shows total tools and failed servers count - Validate conflicting config: warn if both 'url' and 'command' are present (HTTP takes precedence) - Update TODO.md: mark MCP as implemented, list remaining work - Add test for conflicting config detection (51 tests total) All 1163 tests pass.	2026-03-02 19:02:28 -08:00
teknium1	64ff8f065b	feat(mcp): add HTTP transport, reconnection, security hardening Upgrades the MCP client implementation from PR #291 with: - HTTP/Streamable HTTP transport: support 'url' key in config for remote MCP servers (Notion, Slack, Sentry, Supabase, etc.) - Automatic reconnection with exponential backoff (1s-60s, 5 retries) when a server connection drops unexpectedly - Environment variable filtering: only pass safe vars (PATH, HOME, etc.) plus user-specified env to stdio subprocesses (prevents secret leaks) - Credential stripping: sanitize error messages before returning to the LLM (strips GitHub PATs, OpenAI keys, Bearer tokens, etc.) - Configurable per-server timeouts: 'timeout' and 'connect_timeout' keys - Fix shutdown race condition in servers_snapshot variable scoping Test coverage: 50 tests (up from 30), including new tests for env filtering, credential sanitization, HTTP config detection, reconnection logic, and configurable timeouts. All 1162 tests pass (1162 passed, 3 skipped, 0 failed).	2026-03-02 18:40:03 -08:00
teknium1	468b7fdbad	Merge PR #291 : feat: add MCP (Model Context Protocol) client support Authored by 0xbyt4. Adds MCP client with official SDK, direct tool registration, auto-injection into hermes-* toolsets, and graceful degradation.	2026-03-02 18:24:31 -08:00
teknium1	dd9d3f89b9	Merge PR #286 : Fix ClawHub Skills Hub adapter for API endpoint changes Authored by BP602. Fixes #285.	2026-03-02 17:25:14 -08:00
teknium1	2ba87a10b0	Merge PR #219 : fix: guard POSIX-only process functions for Windows compatibility Authored by Farukest. Fixes #218.	2026-03-02 17:07:49 -08:00
0xbyt4	11615014a4	fix: eliminate shell noise from terminal output with fence markers - Wrap commands with unique fence markers (printf FENCE; cmd; printf FENCE) to isolate real output from shell init/exit noise (oh-my-zsh, macOS session restore/save, docker plugin errors, etc.) - Expand _clean_shell_noise to cover zsh/macOS patterns and strip from both beginning and end (fallback when fences are missing) - Fix BSD find compatibility: fallback to simple find when -printf produces empty output (macOS) - Fix test_terminal_disk_usage: use sys.modules to get the real module instead of the shadowed function from tools/__init__.py - Add 13 new unit tests for fence extraction and zsh noise patterns	2026-03-02 22:53:21 +03:00
0xbyt4	11a2ecb936	fix: resolve thread safety issues and shutdown deadlock in MCP client - Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread) - Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside a _lock block but also acquires _lock (non-reentrant) - Fix race condition in _ensure_mcp_loop with concurrent callers - Change idempotency to per-server (retry failed servers, skip connected) - Dynamic toolset injection via startswith("hermes-") instead of hardcoded list - Parallel shutdown via asyncio.gather instead of sequential loop - Add tests for partial failure retry, parallel shutdown, dynamic injection	2026-03-02 22:08:32 +03:00
0xbyt4	151e8d896c	fix(tests): isolate discover_mcp_tools tests from global _servers state Patch _servers to empty dict in tests that call discover_mcp_tools() with mocked config, preventing interference from real MCP connections that may exist when running within the full test suite.	2026-03-02 21:38:01 +03:00
0xbyt4	aa2ecaef29	fix: resolve orphan subprocess leak on MCP server shutdown Refactor MCP connections from AsyncExitStack to task-per-server architecture. Each server now runs as a long-lived asyncio Task with `async with stdio_client(...)`, ensuring anyio cancel-scope cleanup happens in the same Task that opened the connection.	2026-03-02 21:22:00 +03:00
0xbyt4	3c252ae44b	feat: add MCP (Model Context Protocol) client support Connect to external MCP servers via stdio transport, discover their tools at startup, and register them into the hermes-agent tool registry. - New tools/mcp_tool.py: config loading, server connection via background event loop, tool handler factories, discovery, and graceful shutdown - model_tools.py: trigger MCP discovery after built-in tool imports - cli.py: call shutdown_mcp_servers in _run_cleanup - pyproject.toml: add mcp>=1.2.0 as optional dependency - 27 unit tests covering config, schema conversion, handlers, registration, SDK interaction, toolset injection, graceful fallback, and shutdown Config format (in ~/.hermes/config.yaml): mcp_servers: filesystem: command: "npx" args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]	2026-03-02 21:03:14 +03:00
BP602	6789084ec0	Fix ClawHub Skills Hub adapter for updated API	2026-03-02 16:11:49 +01:00
teknium1	7862e7010c	test: add additional multiline bypass tests for find patterns Extra test coverage for newline bypass detection (DOTALL fix). Inspired by Bartok9's PR #245.	2026-03-02 04:46:27 -08:00
teknium1	4faf2a6cf4	Merge PR #233 : fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection Authored by Farukest. Fixes #232.	2026-03-02 04:44:06 -08:00
teknium1	6d2481ee5c	Merge PR #231 : fix: use task-specific glob pattern in disk usage calculation Authored by Farukest. Fixes #230.	2026-03-02 04:38:58 -08:00
teknium1	ca5525bcd7	fix(tests): isolate HERMES_HOME in tests and adjust log directory for debug session Added a fixture to redirect HERMES_HOME to a temporary directory during tests, preventing writes to the user's home directory. Updated the test for DebugSession to create a dedicated log directory for saving logs, ensuring test isolation and accuracy in assertions.	2026-03-02 04:34:21 -08:00
teknium1	39bfd226b8	Merge PR #225 : fix: preserve empty content in ReadResult.to_dict() Authored by Farukest. Fixes #224.	2026-03-02 03:13:31 -08:00
teknium1	1cb2311bad	fix(security): block path traversal in skill_view file_path (fixes #220 ) skill_view accepted arbitrary file_path values like '../../.env' and would read files outside the skill directory, exposing API keys and other sensitive data. Added two layers of defense: 1. Reject paths with '..' components (fast, catches obvious traversal) 2. resolve() containment check with trailing '/' to prevent prefix collisions (catches symlinks and edge cases) Fix approach from PR #242 (@Bartok9). Vulnerability reported by @Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR. Closes #220	2026-03-02 02:00:09 -08:00
0xbyt4	3b745633e4	test: add unit tests for 8 untested modules (batch 3) (#191 ) * test: add unit tests for 8 untested modules (batch 3) New test files (143 tests total): - tools/debug_helpers.py: DebugSession enable/disable, log, save, session info - tools/skills_guard.py: scan_file, scan_skill, trust levels, install policy, structural checks - tools/skills_sync.py: manifest read/write, skill discovery, sync logic - gateway/sticker_cache.py: cache CRUD, sticker injection text builders - gateway/channel_directory.py: channel resolution, display formatting, session building - gateway/hooks.py: hook discovery, sync/async emit, wildcard matching - gateway/mirror.py: session lookup, JSONL append, mirror_to_session - honcho_integration/client.py: config from env/file, session name resolution, linked workspaces Also documents a gap in skills_guard: multi-word prompt injection variants like "ignore all prior instructions" bypass the regex scanner. * test: strengthen sticker injection tests with exact format assertions Replace loose "contains" checks with exact output matching for build_sticker_injection and build_animated_sticker_injection. Add edge cases: set_name without emoji, empty description, empty emoji. * test: remove skills_guard gap-documenting test to avoid conflict with fix PR	2026-03-01 05:28:12 -08:00
0xbyt4	900d48714a	Merge remote-tracking branch 'origin/main' into test/expand-coverage-4 # Conflicts: # tests/agent/test_auxiliary_client.py	2026-03-01 12:11:54 +03:00
0xbyt4	3fdf03390e	Merge remote-tracking branch 'origin/main' into feature/homeassistant-integration # Conflicts: # run_agent.py	2026-03-01 11:59:12 +03:00
0xbyt4	25fb9aafcb	fix: add service domain blocklist and entity_id validation to HA tools Block dangerous HA service domains (shell_command, command_line, python_script, pyscript, hassio, rest_command) that allow arbitrary code execution or SSRF. Add regex validation for entity_id to prevent path traversal attacks. 17 new tests covering both security features.	2026-03-01 11:53:50 +03:00
teknium1	1db5598294	feat(tests): add live integration tests for file operations and shell noise filtering - Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment. - Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results. - Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing. - Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.	2026-02-28 22:57:58 -08:00
Teknium	5a79e423fe	Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo	2026-02-28 18:13:38 -08:00
Farukest	7166647ca1	fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection	2026-03-01 03:23:29 +03:00
Farukest	f7300a858e	fix(tools): use task-specific glob pattern in disk usage calculation	2026-03-01 03:17:50 +03:00
Farukest	7f1f4c2248	fix(tools): preserve empty content in ReadResult.to_dict()	2026-03-01 02:42:15 +03:00
Farukest	3f58e47c63	fix: guard POSIX-only process functions for Windows compatibility os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise AttributeError on import or first call. This breaks the terminal tool, code execution sandbox, process registry, and WhatsApp bridge on Windows. Added _IS_WINDOWS platform guard in all four affected files, following the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is set to None and process termination falls back to proc.terminate() / proc.kill() instead of process group signals. Files changed: - tools/environments/local.py (3 call sites) - tools/process_registry.py (2 call sites) - tools/code_execution_tool.py (3 call sites) - gateway/platforms/whatsapp.py (3 call sites)	2026-03-01 01:54:27 +03:00
0xbyt4	08250a53a1	fix: skills hub dedup prefers higher trust levels + 43 tests - unified_search and GitHubSource.search dedup: replace naive `trust_level == "trusted"` check with ranked comparison so "builtin" results are never overwritten by "trusted" or "community" - Add 43 unit tests covering _parse_frontmatter_quick, trust_level_for, HubLockFile CRUD, TapsManager ops, LobeHub _convert_to_skill_md, unified_search dedup (with regression test), and append_audit_log	2026-02-28 21:25:55 +03:00
0xbyt4	46506769f1	test: add unit tests for 5 security/logic-critical modules (batch 4) - gateway/pairing.py: rate limiting, lockout, code expiry, approval flow (28 tests) - tools/skill_manager_tool.py: validation, path traversal prevention, CRUD (46 tests) - tools/skills_tool.py: frontmatter/tag parsing, skill discovery, view chain (34 tests) - agent/auxiliary_client.py: auth reading, API key resolution, param branching (16 tests) - honcho_integration/session.py: session dataclass, ID sanitization, transcript format (20 tests)	2026-02-28 20:33:48 +03:00
0xbyt4	2390728cc3	fix: resolve 4 bugs found in HA integration code review - Auto-authorize HA events in gateway (system-generated, not user messages) - Guard _read_events against None/closed WebSocket after failed reconnect - Use UUID for send() message_id instead of polluting WS sequence counter - entity_id parameter now takes precedence over data["entity_id"]	2026-02-28 15:12:18 +03:00
0xbyt4	c36b256de5	feat: add Home Assistant integration (REST tools + WebSocket gateway) - Add ha_list_entities, ha_get_state, ha_call_service tools via REST API - Add WebSocket gateway adapter for real-time state_changed event monitoring - Support domain/entity filtering, cooldown, and auto-reconnect with backoff - Use REST API for outbound notifications to avoid WS race condition - Gate tool availability on HASS_TOKEN env var - Add 82 unit tests covering real logic (filtering, payload building, event pipeline)	2026-02-28 13:32:48 +03:00
Teknium	0d2ac1c07f	Merge pull request #121 from Bartok9/test-clarify-tool test(tools): add unit tests for clarify_tool.py	2026-02-27 16:27:37 -08:00
Teknium	3526fa27fd	Merge pull request #62 from 0xbyt4/test/expand-coverage-2 test: add unit tests for 8 modules (batch 2)	2026-02-27 01:47:30 -08:00
Teknium	152271851f	Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass fix: cron prompt injection scanner bypass for multi-word variants	2026-02-27 01:34:14 -08:00
Teknium	0909be3aa8	Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink fix: resolve symlink bypass in write deny list on macOS	2026-02-27 01:32:19 -08:00
Teknium	274e623b50	Merge pull request #60 from 0xbyt4/test/expand-coverage test: add unit tests for 8 untested core modules	2026-02-27 01:30:36 -08:00
Bartok Moltbot	df8a62d018	test(tools): add unit tests for clarify_tool.py Add comprehensive test coverage for the clarify_tool module: - TestClarifyToolBasics: 5 tests for core functionality - Simple questions, questions with choices, error handling - TestClarifyToolChoicesValidation: 5 tests for choices parameter - MAX_CHOICES enforcement, empty/whitespace handling, type conversion - TestClarifyToolCallbackHandling: 3 tests for callback behavior - Exception handling, question/response trimming - TestCheckClarifyRequirements: 1 test verifying always-true behavior - TestClarifySchema: 6 tests verifying OpenAI function schema - Required/optional parameters, maxItems constraint Total: 20 tests covering all public functions and edge cases.	2026-02-27 03:29:26 -05:00
George Pickett	32070e6bc0	Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo # Conflicts: # cron/scheduler.py # gateway/run.py # tools/delegate_tool.py	2026-02-26 10:56:29 -08:00
darya	f5c09a3aba	test: add regression tests for recursive delete false positive fix Add 15 new tests in two classes: - TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r' (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely flagged as 'recursive delete' - TestRmRecursiveFlagVariants (7 tests): verify all recursive delete flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf) are still correctly caught All 29 tests pass (14 existing + 15 new).	2026-02-26 16:40:44 +03:00
0xbyt4	feea8332d6	fix: cron prompt injection scanner bypass for multi-word variants The regex `ignore\s+(previous\|all\|above\|prior)\s+instructions` only allowed ONE word between "ignore" and "instructions". Multi-word variants like "Ignore ALL prior instructions" bypassed the scanner because "ALL" matched the alternation but then `\s+instructions` failed to match "prior". Fix: use `(?:\w+\s+)*` groups to allow optional extra words before and after the keyword alternation.	2026-02-26 13:55:54 +03:00
0xbyt4	ffbdd7fcce	test: add unit tests for 8 modules (batch 2) Cover model_tools, toolset_distributions, context_compressor, prompt_caching, cronjob_tools, session_search, process_registry, and cron/scheduler with 127 new test cases.	2026-02-26 13:54:20 +03:00
0xbyt4	b699cf8c48	test: remove /etc platform-conditional tests from file_operations These tests documented the macOS symlink bypass bug with platform-conditional assertions. The fix and proper regression tests are in PR #61 (tests/tools/test_write_deny.py), so remove them here to avoid ordering conflicts between the two PRs.	2026-02-26 13:43:30 +03:00
0xbyt4	2efd9bbac4	fix: resolve symlink bypass in write deny list on macOS On macOS, /etc is a symlink to /private/etc. The _is_write_denied() function resolves the input path with os.path.realpath() but the deny list entries were stored as literal strings ("/etc/shadow"). This meant the resolved path "/private/etc/shadow" never matched, allowing writes to sensitive system files on macOS. Fix: Apply os.path.realpath() to deny list entries at module load time so both sides of the comparison use resolved paths. Adds 19 regression tests in tests/tools/test_write_deny.py.	2026-02-26 13:30:55 +03:00
0xbyt4	0ac3af8776	test: add unit tests for 8 untested modules Add comprehensive test coverage for: - cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests) - tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests) - toolsets.py: resolution, validation, composition, cycle detection (19 tests) - tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests) - agent/prompt_builder.py: context scanning, truncation, skills index (24 tests) - agent/model_metadata.py: token estimation, context lengths (16 tests) - hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests) Total: 210 new tests, all passing (380 total suite).	2026-02-26 13:27:58 +03:00
teknium1	178658bf9f	test: enhance session source tests and add validation for chat types - Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms. - Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts. - Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.	2026-02-26 00:53:57 -08:00
0xbyt4	8fc28c34ce	test: reorganize test structure and add missing unit tests Reorganize flat tests/ directory to mirror source code structure (tools/, gateway/, hermes_cli/, integration/). Add 11 new test files covering previously untested modules: registry, patch_parser, fuzzy_match, todo_tool, approval, file_tools, gateway session/config/ delivery, and hermes_cli config/models. Total: 147 unit tests passing, 9 integration tests gated behind pytest marker.	2026-02-26 03:20:08 +03:00

... 5 6 7 8 9 ...

623 Commits