The check_interval parameter on terminal_tool sent periodic output
updates to the gateway chat, but these were display-only — the agent
couldn't see or act on them. This added schema bloat and introduced
a bug where notify_on_complete=True was silently dropped when
check_interval was also set (the not-check_interval guard skipped
fast-watcher registration, and the check_interval watcher dict
was missing the notify_on_complete key).
Removing check_interval entirely:
- Eliminates the notify_on_complete interaction bug
- Reduces tool schema size (one fewer parameter for the model)
- Simplifies the watcher registration path
- notify_on_complete (agent wake-on-completion) still works
- watch_patterns (output alerting) still works
- process(action='poll') covers manual status checking
Closes#7947 (root cause eliminated rather than patched).
Deduplicate todo items by ID before writing to the store, keeping the
last occurrence. Prevents ghost entries when the model sends duplicate
IDs in a single write() call, which corrupts subsequent merge operations.
Co-authored-by: WAXLYY <WAXLYY@users.noreply.github.com>
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes#7915 (root cause was agent-loop termination, not Weixin delivery limits).
Complete the contextvars migration by adding HERMES_SESSION_KEY to the
unified _VAR_MAP in session_context.py. Without this, concurrent gateway
handlers race on os.environ["HERMES_SESSION_KEY"].
- Add _SESSION_KEY ContextVar to _VAR_MAP, set_session_vars(), clear_session_vars()
- Wire session_key through _set_session_env() from SessionContext
- Replace os.getenv fallback in tools/approval.py with get_session_env()
(function-level import to avoid cross-layer coupling)
- Keep os.environ set as CLI/cron fallback
Cherry-picked from PR #7878 by 0xbyt4.
Add a second WeCom integration mode for regular enterprise self-built
applications. Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges. The agent's reply is delivered proactively
via the message/send API.
Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful. The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.
Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health
Salvaged from PR #7774 by @chqchshj. Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.
Adds _normalize_path() helper that calls expanduser().resolve() to
properly handle tilde paths (e.g. ~/.hermes, ~/.config). Previously
Path.resolve() alone treated ~ as a literal directory name, producing
invalid paths like /root/~/.hermes.
Also improves _run_git() error handling to distinguish missing working
directories from missing git executable, and adds pre-flight directory
validation.
Cherry-picked from PR #7898 by faishal882.
Fixes#7807
_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.
Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.
Tests added for spaces, $(command) substitution, and semicolon injection.
This commit addresses a security vulnerability where unsanitized user inputs for commit_hash and file_path were passed directly to git commands in CheckpointManager.restore() and diff(). It validates commit hashes to be strictly hexadecimal characters without leading dashes (preventing flag injection like '--patch') and enforces file paths to stay within the working directory via root resolution. Regression tests test_restore_rejects_argument_injection, test_restore_rejects_invalid_hex_chars, and test_restore_rejects_path_traversal were added.
The interrupt mechanism in tools/interrupt.py used a process-global
threading.Event. In the gateway, multiple agents run concurrently in
the same process via run_in_executor. When any agent was interrupted
(user sends a follow-up message), the global flag killed ALL agents'
running tools — terminal commands, browser ops, web requests — across
all sessions.
Changes:
- tools/interrupt.py: Replace single threading.Event with a set of
interrupted thread IDs. set_interrupt() targets a specific thread;
is_interrupted() checks the current thread. Includes a backward-
compatible _ThreadAwareEventProxy for legacy _interrupt_event usage.
- run_agent.py: Store execution thread ID at start of run_conversation().
interrupt() and clear_interrupt() pass it to set_interrupt() so only
this agent's thread is affected.
- tools/code_execution_tool.py: Use is_interrupted() instead of
directly checking _interrupt_event.is_set().
- tools/process_registry.py: Same — use is_interrupted().
- tests: Update interrupt tests for per-thread semantics. Add new
TestPerThreadInterruptIsolation with two tests verifying cross-thread
isolation.
When a Python process exits uncleanly (SIGKILL, crash, gateway restart
via hermes update), in-memory _active_sessions tracking is lost but the
agent-browser node daemons and their Chromium child processes keep
running indefinitely. On a long-running system this causes unbounded
memory growth — 24 orphaned sessions consumed 7.6 GB on a production
machine over 9 days.
Add _reap_orphaned_browser_sessions() which scans the tmp directory for
agent-browser-{h_*,cdp_*} socket dirs on cleanup thread startup. For
each dir not tracked by the current process, reads the daemon PID file
and sends SIGTERM if the daemon is still alive. Handles edge cases:
dead PIDs, corrupt PID files, permission errors, foreign processes.
The reaper runs once on thread startup (not every 30s) to avoid races
with sessions being actively created by concurrent agents.
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it
Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.
Fix:
1. Propagate user_id/user_name through the full watcher chain:
session_context.py → gateway/run.py → terminal_tool.py →
process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message() — silently drop
non-internal messages with no user identity instead of triggering
the pairing flow.
Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).
Closes#6341, #6485, #7643
Relates to #6516, #7392
Independent halving of width and height caused aspect ratio distortion
for extreme dimensions (e.g. 8000x200 panoramas). When one axis hit the
64px floor, the other kept shrinking — collapsing the ratio toward 1:1.
Use proportional scaling instead: when either dimension hits the floor,
derive the effective scale factor and apply it to both axes.
Add tests for extreme panorama (8000x200) and tall narrow (200x6000)
images to verify aspect ratio preservation.
Cherry-picked from PR #7749 by kshitijk4poor with modifications:
- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision
Closes#7740
* feat: add watch_patterns to background processes for output monitoring
Adds a new 'watch_patterns' parameter to terminal(background=true) that
lets the agent specify strings to watch for in process output. When a
matching line appears, a notification is queued and injected as a
synthetic message — triggering a new agent turn, similar to
notify_on_complete but mid-process.
Implementation:
- ProcessSession gets watch_patterns field + rate-limit state
- _check_watch_patterns() in ProcessRegistry scans new output chunks
from all three reader threads (local, PTY, env-poller)
- Rate limited: max 8 notifications per 10s window
- Sustained overload (45s) permanently disables watching for that process
- watch_queue alongside completion_queue, same consumption pattern
- CLI drains watch_queue in both idle loop and post-turn drain
- Gateway drains after agent runs via _inject_watch_notification()
- Checkpoint persistence + crash recovery includes watch_patterns
- Blocked in execute_code sandbox (like other bg params)
- 20 new tests covering matching, rate limiting, overload kill,
checkpoint persistence, schema, and handler passthrough
Usage:
terminal(
command='npm run dev',
background=true,
watch_patterns=['ERROR', 'WARN', 'listening on port']
)
* refactor: merge watch_queue into completion_queue
Unified queue with 'type' field distinguishing 'completion',
'watch_match', and 'watch_disabled' events. Extracted
_format_process_notification() in CLI and gateway to handle
all event types in a single drain loop. Removes duplication
across both CLI drain sites and the gateway.
Three fixes for vision_analyze returning cryptic 400 "Invalid request data":
1. Pre-flight base64 size check — base64 inflates data ~33%, so a 3.8 MB
file exceeds the 5 MB API limit. Reject early with a clear message
instead of letting the provider return a generic 400.
2. Handle file:// URIs — strip the scheme and resolve as a local path.
Previously file:///path/to/image.png fell through to the "invalid
image source" error since it matched neither is_file() nor http(s).
3. Separate invalid_request errors from "does not support vision" errors
so the user gets actionable guidance (resize/compress/retry) instead
of a misleading "model does not support vision" message.
Closes#6677
vision_tools.py: _download_image() loads the full HTTP response body into
memory via response.content (line 190) with no Content-Length check and no
max file size limit. An attacker-hosted multi-gigabyte file causes OOM.
Add a 50 MB hard cap: check Content-Length header before download, and
verify actual body size before writing to disk.
hermes_parser.py: tc_data["name"] at line 57 raises KeyError when the LLM
outputs a tool call JSON without a "name" field. The outer except catches
it silently, causing the entire tool call to be lost with zero diagnostics.
Add "name" field validation before constructing the ChatCompletionMessage.
mistral_parser.py: tc["name"] at line 101 has the same KeyError issue in
the pre-v11 format path. The fallback decoder (line 112) already checks
"name" correctly, but the primary path does not. Add validation to match.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
process_registry.py: _reader_loop() has process.wait() after the try-except
block (line 380). If the reader thread crashes with an unexpected exception
(e.g. MemoryError, KeyboardInterrupt), control exits the except handler but
skips wait() — leaving the child as a zombie process. Move wait() and the
cleanup into a finally block so the child is always reaped.
cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the
SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both
stdout and stderr are returned WITHOUT redaction (lines 407-413). A script
that accidentally prints an API key to stderr during a failure would leak it
into the LLM context. Move redaction before the success/failure branch so
both paths benefit.
skill_commands.py: _build_skill_message() enumerates supporting files using
rglob("*") but only checks is_file() (line 171) without filtering symlinks.
PR #6693 added symlink protection to scan_skill_commands() but missed this
function. A malicious skill can create symlinks in references/ pointing to
arbitrary files, exposing their paths (and potentially content via skill_view)
to the LLM. Add is_symlink() check to match the guard in scan_skill_commands.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
_discover_bundled_skills() used the directory name to identify skills,
but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md
frontmatter. This mismatch caused 9 builtin skills whose directory name
differs from their SKILL.md name to be written to .bundled_manifest
under the wrong key, so `hermes skills list` showed them as "local"
instead of "builtin".
Read the frontmatter name field (with directory-name fallback) so the
manifest keys match what the rest of the codebase expects.
Closes#6835
- Remove unreachable `if not content_sample` branch inside the truthy
`if content_sample` block in `_is_likely_binary()` (dead code that
could never execute).
- Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)`
in `_check_lint()` so file paths containing curly braces (e.g.
`src/{test}.py`) no longer raise KeyError/ValueError.
- Add 16 unit tests covering both fixes and edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add delegation.reasoning_effort config key so subagents can run at a
different thinking level than the parent agent. When set, overrides
the parent's reasoning_config; when empty, inherits as before.
Valid values: xhigh, high, medium, low, minimal, none (disables thinking).
Config path: delegation.reasoning_effort in config.yaml
Files changed:
- tools/delegate_tool.py: resolve override in _build_child_agent
- hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG
- tests/tools/test_delegate.py: 4 new tests covering all cases
warnings.warn() is suppressed/invisible when running as a gateway
or agent. Switch to logger.warning() so the disk cap message
actually appears in logs.
Fixes#7362 (item 3).
FileSyncManager now accepts an optional bulk_upload_fn callback.
When provided, all changed files are uploaded in one call instead
of iterating one-by-one with individual HTTP POSTs.
DaytonaEnvironment wires this to sandbox.fs.upload_files() which
batches everything into a single multipart POST — ~580 files goes
from ~5 min to <2s on init.
Parent directories are pre-created in one mkdir -p call.
Fixes#7362 (item 1).
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Call child.close() in the _run_single_child finally block after
unregistering the child from the parent's active children list.
Previously child AIAgent instances were only removed from the tracking
list but never had their resources released — the OpenAI/httpx client
and any tool subprocesses relied entirely on garbage collection.
Ref: #7131
- Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update,
preventing silent truncation of files >2000 lines and corruption of lines
>2000 chars; add read_file_raw to FileOperations abstract interface and
ShellFileOperations
- Bug 2: split apply_v4a_operations into validate-then-apply phases; if any
hunk fails validation, zero writes occur (was: continue after failure,
leaving filesystem partially modified)
- Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops,
empty file paths, and moves missing a destination (was: always returned
error=None)
- Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold
from 0.10 to 0.50, eliminating false-positive matches in repetitive code
- Bug 5: add _strategy_unicode_normalized (new strategy 7) with position
mapping via _build_orig_to_norm_map; smart quotes and em-dashes in
LLM-generated patches now match via strategies 1-6 before falling through
to fuzzy strategies
- Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count,
error, strategy); update all 5 call sites across patch_parser.py,
file_operations.py, and skill_manager_tool.py
- Bug 7: guard in _apply_update returns error when addition-only context hint
is ambiguous (>1 occurrences); validation phase errors on both 0 and >1
- Bug 8: _apply_delete returns error (not silent success) on missing file
- Bug 9: _validate_operations checks source existence and destination absence
for MOVE operations before any write occurs
Three locations perform `int()` conversion on environment variables or
HTTP headers without error handling, causing unhandled `ValueError` crashes
when the values are non-numeric:
1. `send_message_tool.py` — `EMAIL_SMTP_PORT` env var parsed outside the
try/except block; a non-numeric value crashes `_send_email()` instead
of returning a user-friendly error.
2. `process_registry.py` — `TERMINAL_TIMEOUT` env var parsed without
protection; a non-numeric value crashes the `wait()` method.
3. `skills_hub.py` — HTTP `Retry-After` header can contain date strings
per RFC 7231; `int()` conversion crashes on non-numeric values.
All three now fall back to their default values on `ValueError`/`TypeError`.
Two issues with sandbox container spawning:
1. PID 1 was `sleep 2h` which doesn't call wait() — every background
process that exited became a zombie (<defunct>), and the process
tool reported them as "running" because zombie PIDs still exist in
the process table. Fix: add --init to docker run, which uses
tini (Docker) or catatonit (Podman) as PID 1 to reap children
automatically. Both runtimes support --init natively.
2. The fixed 2-hour lifetime was arbitrary and sometimes too short
for long agent sessions. Fix: replace 'sleep 2h' with
'sleep infinity'. The idle reaper (_cleanup_inactive_envs, gated
by terminal.lifetime_seconds, default 300s) already handles
cleanup based on last activity timestamp — there's no need for
the container itself to have a fixed death timer.
Fixes#6908.
`delegate_task` silently truncated batch tasks to 3 — the model sends
5 tasks, gets results for 3, never told 2 were dropped. Now returns a
clear tool_error explaining the limit and how to fix it.
The limit is configurable via:
- delegation.max_concurrent_children in config.yaml (priority 1)
- DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2)
- default: 3
Uses the same _load_config() path as the rest of delegate_task for
consistent config priority. Clamps to min 1, warns on non-integer
config values.
Also removes the hardcoded maxItems: 3 from the JSON schema — the
schema was blocking the model from even attempting >3 tasks before
the runtime check could fire. The runtime check gives a much more
actionable error message.
Backwards compatible: default remains 3, existing configs unchanged.
Isolate system tool configs (git, ssh, gh, npm) per profile by injecting
a per-profile HOME into subprocess environments only. The Python
process's own os.environ['HOME'] and Path.home() are never modified,
preserving all existing profile infrastructure.
Activation is directory-based: when {HERMES_HOME}/home/ exists on disk,
subprocesses see it as HOME. The directory is created automatically for:
- Docker: entrypoint.sh bootstraps it inside the persistent volume
- Named profiles: added to _PROFILE_DIRS in profiles.py
Injection points (all three subprocess env builders):
- tools/environments/local.py _make_run_env() — foreground terminal
- tools/environments/local.py _sanitize_subprocess_env() — background procs
- tools/code_execution_tool.py child_env — execute_code sandbox
Single source of truth: hermes_constants.get_subprocess_home()
Closes#4426
hermes skills browse ran all 7 source adapters serially with no overall
timeout and no progress indicator. On a cold cache, GitHubSource alone
could make 100+ sequential HTTP calls (directory listing + inspect per
skill per tap), taking 5+ minutes with no output — appearing to hang.
Changes:
- Add parallel_search_sources() in tools/skills_hub.py that runs all
source adapters concurrently via ThreadPoolExecutor with a 30s
overall timeout. Sources that finish in time contribute results;
slow ones are skipped gracefully with a visible notice.
- Update unified_search() to use parallel_search_sources() internally.
- Update do_browse() and do_search() in hermes_cli/skills_hub.py to
show a Rich spinner while fetching, so the user sees activity.
- Bump per-source limits (clawhub 50→500, lobehub 50→500, etc.) now
that fetching is parallel — yields far more results per browse.
- Report timed-out sources and suggest re-running for cached results.
- Replace 'inspect/install' footer with 'search deeper' tip.
Worst-case latency drops from 5+ minutes (serial) to ~30s (parallel
with timeout cap). Result count should jump from ~242 to 1000+.
When delegate_task runs, the parent agent's activity tracker freezes
because child.run_conversation() blocks and the child's own
_touch_activity() never propagates back to the parent. The gateway
inactivity timeout then fires a spurious 'No activity' warning and
eventually kills the agent, even though the subagent is actively working.
Fix: add a heartbeat thread in _run_single_child that calls
parent._touch_activity() every 30 seconds with detail from the child's
activity summary (current tool, iteration count). The thread is a daemon
that starts before child.run_conversation() and is cleaned up in the
finally block.
This also improves the gateway 'Still working...' status messages —
instead of just 'running: delegate_task', users now see what the
subagent is actually doing (e.g., 'delegate_task: subagent running
terminal (iteration 5/50)').
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main
Salvaged from PR #6747 by Zihan Huang.
Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that
each mapped to a specific pattern in approval.py and checked whether the
documented defense actually held.
1. **Heredoc script injection** — `python3 << 'EOF'` bypasses the
existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<`
covering python{2,3}, perl, ruby, node.
2. **PID expansion self-termination** — `kill -9 $(pgrep hermes)` is
opaque to the existing `pkill|killall` + name pattern because command
substitution is not expanded at detection time. Adds structural
patterns matching `kill` + `$(pgrep` and backtick variants.
3. **Git destructive operations** — `git reset --hard`, `push --force`,
`push -f`, `clean -f*`, and `branch -D` were entirely absent.
Note: `branch -d` also triggers because IGNORECASE is global —
acceptable since -d is still a delete, just a safe one, and the
prompt is only a confirmation, not a hard block.
4. **chmod +x then execute** — two-step social engineering where a
script containing dangerous commands is first written to disk (not
checked by write_file), then made executable and run as `./script`.
Pattern catches `chmod +x ... [;&|]+ ./` combos. Does not solve the
deeper architectural issue (write_file not checking content) — that
is called out in the PR description as a known limitation.
Tests: 23 new cases across 4 test classes, all in test_approval.py:
- TestHeredocScriptExecution (7 cases, incl. regressions for -c)
- TestPgrepKillExpansion (5 cases, incl. safe kill PID negative)
- TestGitDestructiveOps (8 cases, incl. safe git status/push negatives)
- TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative)
Full suite: 146 passed, 0 failed.
Follow-up to Dusk1e's PR #7120 (Slack send_image redirect guard):
- Rename _safe_url_for_log -> safe_url_for_log (drop underscore) since
it is now imported cross-module by the Slack adapter
- Add _ssrf_redirect_guard httpx event hook to cache_image_from_url()
and cache_audio_from_url() in base.py — same pattern as vision_tools
and the Slack adapter fix
- Update url_safety.py docstring to reflect broader coverage
- Add regression tests for image/audio redirect blocking + safe passthrough
When kill_process() sends SIGTERM, both it and the reader thread race
to call _move_to_finished() — kill_process sets exit_code=-15 and
enqueues a notification, then the reader thread's process.wait()
returns with exit_code=143 (128+SIGTERM) and enqueues a second one.
Fix: make _move_to_finished() idempotent by tracking whether the
session was actually removed from _running. The second call sees it
was already moved and skips the completion_queue.put().
Adds regression test: test_move_to_finished_idempotent_no_duplicate
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
When an MCP server returns both content (model-oriented text) and
structuredContent (machine-oriented JSON), the client now combines
them instead of discarding content. The text content becomes the
primary result (what the agent reads), and structuredContent is
included as supplementary metadata.
Previously, structuredContent took full precedence — causing data
loss for servers like Desktop Commander that put the actual file
text in content and metadata in structuredContent.
MCP spec guidance: for conversational/agent UX, prefer content.
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
ValueError: Invalid model size 'whisper-1'
Changes:
- gateway/run.py, discord.py: stop passing model override — let
transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
nested section (stt.local.model, stt.openai.model, etc.); ignores
legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
provider section and removes the flat key
Reported by community user on Discord.
Add Discord thread support to cron delivery and send_message_tool.
- _parse_target_ref: handle discord platform with chat_id:thread_id format
- _send_discord: add thread_id param, route to /channels/{thread_id}/messages
- _send_to_platform: pass thread_id through for Discord
- Discord adapter send(): read thread_id from metadata for gateway path
- Update tool schema description to document Discord thread targets
Cherry-picked from PR #7046 by pandacooming (maxyangcn).
Follow-up fixes:
- Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was
accidentally deleted — would have caused NameError at runtime
- Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE
via _NUMERIC_TOPIC_RE alias (identical pattern)
- Fix misleading test comments about Discord negative snowflake IDs
(Discord uses positive snowflakes; negative IDs are a Telegram convention)
- Rewrite misleading scheduler test that claimed to exercise home channel
fallback but actually tested the explicit platform:chat_id parsing path
The same .* pattern vulnerable to newline bypass that was fixed in
prompt_builder.py (PR #6925) also existed in skills_guard.py. Changed
to [\s\S]*? to match across newlines.
prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not
match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass
detection by splitting the style attribute across lines:
`<div style="color:red;\ndisplay: none">injected content</div>`
Replace `.*` with `[\s\S]*?` to match across line boundaries.
credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level
(line 171), making YAML parse failures invisible in production logs. Users whose
credential files silently fail to mount into sandboxes have no diagnostic clue.
Promote to WARNING to match the severity pattern used by the path validation
warnings at lines 150 and 158 in the same function.
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line
265) but the impact — stale/corrupted dynamic routes persisting silently — warrants
ERROR level to ensure operator visibility in alerting pipelines.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three silent `except Exception` blocks in approval.py (lines 345, 387, 469) return
fallback values with zero logging — making it impossible to debug callback failures,
allowlist load errors, or config read issues. Add logger.warning/error calls that
match the pattern already used by save_permanent_allowlist() and _smart_approve()
in the same file.
In mcp_oauth.py, narrow the overly-broad `except Exception` in get_tokens() and
get_client_info() to the specific exceptions Pydantic's model_validate() can raise
(ValueError, TypeError, KeyError), and include the exception message in the warning.
Also wrap the _wait_for_callback() polling loop in try/finally so the HTTPServer is
always closed — previously an asyncio.CancelledError or any exception in the loop
would leak the server socket.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True. The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback. Align the message-based path to
match: retryable=False, should_fallback=True.
web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII. Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replace per-backend ad-hoc file sync with a shared FileSyncManager
that handles mtime-based change detection, remote deletion of
locally-removed files, and transactional state updates.
- New FileSyncManager class (tools/environments/file_sync.py)
with callbacks for upload/delete, rate limiting, and rollback
- Shared iter_sync_files() eliminates 3 duplicate implementations
- SSH: replace unconditional rsync with scp + mtime skip
- Modal/Daytona: replace inline _synced_files dict with manager
- All 3 backends now sync credentials + skills + cache uniformly
- Remote deletion: files removed locally are cleaned from remote
- HERMES_FORCE_FILE_SYNC=1 env var for debugging
- Base class _before_execute() simplified to empty hook
- 12 unit tests covering mtime skip, deletion, rollback, rate limiting
Change behavior from silent clamping to returning an error when the
model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT.
This forces the model to use background=true for long-running commands
rather than silently changing its intent.
- Config default timeouts above the cap are NOT rejected (user's choice)
- Only explicit model-requested timeouts trigger rejection
- Added boundary test for timeout exactly at the limit
When the model calls terminal() in foreground mode without background=true
(e.g. to start a server), the tool call blocks until the command exits or
the timeout expires. Without an upper bound the model can request arbitrarily
high timeouts (the schema had minimum=1 but no maximum), blocking the entire
agent session for hours until the gateway idle watchdog kills it.
Changes:
- Add FOREGROUND_MAX_TIMEOUT (600s, configurable via
TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout
- Clamp effective_timeout to the cap when background=false and timeout
exceeds the limit
- Include a timeout_note in the tool result when clamped, nudging the
model to use background=true for long-running processes
- Update schema description to show the max timeout value
- Remove dead clamping code in the background branch that could never
fire (max_timeout was set to effective_timeout, so timeout > max_timeout
was always false)
- Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap
edge case, background bypass, default timeout, constant value, and
schema content
Self-review fixes:
- Fixed bug where timeout_note said 'Requested timeout Nones' when
clamping fired from config default exceeding cap (timeout param is
None). Now uses unclamped_timeout instead of the raw timeout param.
- Removed unused pytest import from test file
- Extracted test config dict into _make_env_config() helper
- Fixed tautological test_default_value assertion
- Added missing test for config default > cap with no model timeout
Replace 6 identical copies of the Termux detection function across
cli.py, browser_tool.py, voice_mode.py, status.py, doctor.py, and
gateway.py with a single shared implementation in hermes_constants.py.
Each call site imports with its original local name to preserve all
existing callers (internal references and test monkeypatches).
When managed_persistence is enabled, cleanup_all now only clears local
tracking state without sending DELETE requests to the Camofox server.
This prevents persistent browser profiles (cookies, logins, localStorage)
from being destroyed during process-wide cleanup.
Ephemeral sessions still get full server-side deletion as before.
Follow-up improvements on top of the shared resolver from PR #6562:
- Add platform_env_var parameter to resolve_proxy_url() so DISCORD_PROXY
takes priority over generic HTTPS_PROXY/ALL_PROXY env vars
- Add SOCKS proxy support via aiohttp_socks.ProxyConnector with rdns=True
(critical for GFW/Shadowrocket/Clash users — issue #6649)
- proxy_kwargs_for_bot() returns connector= for SOCKS, proxy= for HTTP
- proxy_kwargs_for_aiohttp() returns split (session_kw, request_kw) for
standalone aiohttp sessions
- Add proxy support to send_message_tool.py (Discord REST, Slack, SMS)
for cron job delivery behind proxies (from PR #2208)
- Add proxy support to Discord image/document downloads
- Fix duplicate import sys in base.py
Fixes blockquote > escaping, edit_message raw markdown, ***bold italic***
handling, HTML entity double-escaping (&amp;), Wikipedia URL parens
truncation, and step numbering format. Also adds format_message to the
tool-layer _send_to_platform for consistent formatting across all
delivery paths.
Changes:
- Protect Slack entities (<@user>, <https://...|label>, <!here>) from
escaping passes
- Protect blockquote > markers before HTML entity escaping
- Unescape-before-escape for idempotent HTML entity handling
- ***bold italic*** → *_text_* conversion (before **bold** pass)
- URL regex upgraded to handle balanced parentheses
- mrkdwn:True flag on chat_postMessage payloads
- format_message applied in edit_message and send_message_tool
- 52 new tests (format, edit, streaming, splitting, tool chunking)
- Use reversed(dict) idiom for placeholder restoration
Based on PR #3715 by dashed, cherry-picked onto current main.
`_cleanup_task_resources` was unconditionally calling `cleanup_vm()` at
the end of every `run_conversation` (i.e. every user turn), tearing down
the docker/daytona/modal sandbox container regardless of its
`persistent_filesystem` setting. This contradicted the documented intent
of `terminal.lifetime_seconds` (idle reaper) and `container_persistent`,
and caused per-turn loss of `/workspace`, `~/.config`, agent CLI auth
state, and any other content living inside the sandbox.
The unconditional teardown was introduced in fbd3a2fd ("prevent leakage
of morph instances between tasks", 2025-11-04) to plug a Morph backend
leak, two days after `lifetime_seconds` shipped in faecbddd. It was
later refactored into `_cleanup_task_resources` in 70dd3a16 without
changing semantics. Code and docs have disagreed since.
Fix: introduce `terminal_tool.is_persistent_env(task_id)` and skip the
per-turn `cleanup_vm` when the active env is persistent. The idle reaper
(`_cleanup_inactive_envs`) still tears persistent envs down once
`terminal.lifetime_seconds` is exceeded. Non-persistent backends (Morph)
are unchanged — still torn down per turn, preserving the original
leak-prevention intent.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
* fix(tools): skip camofox auto-cleanup when managed persistence is enabled
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
* fix(doctor): only check the active memory provider, not all providers unconditionally
hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that
always ran regardless of the user's memory.provider config setting. After
the swappable memory provider update (#4623), users with leftover Honcho
config but no active provider saw false 'broken' errors.
Replaced both sections with a single Memory Provider section that reads
memory.provider from config.yaml and only checks the configured provider.
Users with no external provider see a green 'Built-in memory active' check.
Reported by community user michaelruiz001, confirmed by Eri (Honcho).
---------
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Previously crash recovery recreated detached sessions as if they were
fully managed, so polls and kills could lie about liveness and the
checkpoint could forget recovered jobs after the next restart.
This commit refreshes recovered host-backed sessions from real PID
state, keeps checkpoint data durable, and preserves notify watcher
metadata while treating sandbox-only PIDs as non-recoverable.
- Persist `pid_scope` in `tools/process_registry.py` and skip
recovering sandbox-backed entries without a host-visible PID handle
- Refresh detached sessions on access so `get`/`poll`/`wait` and active
session queries observe exited processes instead of hanging forever
- Allow recovered host PIDs to be terminated honestly and requeue
`notify_on_complete` watchers during checkpoint recovery
- Add regression tests for durable checkpoints, detached exit/kill
behavior, sandbox skip logic, and recovered notify watchers
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
Add BudgetConfig dataclass to centralize and make overridable the
hardcoded constants (50K per-result, 200K per-turn, 2K preview) that
control when tool outputs get persisted to sandbox. Configurable at
the RL environment level via HermesAgentEnvConfig fields, threaded
through HermesAgentLoop to the storage layer.
Resolution: pinned (read_file=inf) > env config overrides > registry
per-tool > default. CLI override: --env.turn_budget_chars 80000
_deliver_result() now returns Optional[str] — None on success, error
message on failure. All failure paths (unknown platform, platform
disabled, config load error, send failure, unresolvable target)
return descriptive error strings.
mark_job_run() gains delivery_error param, tracked as
last_delivery_error on the job — separate from agent execution errors.
A job where the agent succeeded but delivery failed shows
last_status='ok' + last_delivery_error='...'.
The cronjob list tool now surfaces last_delivery_error so agents and
users can see when cron outputs aren't arriving.
Inspired by PR #5863 (oxngon) — reimplemented with proper wiring.
Tests: 3 new mark_job_run tests + 6 new _deliver_result return tests.
- The MCP SDK Pydantic model uses camelCase (structuredContent), not
snake_case (structured_content). The original getattr was a silent no-op.
- When structuredContent is present, return it AS the result instead of
alongside text — the structured payload is the machine-readable data.
- Move test file to tests/tools/ and fix fake class to use camelCase.
- Patch _run_on_mcp_loop in tests so the handler actually executes.
MCP CallToolResult may include structured_content (a JSON object) alongside
content blocks. The tool handler previously only forwarded concatenated text
from content blocks, silently dropping the structured payload.
This breaks MCP tools that return a minimal human text in content while
putting the actual machine-usable payload in structured_content.
Now, when structured_content is present, it is included in the returned
JSON under the 'structuredContent' key.
FixesNousResearch/hermes-agent#5874
- Add 7 unit tests for _profile_arg: default home, named profile,
hash path, nested path, invalid name, systemd integration, launchd integration
- Add stt.local.language to config.yaml (empty = auto-detect)
- Both STT code paths now read config.yaml first, env var fallback,
then default (auto-detect for faster-whisper, 'en' for CLI command)
- HERMES_LOCAL_STT_LANGUAGE env var still works as backward-compat fallback
- gateway/run.py: Strip "(The user sent a message with no text content)"
placeholder when voice transcription succeeds — it was being appended
alongside the transcript, creating duplicate user turns.
- tools/transcription_tools.py: Wire HERMES_LOCAL_STT_LANGUAGE env var
into the faster-whisper backend. It was only used by the CLI fallback
path (_transcribe_local_command), not the primary faster-whisper path.
Replace 10 callsites across 6 files that manually opened config.yaml,
called yaml.safe_load(), and handled missing-file/parse-error fallbacks
with the new read_raw_config() helper from hermes_cli/config.py.
Each migrated site previously had 5-8 lines of boilerplate:
config_path = get_hermes_home() / 'config.yaml'
if config_path.exists():
import yaml
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
Now reduced to:
from hermes_cli.config import read_raw_config
cfg = read_raw_config()
Migrated files:
- tools/browser_tool.py (4 sites): command_timeout, cloud_provider,
allow_private_urls, record_sessions
- tools/env_passthrough.py: terminal.env_passthrough
- tools/credential_files.py: terminal.credential_files
- tools/transcription_tools.py: stt.model
- hermes_cli/commands.py: config-gated command resolution
- hermes_cli/auth.py (2 sites): model config read + provider reset
Skipped (intentionally):
- gateway/run.py: 10+ sites with local aliases, critical path
- hermes_cli/profiles.py: profile-specific config path
- hermes_cli/doctor.py: reads raw then writes fixes back
- agent/model_metadata.py: different file (context_length_cache.yaml)
- tools/website_policy.py: custom config_path param + error types
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
16 callsites across 14 files were re-deriving the hermes home path
via os.environ.get('HERMES_HOME', ...) instead of using the canonical
get_hermes_home() from hermes_constants. This breaks profiles — each
profile has its own HERMES_HOME, and the inline fallback defaults to
~/.hermes regardless.
Fixed by importing and calling get_hermes_home() at each site. For
files already inside the hermes process (agent/, hermes_cli/, tools/,
gateway/, plugins/), this is always safe. Files that run outside the
process context (mcp_serve.py, mcp_oauth.py) already had correct
try/except ImportError fallbacks and were left alone.
Skipped: hermes_constants.py (IS the implementation), env_loader.py
(bootstrap), profiles.py (intentionally manipulates the env var),
standalone scripts (optional-skills/, skills/), and tests.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
* feat: notify_on_complete for background processes
When terminal(background=true, notify_on_complete=true), the system
auto-triggers a new agent turn when the process exits — no polling needed.
Changes:
- ProcessSession: add notify_on_complete field
- ProcessRegistry: add completion_queue, populate on _move_to_finished()
- Terminal tool: add notify_on_complete parameter to schema + handler
- CLI: drain completion_queue after agent turn AND during idle loop
- Gateway: enhanced _run_process_watcher injects synthetic MessageEvent
on completion, triggering a full agent turn
- Checkpoint persistence includes notify_on_complete for crash recovery
- code_execution_tool: block notify_on_complete in sandbox scripts
- 15 new tests covering queue mechanics, checkpoint round-trip, schema
* docs: update terminal tool descriptions for notify_on_complete
- background: remove 'ONLY for servers' language, describe both patterns
(long-lived processes AND long-running tasks with notify_on_complete)
- notify_on_complete: more prescriptive about when to use it
- TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds'
guidance that contradicted the new feature
Selectively cherry-picked from PR #5501 by MestreY0d4-Uninter.
- Add _resolve_workspace_hint() to detect parent's working directory
- Inject WORKSPACE PATH into child system prompts
- Add rule: never assume /workspace/ container paths
- Excludes the cli.py queue-busy-input changes from the original PR
Cherry-picked from PR #5580 by MestreY0d4-Uninter.
- Share parent's credential pool with child agents for key rotation
- Leasing layer spreads parallel children across keys (least-loaded)
- Thread-safe acquire_lease/release_lease in CredentialPool
- Reverted sneaked-in tool-name restoration change (kept original
getattr + isinstance guard pattern)
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
* feat(tools): add Firecrawl cloud browser provider
Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider
alongside Browserbase and Browser Use. All browser tools route through
Firecrawl's cloud browser via CDP when selected.
- tools/browser_providers/firecrawl.py — FirecrawlProvider
- tools/browser_tool.py — register in _PROVIDER_REGISTRY
- hermes_cli/tools_config.py — add to onboarding provider picker
- hermes_cli/setup.py — add to setup summary
- hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config
- website/docs/ — browser docs and env var reference
Based on #4490 by @developersdigest.
Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com>
* refactor: simplify FirecrawlProvider.emergency_cleanup
Use self._headers() and self._api_url() instead of duplicating
env-var reads and header construction.
* fix: recognize Firecrawl in subscription browser detection
_resolve_browser_feature_state() now handles "firecrawl" as a direct
browser provider (same pattern as "browser-use"), so hermes setup
summary correctly shows "Browser Automation (Firecrawl)" instead of
misreporting as "Local browser".
Also fixes test_config_version_unchanged assertion (11 → 12).
---------
Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>
When parent_agent.enabled_toolsets is None (the default, meaning all tools
are enabled), subagents incorrectly fell back to DEFAULT_TOOLSETS
(['terminal', 'file', 'web']) instead of inheriting the parent's full
toolset.
Root cause:
- Line 188 used 'or' fallback: None or DEFAULT_TOOLSETS evaluates to
DEFAULT_TOOLSETS
- Line 192 checked truthiness: None is falsy, falling through to else
Fix:
- Use 'is not None' checks instead of truthiness
- When enabled_toolsets is None, derive effective toolsets from
parent_agent.valid_tool_names via the tool registry
Fixes the bug introduced in f75b1d21b and repeated in e5d14445e (PR #3269).
Shell injection via unquoted workdir interpolation in docker, singularity,
and SSH backends. When workdir contained shell metacharacters (e.g.
~/;id), arbitrary commands could execute.
Changes:
- Add shlex.quote() at each interpolation point in docker.py,
singularity.py, and ssh.py with tilde-aware quoting (keep ~
unquoted for shell expansion, quote only the subpath)
- Add _validate_workdir() allowlist in terminal_tool.py as
defense-in-depth before workdir reaches any backend
Original work by Mariano A. Nicolini (PR #5620). Salvaged with fixes
for tilde expansion (shlex.quote breaks cd ~/path) and replaced
incomplete deny-list with strict character allowlist.
Co-authored-by: Mariano A. Nicolini <entropidelic@users.noreply.github.com>
The ContextVar migration removed 'from pathlib import Path' but Path
is still used in _load_config_passthrough(). Without this import,
config-based env passthrough would raise NameError.
Subagent sessions spawned by delegate_task were created with
parent_session_id=NULL and source=cli, making them indistinguishable
from user sessions in hermes sessions list and /resume.
Changes:
- delegate_tool.py: pass parent_agent.session_id to child agent
- run_agent.py: accept parent_session_id param, pass to create_session
- hermes_state.py list_sessions_rich: filter parent_session_id IS NULL
by default (opt-in include_children=True for callers that need them)
- hermes_state.py delete_session: delete child sessions first (FK)
- hermes_state.py prune_sessions: delete children before parents (FK)
session_search already handles parent_session_id correctly — child
sessions are filtered from recent list and resolved to parent root
in full-text search results.
Fixes#5122
Before launching an MCP server via npx/uvx, queries the OSV (Open Source
Vulnerabilities) API to check if the package has known malware advisories
(MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked.
- Free, public API (Google-maintained), ~300ms per query
- Runs once per MCP server launch, inside _run_stdio() before subprocess spawn
- Parallel with other MCP servers (asyncio.gather already in place)
- Fail-open: network errors, timeouts, unrecognized commands → allow
- Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version)
Inspired by Block/goose extension malware check.
Add optional 'expression' parameter to browser_console that evaluates
JavaScript in the page context (like DevTools console). Returns structured
results with auto-JSON parsing.
No new tool — extends the existing browser_console schema with ~20 tokens
of overhead instead of adding a 12th browser tool.
Both backends supported:
- Browserbase: uses agent-browser 'eval' command via CDP
- Camofox: uses /tabs/{tab_id}/eval endpoint with graceful degradation
E2E verified: string eval, number eval, structured JSON, DOM manipulation,
error handling, and original console-output mode all working.
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Allow delegate_task to specify custom ACP transport per-task, so a parent
running via CLI/Discord/Telegram can spawn child agents over ACP
(e.g. claude --acp --stdio). Follows the existing override_provider pattern.
Supports per-task granularity in batch mode.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
- Firecrawl scrape: 60s timeout via asyncio.wait_for + to_thread
(previously could hang indefinitely)
- Summarizer retries: 6 → 2 (one retry), reads timeout from
auxiliary.web_extract.timeout config (default 360s / 6min)
- Summarizer failure: falls back to truncated raw content (~5000 chars)
instead of useless error message, with guidance about config/model
- Config default: auxiliary.web_extract.timeout bumped 30 → 360s
for local model compatibility
Addresses Discord reports of agent hanging during web_extract.
Salvaged from PRs #3767 (chalkers), #5236 (ygd58), #2641 (buntingszn).
Three improvements to Matrix cron delivery:
1. Live adapter path: when the gateway is running, cron delivery now uses
the connected MatrixAdapter via run_coroutine_threadsafe instead of
the standalone HTTP PUT. This enables delivery to E2EE rooms where
the raw HTTP path cannot encrypt. Falls back to standalone on failure.
Threads adapters + event loop from gateway -> cron ticker -> tick() ->
_deliver_result(). (from #3767)
2. HTML formatted_body: _send_matrix() now converts markdown to HTML
using the optional markdown library, with h1-h6 to bold conversion
for Element X compatibility. Falls back to plain text if markdown
is not installed. Also adds random bytes to txn_id to prevent
collisions. (from #5236)
3. Origin fallback: when deliver="origin" but origin is null (jobs
created via API/scripts), falls back to HOME_CHANNEL env vars
in order: matrix -> telegram -> discord -> slack. (from #2641)
When commands like grep, diff, test, or find return non-zero exit codes
that aren't actual errors (grep 1 = no matches, diff 1 = files differ),
the model wastes turns investigating non-problems. This adds an
exit_code_meaning field to the terminal JSON result that explains
informational exit codes, so the agent can move on instead of debugging.
Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial
access), test/[ (condition false), curl (timeouts, DNS, HTTP errors),
and git (context-dependent). Correctly extracts the last command from
pipelines and chains, strips full paths and env var assignments.
The exit_code field itself is unchanged — this is purely additive context.
When a dangerous command is approved (gateway, CLI, or smart approval),
the terminal tool now includes an 'approval' field in the result JSON
so the model knows approval was requested and granted. Previously the
model only saw normal command output with no indication that approval
happened, causing it to hallucinate that the approval system didn't fire.
Changes:
- approval.py: Return user_approved/description in all 3 approval paths
(gateway blocking, CLI interactive, smart approval)
- terminal_tool.py: Capture approval metadata and inject into both
foreground and background command results
* feat: execute_code runs on remote terminal backends (Docker/SSH/Modal/Daytona/Singularity)
When TERMINAL_ENV is not 'local', execute_code now ships the script to
the remote environment and runs it there via the terminal backend --
the same container/sandbox/SSH session used by terminal() and file tools.
Architecture:
- Local backend: unchanged (UDS RPC, subprocess.Popen)
- Remote backends: file-based RPC via execute_oneshot() polling
- Script writes request files, parent polls and dispatches tool calls
- Responses written atomically (tmp + rename) via base64/stdin
- execute_oneshot() bypasses persistent shell lock for concurrency
Changes:
- tools/environments/base.py: add execute_oneshot() (delegates to execute())
- tools/environments/persistent_shell.py: override execute_oneshot() to
bypass _shell_lock via _execute_oneshot(), enabling concurrent polling
- tools/code_execution_tool.py: add file-based transport to
generate_hermes_tools_module(), _execute_remote() with full env
get-or-create, file shipping, RPC poll loop, output post-processing
* fix: use _get_env_config() instead of raw TERMINAL_ENV env var
Read terminal backend type through the canonical config resolution
path (terminal_tool._get_env_config) instead of os.getenv directly.
* fix: use echo piping instead of stdin_data for base64 writes
Modal doesn't reliably deliver stdin_data to chained commands
(base64 -d > file && mv), producing 0-byte files. Switch to
echo 'base64' | base64 -d which works on all backends.
Verified E2E on both Docker and Modal.
Add an optional 'script' parameter to cron jobs that references a Python
script. The script runs before each agent turn, and its stdout is injected
into the prompt as context. This enables stateful monitoring — the script
handles data collection and change detection, the LLM analyzes and reports.
- cron/jobs.py: add script field to create_job(), stored in job dict
- cron/scheduler.py: add _run_job_script() executor with timeout handling,
inject script output/errors into _build_job_prompt()
- tools/cronjob_tools.py: add script to tool schema, create/update handlers,
_format_job display
- hermes_cli/cron.py: add --script to create/edit, display in list/edit output
- hermes_cli/main.py: add --script argparse for cron create/edit subcommands
- tests/cron/test_cron_script.py: 20 tests covering job CRUD, script
execution, path resolution, error handling, prompt injection, tool API
Script paths can be absolute or relative (resolved against ~/.hermes/scripts/).
Scripts run with a 120s timeout. Failures are injected as error context so
the LLM can report the problem. Empty string clears an attached script.
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
env var name chars. Without this, the pattern backtracks exponentially on large
strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
produced literal backslash-n instead of a real newline, breaking file search
result parsing (total_count always 1, paths concatenated).
Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
'Hangs in non-interactive environments' but actually run fine:
- test_413_compression.py (12 tests, 25s)
- test_file_tools_live.py (71 tests, 24s)
- test_code_execution.py (61 tests, 99s)
- test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
of _gateway_queues — fixes race condition where resolve fires before threads
register their approval entries, causing the test to hang indefinitely.
Net effect: +256 tests recovered from skip, 8 real failures fixed.
Add docker_env option to terminal config — a dict of key-value pairs that
get set inside Docker containers via -e flags at both container creation
(docker run) and per-command execution (docker exec) time.
This complements docker_forward_env (which reads values dynamically from
the host process environment). docker_env is useful when Hermes runs as a
systemd service without access to the user's shell environment — e.g.
setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG
agent socket forwarding.
Precedence: docker_env provides baseline values; docker_forward_env
overrides for the same key.
Config example:
terminal:
docker_env:
SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock
GNUPGHOME: /root/.gnupg
docker_volumes:
- /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock
- /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent
Two pre-existing issues causing test_file_read_guards timeouts on CI:
1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
IGNORECASE, matching any letter/underscore to end-of-string at
each position → O(n²) backtracking on 100K+ char inputs.
Bounded to {0,50} since env var names are never that long.
2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
character-count guard, so oversized content (that would be rejected
anyway) went through the expensive regex first. Reordered to check
size limit before redaction.
The config key skills.external_dirs and core resolution (get_all_skills_dirs,
get_external_skills_dirs in agent/skill_utils.py) already existed but several
code paths still only scanned SKILLS_DIR. Now external dirs are respected
everywhere:
- skills_categories(): scan all dirs for category discovery
- _get_category_from_path(): resolve categories against any skills root
- skill_manager_tool._find_skill(): search all dirs for edit/patch/delete
- credential_files.get_skills_directory_mount(): mount all dirs into
Docker/Singularity containers (external dirs at external_skills/<idx>)
- credential_files.iter_skills_files(): list files from all dirs for
Modal/Daytona upload
- tools/environments/ssh.py: rsync all skill dirs to remote hosts
- gateway _check_unavailable_skill(): check disabled skills across all dirs
Usage in config.yaml:
skills:
external_dirs:
- ~/repos/agent-skills/hermes
- /shared/team-skills
- Add contextvars.Token[str] type hints to set/reset_current_session_key
- Use get_current_session_key(default='') in terminal_tool.py for background
process session tracking, fixing the same env var race for concurrent
gateway sessions spawning background processes
- Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram,
Slack, Discord) cache uploaded zip files instead of rejecting them.
- Add get_cache_directory_mounts() and iter_cache_files() to
credential_files.py for host-side cache directory passthrough
(documents, images, audio, screenshots).
- Docker: bind-mount cache dirs read-only alongside credentials/skills.
Changes are live (bind mount semantics).
- Modal: mount cache files at sandbox creation + resync before each
command via _sync_files() with mtime+size change detection.
- Handles backward-compat with legacy dir names (document_cache,
image_cache, audio_cache, browser_screenshots) via get_hermes_dir().
- Container paths always use the new cache/<subdir> layout regardless
of host layout.
This replaces the need for a dedicated extract_archive tool (PR #4819)
— the agent can now use standard terminal commands (unzip, tar) on
uploaded files inside remote containers.
Closes: related to PR #4819 by kshitijk4poor
Three fixes for memory+profile isolation bugs:
1. memory_tool.py: Replace module-level MEMORY_DIR constant with
get_memory_dir() function that calls get_hermes_home() dynamically.
The old constant was cached at import time and could go stale if
HERMES_HOME changed after import. Internal MemoryStore methods now
call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
alias.
2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
from the source profile. These curated memory files are part of the
agent's identity (same as SOUL.md) and should carry over on clone.
3. holographic plugin: initialize() now expands $HERMES_HOME and
${HERMES_HOME} in the db_path config value, so users can write
'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
active profile directory, not the default home.
Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
Four fixes for MCP server stability issues reported by community member
(terminal lockup, zombie processes, escape sequence pollution, startup hang):
1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs
_reload_mcp in a separate daemon thread with a 30s hard timeout. Previously,
a hung MCP server could block the process_loop thread indefinitely, freezing
the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work).
2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned
by stdio_client via before/after snapshots of /proc children. On shutdown,
_stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful
SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from
accumulating across sessions.
3. MCP event loop exception handler (mcp_tool.py): Installs
_mcp_loop_exception_handler on the MCP background event loop — same pattern
as the existing _suppress_closed_loop_errors on prompt_toolkit's loop.
Suppresses benign 'Event loop is closed' RuntimeError from httpx transport
__del__ during MCP shutdown. Salvaged from PR #2538 (acsezen).
4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in
_wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive()
TTY detection. In non-interactive environments, build_oauth_auth() still
returns a provider (cached tokens + refresh work), but the callback handler
raises immediately instead of blocking the MCP event loop for 120s. Re-raises
OAuth setup failures in _run_http so failed servers are reported cleanly
without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465
(heathley).
Closes#2537, closes#4462
Related: #4128, #3436
- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
_has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()
Three targeted fixes from user-reported issues:
1. STT config resolution (transcription_tools.py):
_has_openai_audio_backend() and _resolve_openai_audio_client_config()
now check stt.openai.api_key/base_url in config.yaml FIRST, before
falling back to env vars. Fixes voice transcription breaking when
using a custom OpenAI-compatible endpoint via config.yaml.
2. Stream consumer flood control fallback (stream_consumer.py):
When an edit fails mid-stream (e.g., Telegram flood control returns
failure for waits >5s), reset _already_sent to False so the normal
final send path delivers the complete response. Previously, a
truncated partial was left as the final message.
3. Telegram edit_message comment alignment (telegram.py):
Clarify that long flood waits return failure so streaming can fall
back to a normal final send.
The PR changed prev_tools from list[str] to list[dict] with name/result
keys. The gateway's _step_callback_sync passed this directly to hooks
as 'tool_names', breaking user-authored hooks that call
', '.join(tool_names).
Now:
- 'tool_names' always contains strings (backward-compatible)
- 'tools' carries the enriched dicts for hooks that want results
Also adds summary logging to register_mcp_servers() and comprehensive
tests for all three PR changes:
- sanitize_mcp_name_component edge cases
- register_mcp_servers public API
- _register_session_mcp_servers ACP integration
- step_callback result forwarding
- gateway normalization backward compat
ACP clients pass MCP server definitions in session/new, load_session,
resume_session, and fork_session. Previously these were accepted but
silently ignored — the agent never connected to them.
This wires the mcp_servers parameter into the existing MCP registration
pipeline (tools/mcp_tool.py) so client-provided servers are connected,
their tools discovered, and the agent's tool surface refreshed before
the first prompt.
Changes:
tools/mcp_tool.py:
- Extract sanitize_mcp_name_component() to replace all non-[A-Za-z0-9_]
characters (fixes crash when server names contain / or other chars
that violate provider tool-name validation rules)
- Use it in _convert_mcp_schema, _sync_mcp_toolsets, _build_utility_schemas
- Extract register_mcp_servers(servers: dict) as a public API that takes
an explicit {name: config} map. discover_mcp_tools() becomes a thin
wrapper that loads config.yaml and calls register_mcp_servers()
acp_adapter/server.py:
- Add _register_session_mcp_servers() which converts ACP McpServerStdio /
McpServerHttp / McpServerSse objects to Hermes MCP config dicts,
registers them via asyncio.to_thread (avoids blocking the ACP event
loop), then rebuilds agent.tools, valid_tool_names, and invalidates
the cached system prompt
- Call it from new_session, load_session, resume_session, fork_session
Tested with Eden (theproxycompany.com) as ACP client — 5 MCP servers
(HTTP + stdio) registered successfully, 110 tools available to the agent.
* feat(memory): add pluggable memory provider interface with profile isolation
Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.
Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md
Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env
MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.
Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.
Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).
* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider
Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.
Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:
- sync_turn: records user/assistant messages to OpenViking session
(threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
into 6 categories (profile, preferences, entities, events, cases,
patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)
Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base
Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.
* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker
- Remove redundant mem0_context tool (identical to mem0_search with
rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
(prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
for 120s instead of hammering a down server every turn. Auto-resets
after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads
* fix(memory): enforce single external memory provider limit
MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.
The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.
Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.
* feat(memory): add ByteRover memory provider plugin
Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.
ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.
Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression
Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state
Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).
* fix(memory): thread remaining sync_turns, fix holographic, add config key
Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads
Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
used in scoring). Now reuses pre-computed entity_residuals directly,
moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
500 facts, only checks the most recently updated ones to avoid O(n^2)
explosion (~125K comparisons at 500 is acceptable).
Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
No version bump needed (deep_merge handles new keys automatically).
* feat(memory): extract Honcho as a MemoryProvider plugin
Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.
The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.
Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages
This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.
* feat(memory): wire MemoryManager into run_agent.py
Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):
1. Init (~L1130): Create MemoryManager, find matching plugin provider
from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
completed turn, queue_prefetch_all() for next turn, on_session_end()
+ shutdown_all() at conversation end
All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.
Full suite: 7222 passed, 4 pre-existing failures.
* refactor(memory): remove legacy Honcho integration from core
Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).
Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
_activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
_honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding
Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls
Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py
Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.
The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.
Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice
Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
(byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()
CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup
Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references
Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py
Migration:
- Honcho migration notice on startup: detects existing honcho.json or
~/.honcho/config.json, prints guidance to run hermes memory setup.
Only fires when memory.provider is not set and not in quiet mode.
Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.
* feat(memory): standardize plugin config + add per-plugin documentation
Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)
Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml
Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers
The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory
* docs: add memory providers user guide + developer guide
New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
Holographic, RetainDB, ByteRover). Each with setup, config, tools,
cost, and unique features. Includes comparison table and profile
isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
provider plugin. Covers ABC, required methods, config schema,
save_config, threading contract, profile isolation, testing.
Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
the new Memory Providers page
- sidebars.ts — added both new pages to navigation
* fix(memory): auto-migrate Honcho users to memory provider plugin
When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.
Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.
* fix(memory): only auto-migrate Honcho when enabled + credentialed
Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.
* feat(memory): auto-install pip dependencies during hermes memory setup
Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).
Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)
* fix: remove remaining Honcho crash risks from cli.py and gateway
cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.
gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.
tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).
* fix: include plugins/ in pyproject.toml package list
Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.
* fix(memory): correct pip-to-import name mapping for dep checks
The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.
* chore: remove dead code from old plugin memory registration path
- hermes_cli/plugins.py: removed register_memory_provider(),
_memory_providers list, get_plugin_memory_providers() — memory
providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
subparsers (setup, status, sessions, map, peer, mode, tokens,
identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
registration path
- tests: replaced TestPluginMemoryProviderRegistration with
TestPluginMemoryDiscovery that tests the actual plugins/memory/
discovery system. Added 3 new tests (discover, load, nonexistent).
* chore: delete dead honcho_integration/cli.py and its tests
cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.
Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)
Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush
* refactor: move honcho_integration/ into the honcho plugin
Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.
- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/
* docs: update architecture + gateway-internals for memory provider system
- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
flush lifecycle docs with generic memory provider interface docs
* fix: update stale mock path for resolve_active_host after honcho plugin migration
* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore
Review feedback from Honcho devs (erosika):
P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
(was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry
Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)
ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple
Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup
* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type
- Wire on_delegation() in delegate_tool.py — parent's memory provider
is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
from corrupting user representations — closes#4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str
* fix(honcho): port profile isolation fixes from PR #4632
Ports 5 bug fixes found during profile testing (erosika's PR #4632):
1. 3-tier config resolution — resolve_config_path() now checks
$HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
(non-default profiles couldn't find shared host blocks)
2. Thread host=_host_key() through from_global_config() in cmd_setup,
cmd_status, cmd_identity (--target-profile was being ignored)
3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid
4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
all message uploads for the session
5. Gate Honcho clone behind --clone/--clone-all on profile create
(bare create should be blank-slate)
Also: sanitize assistant_peer_id via _sanitize_id()
* fix(tests): add module cleanup fixture to test_cli_provider_resolution
test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.
Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
The gateway's dangerous command approval system was fundamentally broken:
the agent loop continued running after a command was flagged, and the
approval request only reached the user after the agent finished its
entire conversation loop. By then the context was lost.
This change makes the gateway approval mirror the CLI's synchronous
behavior. When a dangerous command is detected:
1. The agent thread blocks on a threading.Event
2. The approval request is sent to the user immediately
3. The user responds with /approve or /deny
4. The event is signaled and the agent resumes with the real result
The agent never sees 'approval_required' as a tool result. It either
gets the command output (approved) or a definitive BLOCKED message
(denied/timed out) — same as CLI mode.
Queue-based design supports multiple concurrent approvals (parallel
subagents via delegate_task, execute_code RPC handlers). Each approval
gets its own _ApprovalEntry with its own threading.Event. /approve
resolves the oldest (FIFO); /approve all resolves all at once.
Changes:
- tools/approval.py: Queue-based per-session blocking gateway approval
(register/unregister callbacks, resolve with FIFO or all-at-once)
- gateway/run.py: Register approval callback in run_sync(), remove
post-loop pop_pending hack, /approve and /deny support 'all' flag
- tests: 21 tests including parallel subagent E2E scenarios
- Add logger + debug log to read_nous_access_token() catch-all so token
refresh failures are observable instead of silently swallowed
- Tighten _is_nous_auxiliary_client() domain check to use proper URL
hostname parsing instead of substring match, preventing false-positives
on domains like not-nousresearch.com or nousresearch.com.evil.com
LLM responses from browser snapshot extraction and vision analysis
could echo back secrets that appeared on screen or in page content.
Input redaction alone is insufficient — the LLM may reproduce secrets
it read from screenshots (which cannot be text-redacted).
Now redact outputs from:
- _extract_relevant_content (auxiliary LLM response)
- browser_vision (vision LLM response)
- camofox_vision (vision LLM response)
Three exfiltration vectors closed:
1. Browser URL exfil — agent could embed secrets in URL params and
navigate to attacker-controlled server. Now scans URLs for known
API key patterns before navigating (browser_navigate, web_extract).
2. Browser snapshot leak — page displaying env vars or API keys would
send secrets to auxiliary LLM via _extract_relevant_content before
run_agent.py's redaction layer sees the result. Now redacts snapshot
text before the auxiliary call.
3. Camofox annotation leak — accessibility tree text sent to vision
LLM could contain secrets visible on screen. Now redacts annotation
context before the vision call.
10 new tests covering URL blocking, snapshot redaction, and annotation
redaction for both browser and camofox backends.
* feat(skills): add content size limits for agent-created skills
Agent writes via skill_manage (create/edit/patch/write_file) are now
constrained to prevent unbounded growth:
- SKILL.md and supporting files: 100,000 character limit
- Supporting files: additional 1 MiB byte limit
- Patches on oversized hand-placed skills that reduce the size are
allowed (shrink path), but patches that grow beyond the limit are
rejected
Hand-placed skills and hub-installed skills have NO hard limit —
they load and function normally regardless of size. Hub installs
get a warning in the log if SKILL.md exceeds 100k chars.
This mirrors the memory system's char_limit pattern. Without this,
the agent auto-grows skills indefinitely through iterative patches
(hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than
the largest skill in the entire skills.sh ecosystem).
Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB)
Tests: 14 new tests covering all write paths and edge cases
* feat(skills): add fuzzy matching to skill patch
_patch_skill now uses the same 8-strategy fuzzy matching engine
(tools/fuzzy_match.py) as the file patch tool. Handles whitespace
normalization, indentation differences, escape sequences, and
block-anchor matching. Eliminates exact-match failures when agents
patch skills with minor formatting mismatches.
Adds two Camofox features:
1. Persistent browser sessions: new `browser.camofox.managed_persistence`
config option. When enabled, Hermes sends a deterministic profile-scoped
userId to Camofox so the server maps it to a persistent browser profile
directory. Cookies, logins, and browser state survive across restarts.
Default remains ephemeral (random userId per session).
2. VNC URL discovery: Camofox /health endpoint returns vncPort when running
in headed mode. Hermes constructs the VNC URL and includes it in navigate
responses so the agent can share it with users.
Also fixes camofox_vision bug where call_llm response object was passed
directly to json.dumps instead of extracting .choices[0].message.content.
Changes from original PR:
- Removed browser_evaluate tool (separate feature, needs own PR)
- Removed snapshot truncation limit change (unrelated)
- Config.yaml only for managed_persistence (no env var, no version bump)
- Rewrote tests to use config mock instead of env var
- Reverted package-lock.json churn
Co-authored-by: analista <psikonetik@gmail.com.com>
After a successful write_file or patch, update the stored read
timestamp to match the file's new modification time. Without this,
consecutive edits by the same task (read → write → write) would
false-warn on the second write because the stored timestamp still
reflected the original read, not the first write.
Also renames the internal tracker key from 'file_mtimes' to
'read_timestamps' for clarity.
* fix: root-level provider in config.yaml no longer overrides model.provider
load_cli_config() had a priority inversion: a stale root-level
'provider' key in config.yaml would OVERRIDE the canonical
'model.provider' set by 'hermes model'. The gateway reads
model.provider directly from YAML and worked correctly, but
'hermes chat -q' and the interactive CLI went through the merge
logic and picked up the stale root-level key.
Fix: root-level provider/base_url are now only used as a fallback
when model.provider/model.base_url is not set (never as an override).
Also added _normalize_root_model_keys() to config.py load_config()
and save_config() — migrates root-level provider/base_url into the
model section and removes the root-level keys permanently.
Reported by (≧▽≦) in Discord: opencode-go provider persisted as a
root-level key and overrode the correct model.provider=openrouter,
causing 401 errors.
* fix(security): redact secrets from execute_code sandbox output
The execute_code sandbox stripped env vars with secret-like names from
the child process (preventing os.environ access), but scripts could
still read secrets from disk (e.g. open('~/.hermes/.env')) and print
them to stdout. The raw values entered the model context unredacted.
terminal_tool and file_tools already applied redact_sensitive_text()
to their output — execute_code was the only tool that skipped this
step. Now the same redaction runs on both stdout and stderr after
ANSI stripping.
Reported via Discord (not filed on GitHub to avoid public disclosure
of the reproduction steps).
Track file mtime when read_file is called. When write_file or patch
subsequently targets the same file, compare the current mtime against
the recorded one. If they differ (external edit, concurrent agent,
user change), include a _warning in the result advising the agent to
re-read. The write still proceeds — this is a soft signal, not a
hard block.
Key design points:
- Per-task isolation: task A's reads don't affect task B's writes.
- Files never read produce no warning (not enforcing read-before-write).
- mtime naturally updates after the agent's own writes, so the warning
only fires on external changes, not the agent's own edits.
- V4A multi-file patches check all target paths.
Tests: 10 new tests covering write staleness, patch staleness,
never-read files, cross-task isolation, and the helper function.
The camofox-browser image defaults to port 3000 internally, not 9377.
Without -e CAMOFOX_PORT=9377, the -p 9377:9377 mapping silently fails
because nothing listens on 9377 inside the container.
E2E verified: -p 9377:9377 alone → connection reset,
-p 9377:9377 -e CAMOFOX_PORT=9377 → healthy and functional.
* feat(file_tools): harden read_file with size guard, dedup, and device blocking
Three improvements to read_file_tool to reduce wasted context tokens and
prevent process hangs:
1. Character-count guard: reads that produce more than 100K characters
(≈25-35K tokens across tokenisers) are rejected with an error that
tells the model to use offset+limit for a smaller range. The
effective cap is min(file_size, 100K) so small files that happen to
have long lines aren't over-penalised. Large truncated files also
get a hint nudging toward targeted reads.
2. File-read deduplication: when the same (path, offset, limit) is read
a second time and the file hasn't been modified (mtime unchanged),
return a lightweight stub instead of re-sending the full content.
Writes and patches naturally change mtime, so post-edit reads always
return fresh content. The dedup cache is cleared on context
compression — after compression the original read content is
summarised away, so the model needs the full content again.
3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin
etc. are rejected before any I/O to prevent process hangs from
infinite-output or blocking-input devices.
Tests: 17 new tests covering all three features plus the dedup-reset-
on-compression integration. All 52 file-read tests pass (35 existing +
17 new). Full tool suite (2124 tests) passes with 0 failures.
* feat: make file_read_max_chars configurable, add docs
Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool
reads this on first call and caches for the process lifetime. Users on
large-context models can raise it; users on small local models can lower it.
Also adds a 'File Read Safety' section to the configuration docs
explaining the char limit, dedup behavior, and example values.
_load_config_files() had the same hermes_home / item pattern without
containment checks. While config.yaml is user-controlled (lower threat
than skill frontmatter), defense in depth prevents exploitation via
config injection or copy-paste mistakes.
WSL detection was treated as a hard fail, blocking voice mode even when
audio worked via PulseAudio bridge. Now PULSE_SERVER env var presence
makes WSL a soft notice instead of a blocking warning. Device query
failures in WSL with PULSE_SERVER are also treated as non-blocking.
The SSRF protection added in #3041 blocks all private/internal addresses
unconditionally in browser_navigate(). This prevents legitimate local use
cases (localhost apps, LAN devices) when using Camofox or the built-in
headless Chromium without a cloud provider.
The check is only meaningful for cloud backends (Browserbase, BrowserUse)
where the agent could reach internal resources on a remote machine. Local
backends give the user full terminal and network access already — the
SSRF check adds zero security value.
Add _is_local_backend() helper that returns True when Camofox is active
or no cloud provider is configured. Both the pre-navigation and
post-redirect SSRF checks now skip when running locally. The
browser.allow_private_urls config option remains available as an
explicit opt-out for cloud mode.
1. browser_tool.py: _allow_private_urls() used 'global _allow_private_urls'
then assigned a bool to it, replacing the function in the module namespace.
After first call, subsequent calls hit TypeError: 'bool' object is not
callable. Renamed cache variable to _cached_allow_private_urls.
2. test_provider_parity.py: test_custom_endpoint_when_no_nous relied on
OPENAI_BASE_URL env var (removed in config refactor). Mock
_resolve_custom_runtime directly instead.
* fix(tools): skip SSRF check in local browser mode
The SSRF protection added in #3041 blocks all private/internal
addresses unconditionally in browser_navigate(). This prevents
legitimate local development use cases (localhost testing, LAN
device access) when using the local Chromium backend.
The SSRF check is only meaningful for cloud browsers (Browserbase,
BrowserUse) where the agent could reach internal resources on a
remote machine. In local mode, the user already has full terminal
and network access, so the check adds no security value.
This change makes the SSRF check conditional on _get_cloud_provider(),
keeping full protection in cloud mode while allowing private addresses
in local mode.
* fix(tools): make SSRF check configurable via browser.allow_private_urls
Replace unconditional SSRF check with a configurable setting.
Default (False) keeps existing security behavior. Setting to True
allows navigating to private/internal IPs for local dev and LAN use cases.
---------
Co-authored-by: Nils (Norya) <nils@begou.dev>
* fix(gateway): honor default for invalid bool-like config values
* refactor: simplify web backend priority detection
Replace cascading boolean conditions with a priority-ordered loop.
Same behavior (verified against all 16 env var combinations),
half the lines, trivially extensible for new backends.
---------
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
* feat(telegram): add webhook mode as alternative to polling
When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook
server (via python-telegram-bot's start_webhook()) instead of long
polling. This enables cloud platforms like Fly.io and Railway to
auto-wake suspended machines on inbound HTTP traffic.
Polling remains the default — no behavior change unless the env var
is set.
Env vars:
TELEGRAM_WEBHOOK_URL Public HTTPS URL for Telegram to push to
TELEGRAM_WEBHOOK_PORT Local listen port (default 8443)
TELEGRAM_WEBHOOK_SECRET Secret token for update verification
Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all
current main enhancements (network error recovery, polling conflict
detection, DM topics setup).
Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
* fix: send_document call in background task delivery + vision download timeout
Two fixes salvaged from PR #2269 by amethystani:
1. gateway/run.py: adapter.send_file() → adapter.send_document()
send_file() doesn't exist on BasePlatformAdapter. Background task
media files were silently never delivered (AttributeError swallowed
by except Exception: pass).
2. tools/vision_tools.py: configurable image download timeout via
HERMES_VISION_DOWNLOAD_TIMEOUT env var (default 30s), plus guard
against raise None when max_retries=0.
The third fix in #2269 (opencode-go auth config) was already resolved
on main.
Co-authored-by: amethystani <amethystani@users.noreply.github.com>
---------
Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
Co-authored-by: amethystani <amethystani@users.noreply.github.com>
Skills with scripts/, templates/, and references/ subdirectories need
those files available inside sandboxed execution environments. Previously
the skills directory was missing entirely from remote backends.
Live sync — files stay current as credentials refresh and skills update:
- Docker/Singularity: bind mounts are inherently live (host changes
visible immediately)
- Modal: _sync_files() runs before each command with mtime+size caching,
pushing only changed credential and skill files (~13μs no-op overhead)
- SSH: rsync --safe-links before each command (naturally incremental)
- Daytona: _upload_if_changed() with mtime+size caching before each command
Security — symlink filtering:
- Docker/Singularity: sanitized temp copy when symlinks detected
- Modal/Daytona: iter_skills_files() skips symlinks
- SSH: rsync --safe-links skips symlinks pointing outside source tree
- Temp dir cleanup via atexit + reuse across calls
Non-root user support:
- SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/
- Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/
- Docker/Modal/Singularity: run as root, /root/.hermes/ is correct
Also:
- credential_files.py: fix name/path key fallback in required_credential_files
- Singularity, SSH, Daytona: gained credential file support
- 14 tests covering symlink filtering, name/path fallback, iter_skills_files
* feat(approvals): make dangerous command approval timeout configurable
Read `approvals.timeout` from config.yaml (default 60s) instead of
hardcoding 60 seconds in both the fallback CLI prompt and the TUI
prompt_toolkit callback.
Follows the same pattern as `clarify.timeout` which is already
configurable via CLI_CONFIG.
Closes#3765
* fix: add timeout default to approvals section in DEFAULT_CONFIG
---------
Co-authored-by: acsezen <asezen@icloud.com>
Closes gaps that allowed an agent to expose Docker's Remote API to the
internet by writing to /etc/docker/daemon.json.
Terminal tool (approval.py):
- chmod: now catches 666 and symbolic modes (o+w, a+w), not just 777
- cp/mv/install: detected when targeting /etc/
- sed -i/--in-place: detected when targeting /etc/
File tools (file_tools.py):
- write_file and patch now refuse to write to sensitive system paths
(/etc/, /boot/, /usr/lib/systemd/, docker.sock)
- Directs users to the terminal tool (which has approval prompts) for
system file modifications
When a command timed out, all captured output was discarded — the agent
only saw 'Command timed out after Xs' with zero context. Now returns
the buffered output followed by a timeout marker, matching the existing
interrupt path behavior.
Salvaged from PR #3286 by @binhnt92.
Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>
Adds WeCom as a gateway platform adapter using the AI Bot WebSocket
gateway for real-time bidirectional communication. No public endpoint
or new pip dependencies needed (uses existing aiohttp + httpx).
Features:
- WebSocket persistent connection with auto-reconnect (exponential backoff)
- DM and group messaging with configurable access policies
- Media upload/download with AES decryption for encrypted attachments
- Markdown rendering, quote context preservation
- Proactive + passive reply message modes
- Chunked media upload pipeline (512KB chunks)
Cherry-picked from PR #1898 by EvilRan with:
- Moved to current main (PR was 300 commits behind)
- Skipped base.py regressions (reply_to additions are good but belong
in a separate PR since they affect all platforms)
- Fixed test assertions to match current base class send() signature
(reply_to=None kwarg now explicit)
- All 16 integration points added surgically to current main
- No new pip dependencies (aiohttp + httpx already installed)
Fixes#1898
Co-authored-by: EvilRan <EvilRan@users.noreply.github.com>
Three safety gaps in vision_analyze_tool:
1. Local files accepted without checking if they're actually images —
a renamed text file would get base64-encoded and sent to the model.
Now validates magic bytes (PNG, JPEG, GIF, BMP, WebP, SVG).
2. No website policy enforcement on image URLs — blocked domains could
be fetched via the vision tool. Now checks before download.
3. No redirect check — if an allowed URL redirected to a blocked domain,
the download would proceed. Now re-checks the final URL.
Fixed one test that needed _validate_image_url mocked to bypass DNS
resolution on the fake blocked.test domain (is_safe_url does DNS
checks that were added after the original PR).
Co-authored-by: GutSlabs <GutSlabs@users.noreply.github.com>
Validate category names in _create_skill() before using them as
filesystem path segments. Previously, categories like '../escape' or
'/tmp/pwned' could write skill files outside ~/.hermes/skills/.
Adds _validate_category() that rejects slashes, backslashes, absolute
paths, and non-alphanumeric characters (reuses existing VALID_NAME_RE).
Tests: 5 new tests for traversal, absolute paths, and valid categories.
Salvaged from PR #1939 by Gutslabs.
When a subagent hit max_iterations, status was always 'failed' even
if it produced a usable summary via _handle_max_iterations(). This
happened because the status check required both completed=True AND
a summary, but completed is False whenever max_iterations is reached
(run_agent.py line 7969).
Now gates status on whether a summary was produced — if the subagent
returned a final_response, the parent has usable output regardless of
iteration budget. The exit_reason field already distinguishes
'completed' vs 'max_iterations' for anything that needs to know how
the task ended.
Closes#1899.
Adds Feishu (ByteDance's enterprise messaging platform) as a gateway
platform adapter with full feature parity: WebSocket + webhook transports,
message batching, dedup, rate limiting, rich post/card content parsing,
media handling (images/audio/files/video), group @mention gating,
reaction routing, and interactive card button support.
Cherry-picked from PR #1793 by penwyp with:
- Moved to current main (PR was 458 commits behind)
- Fixed _send_with_retry shadowing BasePlatformAdapter method (renamed to
_feishu_send_with_retry to avoid signature mismatch crash)
- Fixed import structure: aiohttp/websockets imported independently of
lark_oapi so they remain available when SDK is missing
- Fixed get_hermes_home import (hermes_constants, not hermes_cli.config)
- Added skip decorators for tests requiring lark_oapi SDK
- All 16 integration points added surgically to current main
New dependency: lark-oapi>=1.5.3,<2 (optional, pip install hermes-agent[feishu])
Fixes#1788
Co-authored-by: penwyp <penwyp@users.noreply.github.com>
The write-deny list in file_operations.py hardcoded ~/.hermes/.env,
which misses the actual .env in custom HERMES_HOME or profile setups.
Use get_hermes_home() for profile-safe path resolution.
Salvaged from PR #3232 by @erhnysr.
Co-authored-by: Erhnysr <erhnysr@users.noreply.github.com>
The honcho check_fn only checked runtime session state, which isn't
set until the agent initializes. At banner time, honcho tools showed
as red/disabled even when properly configured.
Now checks configuration (enabled + api_key/base_url) as a fallback
when the session context isn't active yet. Fast path (session active)
unchanged; slow path (config check) only runs at banner time.
Adds 4 tests covering: session active, configured but no session,
not configured, and import failure graceful fallback.
Closes#1843.
When a connected MCP server sends a ToolListChangedNotification (per the
MCP spec), Hermes now automatically re-fetches the tool list, deregisters
removed tools, and registers new ones — without requiring a restart.
This enables MCP servers with dynamic toolsets (e.g. GitHub MCP with
GITHUB_DYNAMIC_TOOLSETS=1) to add/remove tools at runtime.
Changes:
- registry.py: add ToolRegistry.deregister() for nuke-and-repave refresh
- mcp_tool.py: extract _register_server_tools() from
_discover_and_register_server() as a shared helper for both initial
discovery and dynamic refresh
- mcp_tool.py: add _make_message_handler() and _refresh_tools() on
MCPServerTask, wired into all 3 ClientSession sites (stdio, new HTTP,
deprecated HTTP)
- Graceful degradation: silently falls back to static discovery when the
MCP SDK lacks notification types or message_handler support
- 8 new tests covering registration, refresh, handler dispatch, and
deregister
Salvaged from PR #1794 by shivvor2.
When a tool plugin registers a schema without an explicit 'name' key,
get_definitions() crashes with KeyError:
available_tool_names = {t["function"]["name"] for t in filtered_tools}
Fix: always merge entry.name into schema so 'name' is never missing.
Refs: #3729
Co-authored-by: ekkoitac <ekko.itac@gmail.com>
* feat: show estimated tool token context in hermes tools checklist
Adds a live token estimate indicator to the bottom of the interactive
tool configuration checklist (hermes tools / hermes setup). As users
toggle toolsets on/off, the total estimated context cost updates in
real time.
Implementation:
- tools/registry.py: Add get_schema() for check_fn-free schema access
- hermes_cli/curses_ui.py: Add optional status_fn callback to
curses_checklist — renders at bottom-right of terminal, stays fixed
while items scroll
- hermes_cli/tools_config.py: Add _estimate_tool_tokens() using
tiktoken (cl100k_base, already installed) to count tokens in the
JSON-serialised OpenAI-format tool schemas. Results are cached
per-process. The status function deduplicates overlapping tools
(e.g. browser includes web_search) for accurate totals.
- 12 new tests covering estimation, caching, graceful degradation
when tiktoken is unavailable, status_fn wiring, deduplication,
and the numbered fallback display
* fix: use effective toolsets (includes plugins) for token estimation index mapping
The status_fn closure built ts_keys from CONFIGURABLE_TOOLSETS but the
checklist uses _get_effective_configurable_toolsets() which appends plugin
toolsets. With plugins present, the indices would mismatch, causing
IndexError when selecting a plugin toolset.
* fix(tools): implement send_message routing for Matrix, Mattermost, HomeAssistant, DingTalk
Matrix, Mattermost, HomeAssistant, and DingTalk were present in
platform_map but fell through to the "not yet implemented" else branch,
causing send_message tool calls to silently fail on these platforms.
Add four async sender functions:
- _send_mattermost: POST /api/v4/posts via Mattermost REST API
- _send_matrix: PUT /_matrix/client/v3/rooms/.../send via Matrix CS API
- _send_homeassistant: POST /api/services/notify/notify via HA REST API
- _send_dingtalk: POST to session webhook URL
Add routing in _send_to_platform() and 17 unit tests covering success,
HTTP errors, missing config, env var fallback, and Matrix txn_id uniqueness.
* fix: pass platform tokens explicitly to Mattermost/Matrix/HA senders
The original PR passed pconfig.extra to sender functions, but tokens
live at pconfig.token (not in extra). This caused the senders to always
fall through to env var lookup instead of using the gateway-resolved
token.
Changes:
- Mattermost/Matrix/HA: accept token as first arg, matching the
Telegram/Discord/Slack sender pattern
- DingTalk: add DINGTALK_WEBHOOK_URL env var fallback + docstring
explaining the session-webhook vs robot-webhook difference
- Tests updated for new signatures + new DingTalk env var test
---------
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
When a user runs 'hermes update', the Python process caches old modules
in sys.modules. After git pull updates files on disk, lazy imports of
newly-updated modules fail because they try to import display_hermes_home
from the cached (old) hermes_constants which doesn't have the function.
This specifically broke the gateway auto-restart in cmd_update — importing
hermes_cli/gateway.py triggered the top-level 'from hermes_constants
import display_hermes_home' against the cached old module. The ImportError
was silently caught, so the gateway was never restarted after update.
Users with a running gateway then hit the ImportError on their next
Telegram/Discord message when the stale gateway process lazily loaded
run_agent.py (new version) which also had the top-level import.
Fixes:
- hermes_cli/gateway.py: lazy import at call site (line 940)
- run_agent.py: lazy import at call site (line 6927)
- tools/terminal_tool.py: lazy imports at 3 call sites
- tools/tts_tool.py: static schema string (no module-level call)
- hermes_cli/auth.py: lazy import at call site (line 2024)
- hermes_cli/main.py: reload hermes_constants after git pull in cmd_update
Also fixes 4 pre-existing test failures in test_parse_env_var caused by
NameError on display_hermes_home in terminal_tool.py.
Add skills.external_dirs config option — a list of additional directories
to scan for skills alongside ~/.hermes/skills/. External dirs are read-only:
skill creation/editing always writes to the local dir. Local skills take
precedence when names collide.
This lets users share skills across tools/agents without copying them into
Hermes's own directory (e.g. ~/.agents/skills, /shared/team-skills).
Changes:
- agent/skill_utils.py: add get_external_skills_dirs() and get_all_skills_dirs()
- agent/prompt_builder.py: scan external dirs in build_skills_system_prompt()
- tools/skills_tool.py: _find_all_skills() and skill_view() search external dirs;
security check recognizes configured external dirs as trusted
- agent/skill_commands.py: /skill slash commands discover external skills
- hermes_cli/config.py: add skills.external_dirs to DEFAULT_CONFIG
- cli-config.yaml.example: document the option
- tests/agent/test_external_skills.py: 11 tests covering discovery, precedence,
deduplication, and skill_view for external skills
Requested by community member primco.
Two related fixes for remote terminal backends (Modal/Docker):
1. NEW: Credential file mounting system
Skills declare required_credential_files in frontmatter. Files are
mounted into Docker (read-only bind mounts) and Modal (mounts at
creation + sync via exec on each command for mid-session changes).
Google Workspace skill updated with the new field.
2. FIX: Docker backend now includes env_passthrough vars
Skills that declare required_environment_variables (e.g. Notion with
NOTION_API_KEY) register vars in the env_passthrough system. The
local backend checked this, but Docker's forward_env was a separate
disconnected list. Now Docker exec merges both sources, so
skill-declared env vars are forwarded into containers automatically.
This fixes the reported issue where NOTION_API_KEY in ~/.hermes/.env
wasn't reaching the Docker container despite being registered via
the Notion skill's prerequisites.
Closes#3665
Prep for profiles: user-facing messages now use display_hermes_home() so
diagnostic output shows the correct path for each profile.
New helper: display_hermes_home() in hermes_constants.py
12 files swept, ~30 user-facing string replacements.
Includes dynamic TTS schema description.
Switch MCP HTTP transport from the deprecated streamablehttp_client()
(mcp < 1.24.0) to the new streamable_http_client() API that accepts a
pre-built httpx.AsyncClient.
Changes vs the original PR #3391:
- Separate try/except imports so mcp < 1.24.0 doesn't break (graceful
fallback to deprecated API instead of losing HTTP MCP entirely)
- Wrap httpx.AsyncClient in async-with for proper lifecycle management
(the new SDK API explicitly skips closing caller-provided clients)
- Match SDK's own create_mcp_http_client defaults: follow_redirects=True,
Timeout(connect_timeout, read=300.0)
- Keep deprecated code path as fallback for older SDK versions
Co-authored-by: HenkDz <HenkDz@users.noreply.github.com>
Salvage of PR #3452 (kentimsit). Fixes skill readiness checks on remote backends — persisted env vars are no longer incorrectly marked as missing.
Co-Authored-By: kentimsit <kentimsit@users.noreply.github.com>
Adds Exa (https://exa.ai) as a fourth web backend alongside Parallel,
Firecrawl, and Tavily. Follows the exact same integration pattern:
- Backend selection: config web.backend=exa or auto-detect from EXA_API_KEY
- Search: _exa_search() with highlights for result descriptions
- Extract: _exa_extract() with full text content extraction
- Lazy singleton client with x-exa-integration header
- Wired into web_search_tool and web_extract_tool dispatchers
- check_web_api_key() and requires_env updated
- CLI: hermes setup summary, hermes tools config, hermes config show
- config.py: EXA_API_KEY in OPTIONAL_ENV_VARS with metadata
- pyproject.toml: exa-py>=2.9.0,<3 in dependencies
Salvaged from PR #1850.
Co-authored-by: louiswalsh <louiswalsh@users.noreply.github.com>
New installs get a cleaner structure:
cache/images/ (was image_cache/)
cache/audio/ (was audio_cache/)
cache/documents/ (was document_cache/)
cache/screenshots/ (was browser_screenshots/)
platforms/whatsapp/session/ (was whatsapp/session/)
platforms/matrix/store/ (was matrix/store/)
platforms/pairing/ (was pairing/)
Existing installs are unaffected -- get_hermes_dir() checks for the
old path first and uses it if present. No migration needed.
Adds get_hermes_dir(new_subpath, old_name) helper to hermes_constants.py
for reuse by any future subsystem.
Add the gstack community skills repo to the default tap list and fix
skill_identifier construction for repos with an empty path prefix.
Co-authored-by: Tugrul Guner <tugrulguner@users.noreply.github.com>
Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480.
Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a
command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`,
which is not a valid delimiter. Bash then treated the rest of the wrapper as
heredoc body, leaking it into tool output (e.g. gh issue/PR flows).
Use newline-separated wrapper lines so the delimiter stays alone and the
trailer runs after the heredoc completes.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Prevent the agent from accidentally killing its own process with
pkill -f gateway, killall hermes, etc. Adds a dangerous command
pattern that triggers the approval flow.
Co-authored-by: arasovic <arasovic@users.noreply.github.com>
* feat: GPT tool-use steering + strip budget warnings from history
Two changes to improve tool reliability, especially for OpenAI GPT models:
1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the
system prompt when the model name contains 'gpt' and tools are loaded.
This addresses a known behavioral pattern where GPT models describe
intended actions ('I will run the tests') instead of actually making
tool calls. Inspired by similar steering in OpenCode (beast.txt) and
Cline (GPT-5.1 variant).
2. Budget warning history stripping: Budget pressure warnings injected by
_get_budget_warning() into tool results are now stripped when
conversation history is replayed via run_conversation(). Previously,
these turn-scoped signals persisted across turns, causing models to
avoid tool calls in all subsequent messages after any turn that hit
the 70-90% iteration threshold.
* fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support
Prep for the upcoming profiles feature — each profile is a separate
HERMES_HOME directory, so all paths must respect the env var.
Fixes:
- gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to
~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses
get_hermes_home() so each profile gets its own Matrix state.
- gateway/platforms/telegram.py: Two locations reading config.yaml via
Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id
persistence and hot-reload would read the wrong config in a profile.
- tools/file_tools.py: Security path for hub index blocking was
hardcoded to ~/.hermes, would miss the actual profile's hub cache.
- hermes_cli/gateway.py: Service naming now uses the profile name
(hermes-gateway-coder) instead of a cryptic hash suffix. Extracted
_profile_suffix() helper shared by systemd and launchd.
- hermes_cli/gateway.py: Launchd plist path and Label now scoped per
profile (ai.hermes.gateway-coder.plist). Previously all profiles
would collide on the same plist file on macOS.
- hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in
EnvironmentVariables — was missing entirely, making custom
HERMES_HOME broken on macOS launchd (pre-existing bug).
- All launchctl commands in gateway.py, main.py, status.py updated
to use get_launchd_label() instead of hardcoded string.
Test fixes: DM topic tests now set HERMES_HOME env var alongside
Path.home() mock. Launchd test uses get_launchd_label() for expected
commands.
Drop the swe-rex dependency for Modal terminal backend and use the
Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes:
- AsyncUsageWarning from synchronous App.lookup() in async context
- DeprecationError from unencrypted_ports / .url on unencrypted tunnels
(deprecated 2026-03-05)
The new implementation:
- Uses modal.App.lookup.aio() for async-safe app creation
- Uses Sandbox.create.aio() with 'sleep infinity' entrypoint
- Uses Sandbox.exec.aio() for direct command execution (no HTTP server
or tunnel needed)
- Keeps all existing features: persistent filesystem snapshots,
configurable resources (CPU/memory/disk), sudo support, interrupt
handling, _AsyncWorker for event loop safety
Consistent with the Docker backend precedent (PR #2804) where we
removed mini-swe-agent in favor of direct docker run.
Files changed:
- tools/environments/modal.py - core rewrite
- tools/terminal_tool.py - health check: modal instead of swerex
- hermes_cli/setup.py - install modal instead of swe-rex[modal]
- pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal]
- scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex
- tests/ - updated for new implementation
- environments/README.md - updated patches section
- website/docs - updated install command
* Fix#3409: Add fallback to session_search to prevent false negatives on summarization failure
Fixes#3409. When the auxiliary summarizer fails or returns None, the tool now returns a raw fallback preview of the matched session instead of silently dropping it and returning an empty list
* fix: clean up fallback logic — separate exception handling from preview
Restructure the loop: handle exceptions first (log + nullify), build
entry dict once, then branch on result truthiness. Removes duplicated
field assignments and makes the control flow linear.
---------
Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.
All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.
Closes#3389.
Previously, tirith exit code 1 (block) immediately rejected the command
with no approval prompt — users saw 'BLOCKED: Command blocked by
security scan' and the agent moved on. This prevented gateway/CLI users
from approving pipe-to-shell installs like 'curl ... | sh' even when
they understood the risk.
Changes:
- Tirith 'block' and 'warn' now both go through the approval flow.
Users see the full tirith findings (severity, title, description,
safer alternatives) and can choose to approve or deny.
- New _format_tirith_description() builds rich descriptions from tirith
findings JSON so the approval prompt is informative.
- CLI startup now warns when tirith is enabled but not available, so
users know command scanning is degraded to pattern matching only.
The default approval choice is still deny, so the security posture is
unchanged for unattended/timeout scenarios.
Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh'
was hard-blocked with no way to approve.
dict.get(key, default) returns None — not the default — when the key IS
present but explicitly set to null/~ in YAML. Calling .lower() on that
raises AttributeError.
Use (config.get(key) or fallback) so both missing keys and explicit nulls
coalesce to the intended default.
Files fixed:
- tools/tts_tool.py — _get_provider()
- tools/web_tools.py — _get_backend()
- tools/mcp_tool.py — MCPServerTask auth config
- trajectory_compressor.py — _detect_provider() and config loading
Co-authored-by: dieutx <dangtc94@gmail.com>
V4A patches with only + lines (no context or - lines) were silently
dropped because search_lines was empty and the 'if search_lines:' block
was the only code path. Addition-only hunks are common when the model
generates patches for new functions or blocks.
Adds an else branch that inserts at the context_hint position when
available, or appends at end of file.
Includes 2 regression tests for addition-only hunks with and without
context hints.
Salvaged from PR #3092 by thakoreh.
Co-authored-by: Hiren <hiren.thakore58@gmail.com>
- add managed modal and gateway-backed tool integrations\n- improve CLI setup, auth, and configuration for subscriber flows\n- expand tests and docs for managed tool support
The delegate_task tool accepts a toolsets parameter directly from the
LLM's function call arguments. When provided, these toolsets are passed
through _strip_blocked_tools but never intersected with the parent
agent's enabled_toolsets. A model can request toolsets the parent does
not have (e.g., web, browser, rl), granting the subagent tools that
were explicitly disabled for the parent.
Intersect LLM-requested toolsets with the parent's enabled set before
applying the blocked-tool filter, so subagents can only receive a
subset of the parent's tools.
Co-authored-by: dieutx <dangtc94@gmail.com>
When third-party tools (Paperclip orchestrator, etc.) spawn hermes chat
as a subprocess, their sessions pollute user session history and search.
- hermes chat --source <tag> (also HERMES_SESSION_SOURCE env var)
- exclude_sources parameter on list_sessions_rich() and search_messages()
- Sessions with source=tool hidden from sessions list/browse/search
- Third-party adapters pass --source tool to isolate agent sessions
Cherry-picked from PR #3208 by HenkDz.
Co-authored-by: Henkey <noonou7@gmail.com>
detect_dangerous_command() ran regex patterns against raw command strings
without normalization, allowing bypass via Unicode fullwidth chars,
ANSI escape codes, null bytes, and 8-bit C1 controls.
Adds _normalize_command_for_detection() that:
- Strips ANSI escapes using the full ECMA-48 strip_ansi() from
tools/ansi_strip (CSI, OSC, DCS, 8-bit C1, nF sequences)
- Removes null bytes
- Normalizes Unicode via NFKC (fullwidth Latin → ASCII, etc.)
Includes 12 regression tests covering fullwidth, ANSI, C1, null byte,
and combined obfuscation bypasses.
Salvaged from PR #3089 by thakoreh — improved ANSI stripping to use
existing comprehensive strip_ansi() instead of a weaker hand-rolled
regex, and added test coverage.
Co-authored-by: Hiren <hiren.thakore58@gmail.com>
* fix(skills): reduce skills.sh resolution churn and preserve trust for wrapped identifiers
- Accept common skills.sh prefix typos (skils-sh/, skils.sh/)
- Strip skills-sh/ prefix in _resolve_trust_level() so trusted repos
stay trusted when installed through skills.sh
- Use resolved identifier (from bundle/meta) for scan_skill source
- Prefer tree search before root scan in _discover_identifier()
- Add _resolve_github_meta() consolidation for inspect flow
Cherry-picked from PR #3001 by kshitijk4poor.
* fix: restore candidate loop in SkillsShSource.fetch() for consistency
The cherry-picked PR only tried the first candidate identifier in
fetch() while inspect() (via _resolve_github_meta) tried all four.
This meant skills at repo/skills/path would be found by inspect but
missed by fetch, forcing it through the heavier _discover_identifier
flow. Restore the candidate loop so both paths behave identically.
Updated the test assertion to match.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
_send_discord(), _send_slack(), and _send_twilio() all created
aiohttp.ClientSession() without a timeout, leaving HTTP requests
able to hang indefinitely. _send_whatsapp() already used
aiohttp.ClientTimeout(total=30) — this fix applies the same
pattern consistently to all platform send functions.
- Add ClientTimeout(total=30) to _send_discord() ClientSession
- Add ClientTimeout(total=30) to _send_slack() ClientSession
- Add ClientTimeout(total=30) to _send_twilio() ClientSession
Three categories of cleanup, all zero-behavioral-change:
1. F-strings without placeholders (154 fixes across 29 files)
- Converted f'...' to '...' where no {expression} was present
- Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)
2. Simplify defensive patterns in run_agent.py
- Added explicit self._is_anthropic_oauth = False in __init__ (before
the api_mode branch that conditionally sets it)
- Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
self._is_anthropic_oauth (attribute always initialized now)
- Added _is_openrouter_url() and _is_anthropic_url() helper methods
- Replaced 3 inline 'openrouter' in self._base_url_lower checks
3. Remove dead code in small files
- hermes_cli/claw.py: removed unused 'total' computation
- tools/fuzzy_match.py: removed unused strip_indent() function and
pattern_stripped variable
Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.
Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
- Registry now warns when a tool name is overwritten by a different
toolset (silent dict overwrite was the previous behavior)
- MCP tool registration checks for collisions with non-MCP (built-in)
tools before registering. If an MCP tool's prefixed name matches an
existing built-in, the MCP tool is skipped and a warning is logged.
MCP-to-MCP collisions are allowed (last server wins).
- Both regular MCP tools and utility tools (resources/prompts) are
guarded.
- Adds 5 tests covering: registry overwrite warning, same-toolset
re-registration silence, built-in collision skip, normal registration,
and MCP-to-MCP collision pass-through.
Reported by k_sze (KONG) — MiniMax MCP server's web_search tool could
theoretically shadow Hermes's built-in web_search if prefixing failed.
Centralizes two widely-duplicated patterns into hermes_constants.py:
1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var)
- Was copy-pasted inline across 30+ files as:
Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
- Now defined once in hermes_constants.py (zero-dependency module)
- hermes_cli/config.py re-exports it for backward compatibility
- Removed local wrapper functions in honcho_integration/client.py,
tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py
2. parse_reasoning_effort() — Reasoning effort string validation
- Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py
- Same validation logic: check against (xhigh, high, medium, low, minimal, none)
- Now defined once in hermes_constants.py, called from all 3 locations
- Warning log for unknown values kept at call sites (context-specific)
31 files changed, net +31 lines (125 insertions, 94 deletions)
Full test suite: 6179 passed, 0 failed
* fix(security): add SSRF protection to browser_navigate
browser_navigate() only checked the website blocklist policy but did
not call is_safe_url() to block private/internal addresses. This
allowed the agent to navigate to localhost, cloud metadata endpoints
(169.254.169.254), and private network IPs via the browser.
web_tools and vision_tools already had this check. Added the same
is_safe_url() pre-flight validation before the blocklist check in
browser_navigate().
* fix: move SSRF import to module level, fix policy test mock
Move is_safe_url import to module level so it can be monkeypatched
in tests. Update test_browser_navigate_returns_policy_block to mock
_is_safe_url so the SSRF check passes and the policy check is reached.
* fix(security): harden browser SSRF protection
Follow-up to cherry-picked PR #3041:
1. Fail-closed fallback: if url_safety module can't import, block all
URLs instead of allowing all. Security guards should never fail-open.
2. Post-redirect SSRF check: after navigation, verify the final URL
isn't a private/internal address. If a public URL redirected to
169.254.169.254 or localhost, navigate to about:blank and return
an error — prevents the model from reading internal content via
subsequent browser_snapshot calls.
---------
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
* feat: nix flake, uv2nix build, dev shell and home manager
* fixed nix run, updated docs for setup
* feat(nix): NixOS module with persistent container mode, managed guards, checks
- Replace homeModules.nix with nixosModules.nix (two deployment modes)
- Mode A (native): hardened systemd service with ProtectSystem=strict
- Mode B (container): persistent Ubuntu container with /nix/store bind-mount,
identity-hash-based recreation, GC root protection, symlink-based updates
- Add HERMES_MANAGED guards blocking CLI config mutation (config set, setup,
gateway install/uninstall) when running under NixOS module
- Add nix/checks.nix with build-time verification (binary, CLI, managed guard)
- Remove container.nix (no Nix-built OCI image; pulls ubuntu:24.04 at runtime)
- Simplify packages.nix (drop fetchFromGitHub submodules, PYTHONPATH wrappers)
- Rewrite docs/nixos-setup.md with full options reference, container
architecture, secrets management, and troubleshooting guide
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update config.py
* feat(nix): add CI workflow and enhanced build checks
- GitHub Actions workflow for nix flake check + build on linux/macOS
- Entry point sync check to catch pyproject.toml drift
- Expanded managed-guard check to cover config edit
- Wrap hermes-acp binary in Nix package
- Fix Path type mismatch in is_managed()
* Update MCP server package name; bundled skills support
* fix reading .env. instead have container user a common mounted .env file
* feat(nix): container entrypoint with privilege drop and sudo provisioning
Container was running as non-root via --user, which broke apt/pip installs
and caused crashes when $HOME didn't exist. Replace --user with a Nix-built
entrypoint script that provisions the hermes user, sudo (NOPASSWD), and
/home/hermes inside the container on first boot, then drops privileges via
setpriv. Writable layer persists so setup only runs once.
Also expands MCP server options to support HTTP transport and sampling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix group and user creation in container mode
* feat(nix): persistent /home/hermes and MESSAGING_CWD in container mode
Container mode now bind-mounts ${stateDir}/home to /home/hermes so the
agent's home directory survives container recreation. Previously it lived
in the writable layer and was lost on image/volume/options changes.
Also passes MESSAGING_CWD to the container so the agent finds its
workspace and documents, matching native mode behavior.
Other changes:
- Extract containerDataDir/containerHomeDir bindings (no more magic strings)
- Fix entrypoint chown to run unconditionally (volume mounts always exist)
- Add schema field to container identity hash for auto-recreation
- Add idempotency test (Scenario G) to config-roundtrip check
* docs: add Nix & NixOS setup guide to docs site
Add comprehensive Nix documentation to the Docusaurus site at
website/docs/getting-started/nix-setup.md, covering nix run/profile
install, NixOS module (native + container modes), declarative settings,
secrets management, MCP servers, managed mode, container architecture,
dev shell, flake checks, and full options reference.
- Register nix-setup in sidebar after installation page
- Add Nix callout tip to installation.md linking to new guide
- Add canonical version pointer in docs/nixos-setup.md
* docs: remove docs/nixos-setup.md, consolidate into website docs
Backfill missing details (restart/restartSec in full example,
gateway.pid, 0750 permissions, docker inspect commands) into
the canonical website/docs/getting-started/nix-setup.md and
delete the old standalone file.
* fix(nix): add compression.protect_last_n and target_ratio to config-keys.json
New keys were added to DEFAULT_CONFIG on main, causing the
config-drift check to fail in CI.
* fix(nix): skip checks on aarch64-darwin (onnxruntime wheel missing)
The full Python venv includes onnxruntime (via faster-whisper/STT)
which lacks a compatible uv2nix wheel on aarch64-darwin. Gate all
checks behind stdenv.hostPlatform.isLinux. The package and devShell
still evaluate on macOS.
* fix(nix): skip flake check and build on macOS CI
onnxruntime (transitive dep via faster-whisper) lacks a compatible
uv2nix wheel on aarch64-darwin. Run full checks and build on Linux
only; macOS CI verifies the flake evaluates without building.
* fix(nix): preserve container writable layer across nixos-rebuild
The container identity hash included the entrypoint's Nix store path,
which changes on every nixpkgs update (due to runtimeShell/stdenv
input-addressing). This caused false-positive identity mismatches,
triggering container recreation and losing the persistent writable layer.
- Use stable symlink (current-entrypoint) like current-package already does
- Remove entrypoint from identity hash (only image/volumes/options matter)
- Add GC root for entrypoint so nix-collect-garbage doesn't break it
- Remove global HERMES_HOME env var from addToSystemPackages (conflicted
with interactive CLI use, service already sets its own)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each subagent now gets its own IterationBudget instead of sharing the
parent's. The per-subagent cap is controlled by delegation.max_iterations
in config.yaml (default 50). Total iterations across parent + subagents
can exceed the parent's max_iterations, but the user retains control via
the config setting.
Previously, subagents shared the parent's budget, so three parallel
subagents configured for max_iterations=50 racing against a parent that
already used 60 of 90 would each only get ~10 iterations.
Inspired by PR #2928 (Bartok9) which identified the issue (#2873).
* fix(skills): use Git Trees API to prevent silent subdirectory loss during install
Refactors _download_directory() to use the Git Trees API (single call
for the entire repo tree) as the primary path, falling back to the
recursive Contents API when the tree endpoint is unavailable or
truncated. Prevents silent subdirectory loss caused by per-directory
rate limiting or transient failures.
Cherry-picked from PR #2981 by tugrulguner.
Fixes#2940.
* fix: simplify tree API — use branch name directly as tree-ish
Eliminates an extra git/ref/heads API call by passing the branch name
directly to git/trees/{branch}?recursive=1, matching the pattern
already used by _find_skill_in_repo_tree.
---------
Co-authored-by: tugrulguner <tugrulguner@users.noreply.github.com>
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events
Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
* fix(run_agent): improve timeout handling for chat completions
Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
* fix(run_agent): reduce default stream read timeout for chat completions
Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
* fix(run_agent): enhance streaming error handling and retry logic
Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
* fix: skills-sh install fails for deeply nested repo structures
Skills in repos with deep directory nesting (e.g.
cli-tool/components/skills/development/senior-backend/) could not be
installed because the candidate path generation and shallow root-dir
scan never reached them.
Added GitHubSource._find_skill_in_repo_tree() which uses the GitHub
Trees API to recursively search the entire repo tree in a single API
call. This is used as a final fallback in
SkillsShSource._discover_identifier() when the standard candidate
paths and shallow scan both fail.
Fixes installation of skills from repos like davila7/claude-code-templates
where skills are nested 4+ levels deep.
Reported by user Samuraixheart.
_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.
The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.
Fixes reports of skill_manage being blocked by security scanner.
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
* feat: env var passthrough for skills and user config
Skills that declare required_environment_variables now have those vars
passed through to sandboxed execution environments (execute_code and
terminal). Previously, execute_code stripped all vars containing KEY,
TOKEN, SECRET, etc. and the terminal blocklist removed Hermes
infrastructure vars — both blocked skill-declared env vars.
Two passthrough sources:
1. Skill-scoped (automatic): when a skill is loaded via skill_view and
declares required_environment_variables, vars that are present in
the environment are registered in a session-scoped passthrough set.
2. Config-based (manual): terminal.env_passthrough in config.yaml lets
users explicitly allowlist vars for non-skill use cases.
Changes:
- New module: tools/env_passthrough.py — shared passthrough registry
- hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG
- tools/skills_tool.py: register available skill env vars on load
- tools/code_execution_tool.py: check passthrough before filtering
- tools/environments/local.py: check passthrough in _sanitize_subprocess_env
and _make_run_env
- 19 new tests covering all layers
* docs: add environment variable passthrough documentation
Document the env var passthrough feature across four docs pages:
- security.md: new 'Environment Variable Passthrough' section with
full explanation, comparison table, and security considerations
- code-execution.md: update security section, add passthrough subsection,
fix comparison table
- creating-skills.md: add tip about automatic sandbox passthrough
- skills.md: add note about passthrough after secure setup docs
Live-tested: launched interactive CLI, loaded a skill with
required_environment_variables, verified TEST_SKILL_SECRET_KEY was
accessible inside execute_code sandbox (value: passthrough-test-value-42).
Complete cleanup after dropping the mini-swe-agent submodule (PR #2804):
- Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var
settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py
- Remove mini-swe-agent health check from hermes doctor
- Remove 'minisweagent' from logger suppression lists
- Remove litellm/typer/platformdirs from requirements.txt
- Remove mini-swe-agent install steps from install.ps1 (Windows)
- Remove mini-swe-agent install steps from website docs
- Update all stale comments/docstrings referencing mini-swe-agent
in terminal_tool.py, tools/__init__.py, code_execution_tool.py,
environments/README.md, environments/agent_loop.py
- Remove mini_swe_runner from pyproject.toml py-modules
(still exists as standalone script for RL training use)
- Shrink test_minisweagent_path.py to empty stub
The orphaned mini-swe-agent/ directory on disk needs manual removal:
rm -rf mini-swe-agent/
When TERMINAL_CWD is set to '.' or any relative path (common when the
CLI config defaults to cwd='.'), container backends (docker, modal,
singularity, daytona) would pass it directly to the container where it's
meaningless. This caused 'docker run -d -w .' to fail.
Now relative paths are caught alongside host paths and replaced with
the default '/root' for container backends.
Drop the mini-swe-agent git submodule. All terminal backends now use
hermes-agent's own environment implementations directly.
Docker backend:
- Inline the `docker run -d` container startup (was 15 lines in
minisweagent's DockerEnvironment). Our wrapper already handled
execute(), cleanup(), security hardening, volumes, and resource limits.
Modal backend:
- Import swe-rex's ModalDeployment directly instead of going through
minisweagent's 90-line passthrough wrapper.
- Bake the _AsyncWorker pattern (from environments/patches.py) directly
into ModalEnvironment for Atropos compatibility without monkey-patching.
Cleanup:
- Remove minisweagent_path.py (submodule path resolution helper)
- Remove submodule init/install from install.sh and setup-hermes.sh
- Remove mini-swe-agent from .gitmodules
- environments/patches.py is now a no-op (kept for backward compat)
- terminal_tool.py no longer does sys.path hacking for minisweagent
- mini_swe_runner.py guards imports (optional, for RL training only)
- Update all affected tests to mock the new direct subprocess calls
- Update README.md, CONTRIBUTING.md
No functionality change — all Docker, Modal, local, SSH, Singularity,
and Daytona backends behave identically. 6093 tests pass.
browser_vision and other browser commands had a hardcoded 30-second
subprocess timeout that couldn't be overridden. Users with slower
machines (local Chromium without GPU) would hit timeouts on screenshot
capture even when setting browser.command_timeout in config.yaml,
because nothing read that value.
Changes:
- Add browser.command_timeout to DEFAULT_CONFIG (default: 30s)
- Add _get_command_timeout() helper that reads config, falls back to 30s
- _run_browser_command() now defaults to config value instead of constant
- browser_vision screenshot no longer hardcodes timeout=30
- browser_navigate uses max(config_timeout, 60) as floor for navigation
Reported by Gamer1988.
On macOS with Homebrew (Apple Silicon), Node.js and agent-browser
binaries live under /opt/homebrew/bin/ which is not included in the
_SANE_PATH fallback used by browser_tool.py and environments/local.py.
When Hermes runs with a filtered PATH (e.g. as a systemd service),
these binaries are invisible, causing 'env: node: No such file or
directory' errors when using browser tools.
Changes:
- Add /opt/homebrew/bin and /opt/homebrew/sbin to _SANE_PATH in both
browser_tool.py and environments/local.py
- Add _discover_homebrew_node_dirs() to find versioned Node installs
(e.g. brew install node@24) that aren't linked into /opt/homebrew/bin
- Extend _find_agent_browser() to search Homebrew and Hermes-managed
dirs when agent-browser isn't on the current PATH
- Include discovered Homebrew node dirs in subprocess PATH when
launching agent-browser
- Add 11 new tests covering all Homebrew path discovery logic
echo was called with the full unquoted path (~username/suffix), allowing
command substitution in the suffix (e.g. ~user/$(malicious)) to execute
arbitrary shell commands. The fix expands only the validated ~username
portion via the shell and concatenates the suffix as a plain string.
Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
httpx.AsyncClient awaits event hooks. The sync _ssrf_redirect_guard
returned None, causing 'object NoneType can't be used in await
expression' on any vision_analyze call that followed redirects.
Caught during live PTY testing of the merged SSRF protection.
* fix(security): add SSRF protection to vision_tools and web_tools
Both vision_analyze and web_extract/web_crawl accept arbitrary URLs
without checking if they target private/internal network addresses.
A prompt-injected or malicious skill could use this to access cloud
metadata endpoints (169.254.169.254), localhost services, or private
network hosts.
Adds a shared url_safety.is_safe_url() that resolves hostnames and
blocks private, loopback, link-local, and reserved IP ranges. Also
blocks known internal hostnames (metadata.google.internal).
Integrated at the URL validation layer in vision_tools and before
each website_policy check in web_tools (extract, crawl).
* test(vision): update localhost test to reflect SSRF protection
The existing test_valid_url_with_port asserted localhost URLs pass
validation. With SSRF protection, localhost is now correctly blocked.
Update the test to verify the block, and add a separate test for
valid URLs with ports using a public hostname.
* fix(security): harden SSRF protection — fail-closed, CGNAT, multicast, redirect guard
Follow-up hardening on top of dieutx's SSRF protection (PR #2630):
- Change fail-open to fail-closed: DNS errors and unexpected exceptions
now block the request instead of allowing it (OWASP best practice)
- Block CGNAT range (100.64.0.0/10): Python's ipaddress.is_private
does NOT cover this range (returns False for both is_private and
is_global). Used by Tailscale/WireGuard and carrier infrastructure.
- Add is_multicast and is_unspecified checks: multicast (224.0.0.0/4)
and unspecified (0.0.0.0) addresses were not caught by the original
four-check chain
- Add redirect guard for vision_tools: httpx event hook re-validates
each redirect target against SSRF checks, preventing the classic
redirect-based SSRF bypass (302 to internal IP)
- Move SSRF filtering before backend dispatch in web_extract: now
covers Parallel and Tavily backends, not just Firecrawl
- Extract _is_blocked_ip() helper for cleaner IP range checking
- Add 24 new tests (CGNAT, multicast, IPv4-mapped IPv6, fail-closed
behavior, parametrized blocked/allowed IP lists)
- Fix existing tests to mock DNS resolution for test hostnames
---------
Co-authored-by: dieutx <dangtc94@gmail.com>
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.
Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.
Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
(incl. private-mode, colon-separated, intermediate bytes), OSC (both
terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)
Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
An agent session killed the systemd-managed gateway (PID 1605) and restarted
it with '&disown', taking it outside systemd's Restart= management. When the
orphaned process later received SIGTERM, nothing restarted it.
Add dangerous command patterns to detect:
- 'gateway run' with & (background), disown, nohup, or setsid
- These should use 'systemctl --user restart hermes-gateway' instead
Also applied directly to main repo and fixed the systemd service:
- Changed Restart=on-failure to Restart=always (clean SIGTERM = exit 0 = not
a 'failure', so on-failure never triggered)
- RestartSec=10 for reasonable restart delay
Path('~/.hermes/image.png').is_file() returns False because Path
doesn't expand tilde. This caused the tool to fall through to URL
validation, which also failed, producing a confusing error:
'Invalid image source. Provide an HTTP/HTTPS URL or a valid local
file path.'
Fix: use os.path.expanduser() before constructing the Path object.
Added two tests for tilde expansion (success and nonexistent file).
* fix(mcp-oauth): port mismatch, path traversal, and shared state in OAuth flow
Three bugs in the new MCP OAuth 2.1 PKCE implementation:
1. CRITICAL: OAuth redirect port mismatch — build_oauth_auth() calls
_find_free_port() to register the redirect_uri, but _wait_for_callback()
calls _find_free_port() again getting a DIFFERENT port. Browser redirects
to port A, server listens on port B — callback never arrives, 120s timeout.
Fix: share the port via module-level _oauth_port variable.
2. MEDIUM: Path traversal via unsanitized server_name — HermesTokenStorage
uses server_name directly in filenames. A name like "../../.ssh/config"
writes token files outside ~/.hermes/mcp-tokens/.
Fix: sanitize server_name with the same regex pattern used elsewhere.
3. MEDIUM: Class-level auth_code/state on _CallbackHandler causes data
races if concurrent OAuth flows run. Second callback overwrites first.
Fix: factory function _make_callback_handler() returns a handler class
with a closure-scoped result dict, isolating each flow.
* test: add tests for MCP OAuth path traversal, handler isolation, and port sharing
7 new tests covering:
- Path traversal blocked (../../.ssh/config stays in mcp-tokens/)
- Dots/slashes sanitized and resolved within base dir
- Normal server names preserved
- Special characters sanitized (@, :, /)
- Concurrent handler result dicts are independent
- Handler writes to its own result dict, not class-level
- build_oauth_auth stores port in module-level _oauth_port
---------
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.
Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.
The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
Models occasionally copy ANSI escape sequences from terminal output
or display formatting into file content, breaking shebangs and
injecting binary characters into scripts.
Strip ANSI codes (CSI, OSC, simple escapes) from:
- write_file content
- patch old_string, new_string, and V4A patch content
The check is fast (skips entirely if no ESC byte present).
Reported by Andi Jaeger.
Reads auxiliary.vision.timeout from config.yaml (default: 30s) and
passes it to async_call_llm. Useful for slow local vision models
that need more than 30 seconds.
Setting is in config.yaml (not .env) since it's not a secret:
auxiliary:
vision:
timeout: 120
Based on PR #2306.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Add hermes mcp add/remove/list/test/configure CLI for managing MCP
server connections interactively. Discovery-first 'add' flow connects,
discovers tools, and lets users select which to enable via curses checklist.
Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636).
Supports browser-based and manual (headless) authorization, token
caching with 0600 permissions, automatic refresh. Zero external deps.
Add ${ENV_VAR} interpolation in MCP server config values, resolved
from os.environ + ~/.hermes/.env at load time.
Core OAuth module from PR #2021 by @imnotdev25. CLI and mcp_tool
wiring rewritten against current main. Closes#497, #690.
When sounddevice is installed but libportaudio2 is not present on the
system, the OSError was caught together with ImportError and showed a
generic 'pip install sounddevice' message that sent users down the wrong
path.
Split the except clause to give a clear, actionable message for the
OSError case, including the correct apt/brew commands to install the
system library.
* fix: respect DashScope v1 runtime mode for alibaba
Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.
Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)
This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.
Based on PR #2024 by @kshitijk4poor.
* docs(skill): add split, merge, search examples to ocr-and-documents skill
Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.
* fix: replace all production print() calls with logger in rl_training_tool
Replace all bare print() calls in production code paths with proper logger calls.
- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()
Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.
---------
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Parse thread_id from explicit deliver target (e.g. telegram:-1003724596514:17)
and forward it to _send_to_platform and mirror_to_session.
Previously _resolve_delivery_target() always set thread_id=None when
parsing the platform:chat_id format, breaking cron job delivery to
specific Telegram topics.
Added tests:
- test_explicit_telegram_topic_target_with_thread_id
- test_explicit_telegram_chat_id_without_thread_id
Also updated CRONJOB_SCHEMA deliver description to document the
platform:chat_id:thread_id format.
Co-authored-by: Alex Ferrari <alex@thealexferrari.com>
Cherry-picked from PR #2122 by @AtlasMeridia.
1. do_inspect bytes crash: bundle.files returns bytes for official
skills, .split() expected str. Added decode guard.
2. GitHub redirects: three httpx.get calls missing follow_redirects=True,
causing silent 301 failures on renamed orgs.
3. Skill discovery fallback: scan repo root directories when standard
paths (skills/, .agents/skills/, .claude/skills/) miss.
4. tap list KeyError: t['repo'] crashes for local taps. Use safe .get().
Changes the policy for agent-created skills with critical security
findings from 'block' (silently rejected) to 'ask' (allowed with
warning logged). The agent created the skill, so blocking it entirely
is too aggressive — let it through but log the findings.
- Policy: agent-created dangerous changed from block to ask
- should_allow_install returns None for 'ask' (vs True/False)
- format_scan_report shows 'NEEDS CONFIRMATION' for ask
- skill_manager_tool.py caller handles None (allows with warning)
- force=True still overrides as before
Based on PR #2271 by redhelix (closed — 3200 lines of unrelated
Mission Control code excluded).
Two fixes:
1. Use a single open(os.devnull) handle for both stdout and stderr
suppression, preventing a file handle leak if the second open() fails.
2. Set server_sock = None after closing it in the try block to prevent
the finally block from closing it again (causing an OSError).
Closes#2136
Co-authored-by: dieutx <dangtc94@gmail.com>
Cherry-picked from PR #2201 by @Gutslabs.
session_search resolved hits to parent/root sessions but only excluded
the exact current_session_id. If the active session was a child
continuation (compression/delegation), its parent could still appear
as a 'past' conversation result.
Fix: resolve current_session_id to its lineage root before filtering,
so the entire active lineage (parent and children) is excluded.
Cherry-picked from PR #2169 by @0xbyt4.
1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
is retried soon
6. Delegate build: wrap child construction in try/finally so
_last_resolved_tool_names is always restored on exception
Matrix, Mattermost, Home Assistant, and DingTalk were missing from the
platform_map in both cron/scheduler.py and tools/send_message_tool.py,
causing delivery to those platforms to silently fail.
Also updates the cronjob tool schema description to list all available
delivery targets so the model knows its options.
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.
Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
send_message guidance)
Authored by Hanai. Allows overriding the OpenAI TTS endpoint via
tts.openai.base_url in config.yaml for self-hosted or OpenAI-compatible
TTS services. Falls back to api.openai.com when not set.
find_one is being deprecated. Primary lookup now uses get() with a
deterministic sandbox name (hermes-{task_id}). A legacy fallback via
list(labels=...) ensures sandboxes created before this migration are
still resumable.
The merge at e7844e9c re-introduced a line in _build_child_agent() that
references _saved_tool_names — a variable only defined in _run_single_child().
This caused NameError on every delegate_task call, completely breaking
subagent delegation.
Moves the child._delegate_saved_tool_names assignment to _run_single_child()
where _saved_tool_names is actually defined, keeping the save/restore in the
same scope as the try/finally block.
Adds two regression tests from PR #2038 (YanSte).
Also fixes the same issue reported in PR #2048 (Gutslabs).
Co-authored-by: Yannick Stephan <yannick.stephan@gmail.com>
Co-authored-by: Guts <gutslabs@users.noreply.github.com>
Allow users to configure a custom base_url for the OpenAI TTS provider
in ~/.hermes/config.yaml under tts.openai.base_url. Defaults to the
official OpenAI endpoint. Enables use of self-hosted or OpenAI-compatible
TTS services (e.g. http://localhost:8000/v1).
Also adds a TTS configuration example block to cli-config.yaml.example.
* fix: banner skill count now respects disabled skills and platform filtering
The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)
This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.
Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.
* fix: system prompt and slash commands now respect disabled skills
Two more places where disabled skills were still surfaced:
1. build_skills_system_prompt() in prompt_builder.py — disabled skills
appeared in the <available_skills> system prompt section, causing
the agent to suggest/load them despite being disabled.
2. scan_skill_commands() in skill_commands.py — disabled skills still
registered as /skill-name slash commands in CLI help and could be
invoked.
Both now load _get_disabled_skill_names() and filter accordingly.
* fix: skill_view blocks disabled skills
skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.
Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.
---------
Co-authored-by: Test <test@test.com>
Each configured MCP server now registers as its own toolset in TOOLSETS
(e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}),
making raw server names resolvable in platform_toolsets overrides.
Previously MCP tools were only injected into hermes-* umbrella toolsets,
so gateway sessions using raw toolset names like ['terminal', 'github']
in platform_toolsets couldn't resolve MCP tools.
Skips server names that collide with built-in toolsets. Also handles
idempotent reloads (syncs toolsets even when no new servers connect).
Inspired by PR #1876 by @kshitijk4poor.
Adds 2 tests (standalone toolset creation + built-in collision guard).
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.
OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.
MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in refactored pricing module).
Fixes#1802
The v0.3.0 refactor split child agent construction (_build_child_agent)
and execution (_run_single_child) into separate functions. This created
a scope bug where _saved_tool_names was defined in _build_child_agent
but referenced in _run_single_child's finally block, causing a NameError
on every delegate_task call.
Solution: Move the save/restore logic entirely into _run_single_child,
keeping the save and restore in the same scope as the try/finally block.
This is cleaner than passing the variable through and removes the dead
save from _build_child_agent.
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-created skills were using the same policy as community hub
installs, blocking any skill with medium/high severity findings
(e.g. docker pull, pip install, git clone). This meant the agent
couldn't create skills that reference Docker or other common tools.
Changed agent-created policy from (allow, block, block) to
(allow, allow, block) — matching the trusted policy. Caution-level
findings (medium/high severity) are now allowed through, while
dangerous findings (critical severity like exfiltration, prompt
injection, reverse shells) remain blocked.
Added 4 tests covering the agent-created policy: safe allowed,
caution allowed, dangerous blocked, force override.
* fix: NameError in OpenCode provider setup (prompt_text -> prompt)
The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.
* fix: Telegram streaming — config bridge, not-modified, flood control
Three fixes for gateway streaming:
1. Bridge streaming config from config.yaml into gateway runtime.
load_gateway_config() now reads the 'streaming' key from config.yaml
(same pattern as session_reset, stt, etc.), matching the docs.
Previously only gateway.json was read.
2. Handle 'Message is not modified' in Telegram edit_message().
This Telegram API error fires when editing with identical content —
a no-op, not a real failure. Previously it returned success=False
which made the stream consumer disable streaming entirely.
3. Handle RetryAfter / flood control in Telegram edit_message().
Fast providers can hit Telegram rate limits during streaming.
Now waits the requested retry_after duration and retries once,
instead of treating it as a fatal edit failure.
Also fixed double-edit on stream finish: the consumer now tracks
last-sent text and skips redundant edits, preventing the not-modified
error at the source.
* refactor: make config.yaml the primary gateway config source
Eliminates the per-key bridge pattern in load_gateway_config().
Previously gateway.json was the primary source and each config.yaml
key needed an individual bridge — easy to forget (streaming was
missing, causing garl4546's bug).
Now config.yaml is read first and its keys are mapped directly into
the GatewayConfig.from_dict() schema. gateway.json is kept as a
legacy fallback layer (loaded first, then overwritten by config.yaml
keys). If gateway.json exists, a log message suggests migrating.
Also:
- Removed dead save_gateway_config() (never called anywhere)
- Updated CLI help text and send_message error to reference
config.yaml instead of gateway.json
---------
Co-authored-by: Test <test@test.com>
Save and restore the process-global _last_resolved_tool_names in
_run_single_child() so the parent's execute_code sandbox generates
correct tool imports after delegation completes.
The global was already mostly mitigated (run_agent.py passes
enabled_tools via self.valid_tool_names), but the global itself
remained corrupted — a footgun for any code that reads it directly.
Co-authored-by: shane9coy <shane9coy@users.noreply.github.com>
* fix(session): skip corrupt lines in load_transcript instead of crashing
Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.
Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.
Salvaged from PR #1193 by alireza78a.
Closes#1193
* fix(stt): respect explicit provider config instead of env-var fallback
Rework _get_provider() to separate explicit config from auto-detect.
When stt.provider is explicitly set in config.yaml, that choice is
authoritative — no silent cross-provider fallback based on which env
vars happen to be set. When no provider is configured, auto-detect
still tries: local > groq > openai.
This fixes the reported scenario where provider: local + a placeholder
OPENAI_API_KEY caused the system to silently select OpenAI and fail
with a 401.
Closes#1774
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
and browser_scroll handlers with explicit parameter extraction. The
**args pattern passed all dict keys as keyword arguments, causing
TypeError if the LLM sent unexpected parameters. Now extracts only
the expected params (ref, text, direction) with safe defaults.
2. fuzzy_match.py: Update module docstring to match actual strategy
order in code. Block anchor was listed as #3 but is actually #7.
Multi-occurrence is not a separate strategy but a flag. Updated
count from 9 to 8.
Salvage of PR #1707 by @kshitijk4poor (cherry-picked with authorship preserved).
Adds Tavily as a third web backend alongside Firecrawl and Parallel, using the Tavily REST API via httpx.
- Backend selection via hermes tools → saved as web.backend in config.yaml
- All three tools supported: search, extract, crawl
- TAVILY_API_KEY in config registry, doctor, status, setup wizard
- 15 new Tavily tests + 9 backend selection tests + 5 config tests
- Backward compatible
Closes#1707
Two concurrent gateway sessions calling memory add/replace/remove
simultaneously could both read the old state, apply their changes
independently, and write — the last writer silently drops the first
writer's entry.
Fix: wrap each mutation in a file lock (fcntl.flock on a .lock file).
Under the lock, re-read entries from disk to get the latest state,
apply the mutation, then write. This ensures concurrent writers
serialize properly.
The lock uses a separate .lock file since the memory file itself is
atomically replaced via os.replace() (can't flock a replaced file).
Readers remain lock-free since atomic rename ensures they always see
a complete file.
Two concurrent threads (e.g. parallel subagents) could both pass the
'task_id in _active_sessions' check, both create cloud sessions via
network calls, and then one would overwrite the other — leaking the
first cloud session.
Add double-check after the lock is re-acquired: if another thread
already created a session while we were doing the network call, use
the existing one instead of orphaning it.
* feat(web): add Parallel as alternative web search/extract backend
Adds Parallel (parallel.ai) as a drop-in alternative to Firecrawl for
web_search and web_extract tools using the official parallel-web SDK.
- Backend selection via WEB_SEARCH_BACKEND env var (auto/parallel/firecrawl)
- Auto mode prefers Firecrawl when both keys present; Parallel when sole backend
- web_crawl remains Firecrawl-only with clear error when unavailable
- Lazy SDK imports, interrupt support, singleton clients
- 16 new unit tests for backend selection and client config
Co-authored-by: s-jag <s-jag@users.noreply.github.com>
* fix: add PARALLEL_API_KEY to config registry and fix web_crawl policy tests
Follow-up for Parallel backend integration:
- Add PARALLEL_API_KEY to OPTIONAL_ENV_VARS (hermes doctor, env blocklist)
- Add to set_config_value api_keys list (hermes config set)
- Add to doctor keys display
- Fix 2 web_crawl policy tests that didn't set FIRECRAWL_API_KEY
(needed now that web_crawl has a Firecrawl availability guard)
* refactor: explicit backend selection via hermes tools, not auto-detect
Replace the auto-detect backend selection with explicit user choice:
- hermes tools saves WEB_SEARCH_BACKEND to .env when user picks a provider
- _get_backend() reads the explicit choice first
- Fallback only for manual/legacy config (uses whichever key is present)
- _is_provider_active() shows [active] for the selected web backend
- Updated tests, docs, and .env.example to remove 'auto' mode language
* refactor: use config.yaml for web backend, not env var
Match the TTS/browser pattern — web.backend is stored in config.yaml
(set by hermes tools), not as a WEB_SEARCH_BACKEND env var.
- _load_web_config() reads web: section from config.yaml
- _get_backend() reads web.backend from config, falls back to key detection
- _configure_provider() saves to config dict (saved to config.yaml)
- _is_provider_active() reads from config dict
- Removed WEB_SEARCH_BACKEND from .env.example, set_config_value, docs
- Updated all tests to mock _load_web_config instead of env vars
---------
Co-authored-by: s-jag <s-jag@users.noreply.github.com>
When container_persistent=false, the inner mini-swe-agent cleanup only
runs 'docker stop' in the background, leaving containers in Exited state.
Now cleanup() also runs 'docker rm -f' to fully remove the container.
Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context),
setup tests (TTS provider step), and adds MockInnerDocker.cleanup().
Original fix by crazywriter1. Cherry-picked and adapted for current main.
Fixes#1679
* feat: interactive MCP tool configuration in hermes tools
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.
Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
that temporarily connects to configured MCP servers, discovers their
tools (names + descriptions), and disconnects. No registry side effects.
- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
interactive menu. When selected:
1. Probes all enabled MCP servers for their available tools
2. Shows a per-server curses checklist with tool descriptions
3. Pre-selects tools based on existing include/exclude config
4. Writes changes back as tools.exclude entries in config.yaml
5. Reports which servers failed to connect
The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.
Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).
* feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message
When _send_telegram detects HTML tags in the message body, it now sends
with parse_mode='HTML' instead of converting to MarkdownV2. This allows
cron jobs and agents to send rich HTML-formatted Telegram messages with
bold, italic, code blocks, etc. that render correctly.
Detection uses the same regex from PR #1568 by @ashaney:
re.search(r'<[a-zA-Z/][^>]*>', message)
Plain-text and markdown messages continue through the existing
MarkdownV2 pipeline. The HTML fallback path also catches HTML parse
errors and falls back to plain text, matching the existing MarkdownV2
error handling.
Inspired by: github.com/ashaney — PR #1568
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.
Fixes#1143 — background process notifications resume after gateway restart.
Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.
Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
that temporarily connects to configured MCP servers, discovers their
tools (names + descriptions), and disconnects. No registry side effects.
- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
interactive menu. When selected:
1. Probes all enabled MCP servers for their available tools
2. Shows a per-server curses checklist with tool descriptions
3. Pre-selects tools based on existing include/exclude config
4. Writes changes back as tools.exclude entries in config.yaml
5. Reports which servers failed to connect
The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.
Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).