Authored by Farukest. Fixes#432. Extracts _kill_port_process() helper
that uses netstat+taskkill on Windows and fuser on Linux. Previously,
fuser calls were inline with bare except-pass, so on Windows orphaned
bridge processes were never cleaned up — causing 'address already in use'
errors on reconnect. Includes 5 tests covering both platforms, port
matching edge cases, and exception suppression.
Authored by Farukest. Fixes#435. The retry summary in
_handle_max_iterations() hardcoded max_tokens instead of using
_max_tokens_param(), which returns max_completion_tokens for direct
OpenAI API (required by gpt-4o, o-series). The first attempt already
used _max_tokens_param correctly — only the retry path was wrong.
Includes 4 tests for _max_tokens_param provider detection.
Verifies explicit allowlist keys, catch-all _API_KEY/_TOKEN patterns,
case insensitivity, TERMINAL_SSH prefix, and config.yaml routing for
non-secret keys. Covers the fix from PR #469.
The mock handler checked for function_name == 'search' but the RPC
sends 'search_files'. Any test exercising search_files through the
mock would get 'Unknown tool' instead of the canned response.
The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the
actual tool schemas, causing TypeErrors when the LLM used parameters it
sees in its system prompt but the sandbox stubs didn't accept:
search_files:
- Added missing params: context, offset, output_mode
- Fixed target default: 'grep' → 'content' (old value was obsolete)
patch:
- Added missing params: mode, patch (V4A multi-file patch support)
Also added 4 drift-detection tests (TestStubSchemaDrift) that will
catch future divergence between stubs and real schemas:
- test_stubs_cover_all_schema_params: every schema param in stub
- test_stubs_pass_all_params_to_rpc: every stub param sent over RPC
- test_search_files_target_uses_current_values: no obsolete values
- test_generated_module_accepts_all_params: generated code compiles
All 28 tests pass.
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
The execute_code sandbox generates a hermes_tools.py stub module for LLM
scripts. Three common failure modes keep tripping up scripts:
1. json.loads(strict=True) rejects control chars in terminal() output
(e.g., GitHub issue bodies with literal tabs/newlines)
2. Shell backtick/quote interpretation when interpolating dynamic content
into terminal() commands (markdown with backticks gets eaten by bash)
3. No retry logic for transient network failures (API timeouts, rate limits)
Adds three convenience helpers to the generated hermes_tools module:
- json_parse(text) — json.loads with strict=False for tolerant parsing
- shell_quote(s) — shlex.quote() for safe shell interpolation
- retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper
Also updates the EXECUTE_CODE_SCHEMA description to document these helpers
so LLMs know they're available without importing anything extra.
Includes 7 new tests (unit + integration) covering all three helpers.
The original implementation only supported xclip (X11), which silently
fails on WSL2 (can't access Windows clipboard for images), Wayland
desktops (xclip is X11-only), and VSCode terminal on WSL2.
Clipboard backend changes (hermes_cli/clipboard.py):
- WSL2: detect via /proc/version, use powershell.exe with .NET
System.Windows.Forms.Clipboard to extract images as base64 PNG
- Wayland: use wl-paste with MIME type detection, auto-convert BMP
to PNG for WSLg environments (via Pillow or ImageMagick)
- Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough
- New has_clipboard_image() for lightweight clipboard checks
- Cache WSL detection result per-process
CLI changes (cli.py):
- /paste command: explicit clipboard image check for terminals where
BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm)
- Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends
raw byte instead of triggering bracketed paste
Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch,
BMP conversion, has_clipboard_image, and /paste command.
Copy an image to clipboard (screenshot, browser, etc.) and paste into
the Hermes CLI. The image is saved to ~/.hermes/images/, shown as a
badge above the input ([📎 Image #1]), and sent to the model as a
base64-encoded OpenAI vision multimodal content block.
Implementation:
- hermes_cli/clipboard.py: clean module with platform-specific extraction
- macOS: pngpaste (if installed) → osascript fallback (always available)
- Linux: xclip (apt install xclip)
- cli.py: BracketedPaste key handler checks clipboard on every paste,
image bar widget shows attached images, chat() converts to multimodal
content format, Ctrl+C clears attachments
Inspired by @m0at's fork (https://github.com/m0at/hermes-agent) which
implemented image paste support for local vision models. Reimplemented
cleanly as a separate module with tests.
On top of PR #460: self-hosted Firecrawl instances don't require an API
key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy
FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper
self-hosting section to the configuration docs explaining what you get,
what you lose, and how to set it up (Docker stack, tradeoffs vs cloud).
Added 2 more tests (URL-only without key, neither-set raises).
Replaces the unsafe 128K fallback for unknown models with a descending
probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a
context-length error occurs, the agent steps down tiers and retries.
The discovered limit is cached per model+provider combo in
~/.hermes/context_length_cache.yaml so subsequent sessions skip probing.
Also parses API error messages to extract the actual context limit
(e.g. 'maximum context length is 32768 tokens') for instant resolution.
The CLI banner now displays the context window size next to the model
name (e.g. 'claude-opus-4 · 200K context · Nous Research').
Changes:
- agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache
(save/load/get), parse_context_limit_from_error(), get_next_probe_tier()
- agent/context_compressor.py: accepts base_url, passes to metadata
- run_agent.py: step-down logic in context error handler, caches on success
- cli.py + hermes_cli/banner.py: context length in welcome banner
- tests: 22 new tests for probing, parsing, and caching
Addresses #132. PR #319's approach (8K default) rejected — too conservative.
Adds optional FIRECRAWL_API_URL environment variable to support
self-hosted Firecrawl deployments alongside the cloud service.
- Add FIRECRAWL_API_URL to optional env vars in hermes_cli/config.py
- Update _get_firecrawl_client() in tools/web_tools.py to accept custom API URL
- Add tests for client initialization with/without URL
- Document new env var in installation and config guides
The Daytona SDK's process.exec(timeout=N) parameter is not enforced —
the server-side timeout never fires and the SDK has no client-side
fallback, causing commands to hang indefinitely.
Fix: wrap commands with timeout N sh -c '...' (coreutils) which
reliably kills the process and returns exit code 124. Added
shlex.quote for proper shell escaping and a secondary deadline (timeout + 10s) that force-stops the sandbox if the shell timeout somehow fails.
Signed-off-by: rovle <lovre.pesut@gmail.com>
state
- Replace logger.warning with warnings.warn for the disk cap so users
actually see it (logger was suppressed by CLI's log level config)
- Use SandboxState enum instead of string literals in
_ensure_sandbox_ready
Signed-off-by: rovle <lovre.pesut@gmail.com>
Authored by 0xbyt4. Wraps commands with unique fence markers to isolate real output
from shell init/exit noise (oh-my-zsh, macOS session restore, etc.). Falls back to
expanded pattern-based cleaning. Also fixes BSD find fallback and test module shadowing.
The retry summary in _handle_max_iterations hardcodes max_tokens instead
of calling _max_tokens_param(). For direct OpenAI API users (gpt-4o,
o-series), the correct parameter name is max_completion_tokens. The first
attempt at line 2697 already uses _max_tokens_param correctly but the
retry path at line 2743 was missed.
fuser command does not exist on Windows, causing orphaned bridge processes
to never be cleaned up. On crash recovery, the port stays occupied and the
next connect() fails with address-already-in-use.
Add _kill_port_process() helper that uses netstat+taskkill on Windows and
fuser on Linux/macOS. Replace both call sites in connect() and disconnect().
Adds a /update command to Telegram, Discord, and other gateway platforms
that runs `hermes update` to pull the latest code, update dependencies,
sync skills, and restart the gateway.
Implementation:
- Spawns `hermes update` in a separate systemd scope (systemd-run --user
--scope) so the process survives the gateway restart that hermes update
triggers at the end. Falls back to nohup if systemd-run is unavailable.
- Writes a marker file (.update_pending.json) with the originating
platform and chat_id before spawning the update.
- On gateway startup, _send_update_notification() checks for the marker,
reads the captured update output, sends the results back to the user,
and cleans up.
Also:
- Registers /update as a Discord slash command
- Updates README.md, docs/messaging.md, docs/slash-commands.md
- Adds 18 tests covering handler, notification, and edge cases
Authored by FarukEst. Fixes#392.
1. Initialize data={} before health-check loop to prevent NameError when
resp.json() raises after http_ready is set to True.
2. Extract _close_bridge_log() helper and call on all return False paths
to prevent file descriptor leaks on failed connection attempts.
Refactors disconnect() to reuse the same helper.
Authored by 0xbyt4.
The italic regex [^*]+ matched across newlines, corrupting bullet lists
using * markers (e.g. '* Item one\n* Item two' became italic garbage).
Fixed by adding \n to the negated character class: [^*\n]+.
Authored by 0xbyt4.
The dedup logic in GitHubSource.search() and unified_search() used
'r.trust_level == "trusted"' which let trusted results overwrite builtin
ones. Now uses ranked comparison: builtin (2) > trusted (1) > community (0).
Authored by 0xbyt4.
Two fixes:
- extract_images(): only remove extracted image tags, not all markdown image
tags. Previously  was silently dropped when real images
were also present.
- truncate_message(): walk chunk_body not full_chunk when tracking code block
state, so the reopened fence prefix doesn't toggle in_code off and leave
continuation chunks with unclosed code blocks.
Authored by 0xbyt4.
144 new tests covering gateway/pairing.py, tools/skill_manager_tool.py,
tools/skills_tool.py, honcho_integration/session.py, and
agent/auxiliary_client.py.
Authored by Farukest. Fixes#389.
Replaces hardcoded forward-slash string checks ('/.git/', '/.hub/') with
Path.parts membership test in _find_all_skills() and scan_skill_commands().
On Windows, str(Path) uses backslashes so the old filter never matched,
causing quarantined skills to appear as installed.
Authored by Farukest. Fixes#387.
Removes 'and not force' from the dangerous verdict check so --force
can never install skills with critical security findings (reverse shells,
data exfiltration, etc). The docstring already documented this behavior
but the code didn't enforce it.
Authored by Farukest. Fixes#385.
Replaces startswith() with Path.is_relative_to() in _check_structure()
symlink escape check — same fix pattern as skill_view() (PR #352).
Prevents symlinks escaping to sibling directories with shared name prefixes.
Fixes a bug where the refresh token was not persisted when the API key
mint failed (e.g., 402 insufficient credits, timeout). The rotated
refresh token was lost, causing subsequent auth attempts to fail with
a stale token.
Changes:
- Persist auth state immediately after each successful token refresh,
before attempting the mint
- Use latest in-memory refresh token on mint-retry paths (was using
the stale original)
- Atomic durable writes for auth.json (temp file + fsync + replace)
- Opt-in OAuth trace logging (HERMES_OAUTH_TRACE=1, fingerprint-only)
- 3 regression tests covering refresh+402, refresh+timeout, and
invalid-token retry behavior
Author: Robin Fernandes <rewbs>
Authored by PercyDikec. Fixes#394.
The transcript extraction used len(history) to find new messages, but
history includes session_meta entries stripped before reaching the agent.
This caused 1 message lost per turn from turn 2 onwards. Fix returns
history_offset (filtered length) from _run_agent and uses it for the slice.
Two fixes for the case where a user switches to a model with a smaller
context window while having a large existing session:
1. Preflight compression in run_conversation(): Before the main loop,
estimate tokens of loaded history + system prompt. If it exceeds the
model's compression threshold (85% of context), compress proactively
with up to 3 passes. This naturally handles model switches because
the gateway creates a fresh AIAgent per message with the current
model's context length.
2. Error handler reordering: Context-length errors (400 with 'maximum
context length' etc.) are now checked BEFORE the generic 4xx handler.
Previously, OpenRouter's 400-status context-length errors were caught
as non-retryable client errors and aborted immediately, never reaching
the compression+retry logic.
Reported by Sonicrida on Discord: 840-message session (2MB+) crashed
after switching from a large-context model to minimax via OpenRouter.
get_definitions() already wrapped check_fn() calls in try/except,
but is_toolset_available() did not. A failing check (network error,
missing import, bad config) would propagate uncaught and crash the
CLI banner, agent startup, and tools-info display.
Now is_toolset_available() catches all exceptions and returns False,
matching the existing pattern in get_definitions().
Added 4 tests covering exception handling in is_toolset_available(),
check_toolset_requirements(), get_definitions(), and
check_tool_availability().
Closes#402
The transcript extraction used len(history) to find new messages, but
history includes session_meta entries that are stripped before passing
to the agent. This mismatch caused 1 message to be lost from the
transcript on every turn after the first, because the slice offset
was too high. Use the filtered history length (history_offset) returned
by _run_agent instead.
Also changed the else branch from returning all agent_messages to
returning an empty list, so compressed/shorter agent output does not
duplicate the entire history into the transcript.
The hidden directory filter used hardcoded forward-slash strings like
'/.git/' and '/.hub/' to exclude internal directories. On Windows,
Path returns backslash-separated strings, so the filter never matched.
This caused quarantined skills in .hub/quarantine/ to appear as
installed skills and available slash commands on Windows.
Replaced string-based checks with Path.parts membership test which
works on both Windows and Unix.
The docstring states --force should never override dangerous verdicts,
but the condition `if result.verdict == "dangerous" and not force`
allowed force=True to skip the early return. Execution then fell
through to `if force: return True`, bypassing the policy block.
Removed `and not force` so dangerous skills are always blocked
regardless of the --force flag.
The symlink escape check in _check_structure() used startswith()
without a trailing separator. A symlink resolving to a sibling
directory with a shared prefix (e.g. 'axolotl-backdoor') would pass
the check for 'axolotl' since the string prefix matched.
Replaced with Path.is_relative_to() which correctly handles directory
boundaries and is consistent with the skill_view path check.
session_search was returning the current session if it matched the
query, which is redundant — the agent already has the current
conversation context. This wasted an LLM summarization call and a
result slot.
Added current_session_id parameter to session_search(). The agent
passes self.session_id and the search filters out any results where
either the raw or parent-resolved session ID matches. Both the raw
match and the parent-resolved match are checked to handle child
sessions from delegation.
Two tests added verifying the exclusion works and that other
sessions are still returned.
Replace the string-based startswith + os.sep approach with
Path.is_relative_to() (Python 3.9+, we require 3.10+). This is
the idiomatic pathlib way to check path containment — it handles
separators, case sensitivity, and the equal-path case natively
without string manipulation.
Simplified tests to match: removed the now-unnecessary
test_separator_is_os_native test since is_relative_to doesn't
depend on separator choice.
The session key construction logic was duplicated in 4 places
(session.py + 3 inline copies in run.py), which is exactly the
kind of drift that caused issue #349 in the first place.
Extracted build_session_key() as a public function in session.py.
SessionStore._generate_session_key() now delegates to it, and all
inline key construction in run.py has been replaced with calls to
the shared function. Tests updated to test the function directly.
The previous implementation used `len(self._entries) > 1` to check if any
sessions had ever been created. This failed for single-platform users because
when sessions reset (via /reset, auto-reset, or gateway restart), the entry
for the same session_key is replaced in _entries, not added. So len(_entries)
stays at 1 for users who only use one platform.
Fix: Query the SQLite database's session count instead. The database preserves
historical session records (marked as ended), so session_count() correctly
returns > 1 for returning users even after resets.
This prevents the agent from reintroducing itself to returning users after
every session reset.
Fixes#351
Improvements to the HA integration merged from PR #184:
- Add ha_list_services tool: discovers available services (actions) per
domain with descriptions and parameter fields. Tells the model what
it can do with each device type (e.g. light.turn_on accepts brightness,
color_name, transition). Closes the gap where the model had to guess
available actions.
- Add HA to hermes tools config: users can enable/disable the homeassistant
toolset and configure HASS_TOKEN + HASS_URL through 'hermes tools' setup
flow instead of manually editing .env.
- Fix should-fix items from code review:
- Remove sys.path.insert hack from gateway adapter
- Replace all print() calls with proper logger (info/warning/error)
- Move env var reads from import-time to handler-time via _get_config()
- Add dedicated REST session reuse in gateway send()
- Update ha_call_service description to reference ha_list_services for
action discovery.
- Update tests for new ha_list_services tool in toolset resolution.
Three section header comments in tests/test_run_agent.py used
'Grup' instead of 'Group':
- Line 124: # Grup 1: Pure Functions
- Line 276: # Grup 2: State / Structure Methods
- Line 572: # Grup 3: Conversation Loop Pieces (OpenAI mock)
Banner integration:
- MCP Servers section in CLI startup banner between Tools and Skills
- Shows each server with transport type, tool count, connection status
- Failed servers shown in red; section hidden when no MCP configured
- Summary line includes MCP server count
- Removed raw print() calls from discovery (banner handles display)
/reload-mcp command:
- New slash command in both CLI and gateway
- Disconnects all MCP servers, re-reads config.yaml, reconnects
- Reports what changed (added/removed/reconnected servers)
- Allows adding/removing MCP servers without restarting
Resources & Prompts support:
- 4 utility tools registered per server: list_resources, read_resource,
list_prompts, get_prompt
- Exposes MCP Resources (data sources) and Prompts (templates) as tools
- Proper parameter schemas (uri for read_resource, name for get_prompt)
- Handles text and binary resource content
- 23 new tests covering schemas, handlers, and registration
Test coverage: 74 MCP tests total, 1186 tests pass overall.
- Discovery is now parallel (asyncio.gather) instead of sequential,
fixing the 60s shared timeout issue with multiple servers
- Startup messages use print() so users see connection status even
with default log levels (the 'tools' logger is set to ERROR)
- Summary line shows total tools and failed servers count
- Validate conflicting config: warn if both 'url' and 'command' are
present (HTTP takes precedence)
- Update TODO.md: mark MCP as implemented, list remaining work
- Add test for conflicting config detection (51 tests total)
All 1163 tests pass.
When both OPENROUTER_API_KEY and OPENAI_API_KEY are set (e.g. OPENAI_API_KEY
in .bashrc), the wrong key was sent to OpenRouter causing auth failures.
Fixed key resolution order in cli.py and runtime_provider.py.
Fixes#289
- Wrap commands with unique fence markers (printf FENCE; cmd; printf FENCE)
to isolate real output from shell init/exit noise (oh-my-zsh, macOS
session restore/save, docker plugin errors, etc.)
- Expand _clean_shell_noise to cover zsh/macOS patterns and strip from
both beginning and end (fallback when fences are missing)
- Fix BSD find compatibility: fallback to simple find when -printf
produces empty output (macOS)
- Fix test_terminal_disk_usage: use sys.modules to get the real module
instead of the shadowed function from tools/__init__.py
- Add 13 new unit tests for fence extraction and zsh noise patterns
- Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread)
- Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside
a _lock block but also acquires _lock (non-reentrant)
- Fix race condition in _ensure_mcp_loop with concurrent callers
- Change idempotency to per-server (retry failed servers, skip connected)
- Dynamic toolset injection via startswith("hermes-") instead of hardcoded list
- Parallel shutdown via asyncio.gather instead of sequential loop
- Add tests for partial failure retry, parallel shutdown, dynamic injection
Patch _servers to empty dict in tests that call discover_mcp_tools()
with mocked config, preventing interference from real MCP connections
that may exist when running within the full test suite.
Refactor MCP connections from AsyncExitStack to task-per-server
architecture. Each server now runs as a long-lived asyncio Task
with `async with stdio_client(...)`, ensuring anyio cancel-scope
cleanup happens in the same Task that opened the connection.
Connect to external MCP servers via stdio transport, discover their tools
at startup, and register them into the hermes-agent tool registry.
- New tools/mcp_tool.py: config loading, server connection via background
event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration,
SDK interaction, toolset injection, graceful fallback, and shutdown
Config format (in ~/.hermes/config.yaml):
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
Added a fixture to redirect HERMES_HOME to a temporary directory during tests, preventing writes to the user's home directory. Updated the test for DebugSession to create a dedicated log directory for saving logs, ensuring test isolation and accuracy in assertions.
The TestFlushSentinelNotLeaked test from PR #227 had two issues:
1. flush_memories() uses get_text_auxiliary_client() which could bypass
agent.client entirely — mock it to return (None, None)
2. No assertion that the API was actually called — added guard assert
Without these fixes the test passed vacuously (API never called).
The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.
Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
skill_view accepted arbitrary file_path values like '../../.env' and
would read files outside the skill directory, exposing API keys and
other sensitive data.
Added two layers of defense:
1. Reject paths with '..' components (fast, catches obvious traversal)
2. resolve() containment check with trailing '/' to prevent prefix
collisions (catches symlinks and edge cases)
Fix approach from PR #242 (@Bartok9). Vulnerability reported by
@Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR.
Closes#220
The OpenAI API returns content: null on assistant messages that only
contain tool calls. msg.get('content', '') returns None (not '') when
the key exists with value None, causing TypeError on len() and string
concatenation in _generate_summary and compress.
Fix: msg.get('content') or '' — handles both missing keys and None.
Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss).
Both PRs had stale branches and couldn't be merged directly.
Closes#211
Priority is: CLI arg > config file > env var > default
(not env var > config file as the old comment stated)
The test failed because config.yaml had max_turns at both root level
and inside agent section. The test cleared agent.max_turns but the
root-level value still took precedence over the env var. Fixed the
test to clear both, and corrected the comment to match the intended
priority order.
Tests added:
- Roundtrip serialization of chat_topic via to_dict/from_dict
- chat_topic defaults to None when missing from dict
- Channel Topic line appears in session context prompt when set
- Channel Topic line is omitted when chat_topic is None
Follow-up to PR #248 (feat: Discord channel topic in session context).
Enhanced the AIAgent class to capture and normalize summary information for reasoning items. Implemented logic to handle summaries as lists, ensuring proper formatting for API interactions. Updated tests to validate the inclusion of summaries in reasoning items, both for existing and default cases.
Updated the authentication mechanism to store Codex OAuth tokens in the Hermes auth store located at ~/.hermes/auth.json instead of the previous ~/.codex/auth.json. This change includes refactoring related functions for reading and saving tokens, ensuring better management of authentication states and preventing conflicts between different applications. Adjusted tests to reflect the new storage structure and improved error handling for missing or malformed tokens.
Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.
Remove loop_scope="function" parameter from async test decorators in
test_hooks.py. This matches the existing convention in the repo
(test_telegram_documents.py) and avoids requiring pytest-asyncio 0.23+.
All 144 new tests from PR #191 now pass.
Block dangerous HA service domains (shell_command, command_line,
python_script, pyscript, hassio, rest_command) that allow arbitrary
code execution or SSRF. Add regex validation for entity_id to prevent
path traversal attacks. 17 new tests covering both security features.
Fixes#241
When users set HONCHO_API_KEY via `hermes config set` or environment
variable, they expect the integration to activate. Previously, the
`enabled` flag defaulted to `false` when reading from global config,
requiring users to also explicitly enable Honcho.
This change auto-enables Honcho when:
- An API key is present (from config file or env var)
- AND `enabled` is not explicitly set to `false` in the config
Users who want to disable Honcho while keeping the API key can still
set `enabled: false` in their config.
Also adds unit tests for the auto-enable behavior.
Two fixes to the subagent progress display from PR #186:
1. Task index prefix: show 1-indexed prefix ([1], [2], ...) for ALL
tasks in batch mode (task_count > 1). Single tasks get no prefix.
Previously task 0 had no prefix while others did, making batch
output confusing.
2. Completion indicator: use spinner.print_above() instead of raw
print() for per-task completion lines (✓ [1/2] ...). Raw print
collided with the active spinner, mushing the completion text
onto the spinner line. Now prints cleanly above.
Added task_count parameter to _build_child_progress_callback and
_run_single_child. Updated tests accordingly.
print_above() used \033[K (erase-to-end-of-line) to clear the spinner
line before printing text above it. This causes garbled escape codes when
prompt_toolkit's patch_stdout is active in CLI mode.
Switched to the same spaces-based clearing approach used by stop() —
overwrite with blanks, then carriage return back to start of line.
Updated test assertion to match the new clearing method.
When subagents run via delegate_task, the user now sees real-time
progress instead of silence:
CLI: tree-view activity lines print above the delegation spinner
🔀 Delegating: research quantum computing
├─ 💭 "I'll search for papers first..."
├─ 🔍 web_search "quantum computing"
├─ 📖 read_file "paper.pdf"
└─ ⠹ working... (18.2s)
Gateway (Telegram/Discord): batched progress summaries sent every
5 tool calls to avoid message spam. Remaining tools flushed on
subagent completion.
Changes:
- agent/display.py: add KawaiiSpinner.print_above() to print
status lines above an active spinner without disrupting animation.
Uses captured stdout (self._out) so it works inside the child's
redirect_stdout(devnull).
- tools/delegate_tool.py: add _build_child_progress_callback()
that creates a per-child callback relaying tool calls and
thinking events to the parent's spinner (CLI) or progress
queue (gateway). Each child gets its own callback instance,
so parallel subagents don't share state. Includes _flush()
for gateway batch completion.
- run_agent.py: fire tool_progress_callback with '_thinking'
event when the model produces text content. Guarded by
_delegate_depth > 0 so only subagents fire this (prevents
gateway spam from main agent). REASONING_SCRATCHPAD/think/
reasoning XML tags are stripped before display.
Tests: 21 new tests covering print_above, callback builder,
thinking relay, SCRATCHPAD filtering, batching, flush, thread
isolation, delegate_depth guard, and prefix handling.
- Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment.
- Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results.
- Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing.
- Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.
- Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables.
- Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged.
- Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments.
- Implement a redacting formatter for logging to enhance security during log output.
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.
The retry exhaustion checks used > instead of >= to compare
retry_count against max_retries. Since the while loop condition is
retry_count < max_retries, the check retry_count > max_retries can
never be true inside the loop. When retries are exhausted, the loop
exits and falls through to response.choices[0] on an invalid response,
crashing with IndexError instead of returning a proper error.
os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise
AttributeError on import or first call. This breaks the terminal tool,
code execution sandbox, process registry, and WhatsApp bridge on Windows.
Added _IS_WINDOWS platform guard in all four affected files, following
the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is
set to None and process termination falls back to proc.terminate() /
proc.kill() instead of process group signals.
Files changed:
- tools/environments/local.py (3 call sites)
- tools/process_registry.py (2 call sites)
- tools/code_execution_tool.py (3 call sites)
- gateway/platforms/whatsapp.py (3 call sites)
/retry and /undo set session_entry.conversation_history which does not
exist on SessionEntry. The truncated history was never written to disk,
so the next message reload picked up the full unmodified transcript.
Added SessionStore.rewrite_transcript() that persists changes to both
the JSONL file and SQLite database, and updated both commands to use it.
/reset accessed self.session_store._sessions which does not exist on
SessionStore (the correct attribute is _entries). Also replaced the
hand-coded session key with _generate_session_key() to fix WhatsApp DM
sessions using the wrong key format.
Closes#210
The italic regex \*([^*]+)\* used [^*] which matches newlines, causing
bullet lists with * markers to be incorrectly converted to italic text.
Changed to [^*\n]+ to prevent cross-line matching.
Adds 43 tests for _escape_mdv2 and format_message covering code blocks,
bold/italic, headers, links, mixed formatting, and the regression case.
- unified_search and GitHubSource.search dedup: replace naive
`trust_level == "trusted"` check with ranked comparison so
"builtin" results are never overwritten by "trusted" or "community"
- Add 43 unit tests covering _parse_frontmatter_quick, trust_level_for,
HubLockFile CRUD, TapsManager ops, LobeHub _convert_to_skill_md,
unified_search dedup (with regression test), and append_audit_log
- extract_images: only remove extracted image tags from content, preserve
non-image markdown links (e.g. PDFs) that were previously silently lost
- truncate_message: walk only chunk_body (not prepended prefix) so the
reopened code fence does not toggle in_code off, leaving continuation
chunks with unclosed code blocks
- Add 49 unit tests covering MessageEvent command parsing, extract_images,
extract_media, truncate_message code block handling, and _get_human_delay
Gemini 3 thinking models attach extra_content with thought_signature
to function call responses. This must be echoed back on subsequent
API calls or the server rejects with a 400 error. The assistant
message builder was dropping this field, causing all Gemini 3 Flash/Pro
tool-calling flows to fail after the first function call.
- Auto-authorize HA events in gateway (system-generated, not user messages)
- Guard _read_events against None/closed WebSocket after failed reconnect
- Use UUID for send() message_id instead of polluting WS sequence counter
- entity_id parameter now takes precedence over data["entity_id"]
- Add ha_list_entities, ha_get_state, ha_call_service tools via REST API
- Add WebSocket gateway adapter for real-time state_changed event monitoring
- Support domain/entity filtering, cooldown, and auto-reconnect with backoff
- Use REST API for outbound notifications to avoid WS race condition
- Gate tool availability on HASS_TOKEN env var
- Add 82 unit tests covering real logic (filtering, payload building, event pipeline)
Fixes#160
The issue was that MEDIA tags were being extracted from ALL messages
in the conversation history, not just messages from the current turn.
This caused TTS voice messages generated in earlier turns to be
re-attached to every subsequent reply.
The fix:
- Track history_len before calling run_conversation
- Only scan messages AFTER history_len for MEDIA tags
- Add comprehensive tests to prevent regression
This ensures each voice message is sent exactly once, when it's
generated, not on every subsequent message in the session.
The 413 "Request Entity Too Large" error from the LLM API was caught by the
generic 4xx handler which aborts immediately. This is wrong for 413 — it's a
payload-size issue that can be resolved by compressing conversation history.
- Intercept 413 before the generic 4xx block and route to _compress_context
- Exclude 413 from generic is_client_error detection
- Add 'request entity too large' to context-length phrases as safety net
- Add tests for 413 compression behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check)
- Reject documents with None file_size instead of silently allowing download
- Cap text file injection at 100 KB to prevent oversized prompt payloads
- Sanitize display_name in run.py context notes to block prompt injection via filenames
- Add 35 unit tests covering document cache utilities and Telegram document handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 15 new tests in two classes:
- TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r'
(readme.txt, requirements.txt, report.csv, etc.) are NOT falsely
flagged as 'recursive delete'
- TestRmRecursiveFlagVariants (7 tests): verify all recursive delete
flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf)
are still correctly caught
All 29 tests pass (14 existing + 15 new).
The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".
Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
These tests documented the macOS symlink bypass bug with
platform-conditional assertions. The fix and proper regression
tests are in PR #61 (tests/tools/test_write_deny.py), so remove
them here to avoid ordering conflicts between the two PRs.
On macOS, /etc is a symlink to /private/etc. The _is_write_denied()
function resolves the input path with os.path.realpath() but the deny
list entries were stored as literal strings ("/etc/shadow"). This meant
the resolved path "/private/etc/shadow" never matched, allowing writes
to sensitive system files on macOS.
Fix: Apply os.path.realpath() to deny list entries at module load time
so both sides of the comparison use resolved paths.
Adds 19 regression tests in tests/tools/test_write_deny.py.
- Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms.
- Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts.
- Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.
- Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution.
- Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.
- Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class.
- Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.
- Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations.
- Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt.
- Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments.
- Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.
- Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite.
- These scripts were no longer necessary and their removal helps maintain a cleaner codebase.
- Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks.
- Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities.
- Updated configuration options for delegation, including maximum iterations and default toolsets for subagents.
- Enhanced documentation to provide clear guidance on using the delegation feature and its configuration.
- Added comprehensive tests to ensure the functionality and reliability of the delegation logic.
- Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions.
- Added configuration options for timeout and maximum tool calls in the sandbox environment.
- Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms.
- Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling.
- Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.
- Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution.
- Updated `.env.example` to include new configuration options for Browserbase API keys and session settings.
- Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets.
- Updated `README.md` with setup instructions for browser tools and their usage examples.
- Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality.
- Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.