hermes-agent/tools
Teknium 20f2258f34
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace

interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.

Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
  fan interrupt()/clear_interrupt() out to them, and handle the
  register-after-interrupt race at _run_tool entry.  getattr fallback
  for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
  per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
  DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
  tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
  and asserts is_interrupted() flips to True within ~1s of interrupt().
  Second new test guards clear_interrupt() clearing tracked worker bits.

Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.

* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log

AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).

Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.

Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.

* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses

Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).

Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
  route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
  loop sees the per-thread interrupt flag and calls _kill_process
  (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
  1.5s) gives the worker time to complete its SIGTERM+SIGKILL
  escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
  try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
  even on paths the signal handlers don't cover (direct sys.exit,
  unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
  line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
  _wait_for_process via PyThreadState_SetAsyncExc, verifies the
  subprocess process group is dead within 3s of the exception and
  that KeyboardInterrupt re-raises cleanly afterward.

Validation:
| Before                                                  | After              |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s   |
| No INTERRUPT DETECTED in trace                          | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup                | 1/1 pass          |
| tests/run_agent/test_concurrent_interrupt               | 4/4 pass          |
2026-04-17 20:39:25 -07:00
..
browser_providers feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
environments fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) 2026-04-17 20:39:25 -07:00
neutts_samples refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow 2026-03-17 02:33:12 -07:00
__init__.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 08:48:54 +09:00
ansi_strip.py fix: strip ANSI at the source — clean terminal output before it reaches the model 2026-03-23 07:43:12 -07:00
approval.py fix(kimi): cover remaining fixed-temperature bypasses 2026-04-17 20:25:42 -07:00
binary_extensions.py fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening 2026-04-08 02:24:32 -07:00
browser_camofox_state.py feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400) (#4419) 2026-04-01 04:18:50 -07:00
browser_camofox.py fix: /browser connect CDP override now takes priority over Camofox (#10523) 2026-04-15 14:11:18 -07:00
browser_tool.py fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843) 2026-04-17 18:46:30 -07:00
budget_config.py fix: preserve existing thresholds, remove pre-read byte guard 2026-04-08 02:24:32 -07:00
checkpoint_manager.py fix(checkpoints): isolate shadow git repo from user's global config (#11261) 2026-04-16 16:06:49 -07:00
clarify_tool.py refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites 2026-04-07 13:36:38 -07:00
code_execution_tool.py fix: follow-up for salvaged PR #10854 2026-04-16 06:42:45 -07:00
credential_files.py refactor: extract shared helpers to deduplicate repeated code patterns (#7917) 2026-04-11 13:59:52 -07:00
cronjob_tools.py fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285) 2026-04-15 04:57:55 -07:00
debug_helpers.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
delegate_tool.py Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor 2026-04-16 10:47:41 -05:00
env_passthrough.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
feishu_doc_tool.py fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP 2026-04-17 19:04:11 -07:00
feishu_drive_tool.py fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP 2026-04-17 19:04:11 -07:00
file_operations.py fix(file-ops): follow terminal env's live cwd in _exec instead of init-time cached cwd (#11912) 2026-04-17 19:26:40 -07:00
file_tools.py fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839) 2026-04-17 15:53:57 -07:00
fuzzy_match.py fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs 2026-04-10 16:47:44 -07:00
homeassistant_tool.py fix: clean up description escaping, add string-data tests 2026-04-13 04:45:07 -07:00
image_generation_tool.py feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406) 2026-04-16 22:05:41 -07:00
interrupt.py fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) 2026-04-17 20:39:25 -07:00
managed_tool_gateway.py fix(tools): add debug logging for token refresh and tighten domain check 2026-04-02 12:40:03 +11:00
mcp_oauth_manager.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
mcp_oauth.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
mcp_tool.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
memory_tool.py fix: nest msvcrt import inside fcntl except block 2026-04-14 10:18:05 -07:00
mixture_of_agents_tool.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
neutts_synth.py fix(tts): document NeuTTS provider and align install guidance (#1903) 2026-03-18 02:55:30 -07:00
openrouter_client.py refactor: route ad-hoc LLM consumers through centralized provider router 2026-03-11 20:02:36 -07:00
osv_check.py feat: OSV malware check for MCP extension packages (#5305) 2026-04-05 12:46:07 -07:00
patch_parser.py fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs 2026-04-10 16:47:44 -07:00
path_security.py refactor: extract shared helpers to deduplicate repeated code patterns (#7917) 2026-04-11 13:59:52 -07:00
process_registry.py Merge pull request #4692 from NousResearch/feat/ink-refactor 2026-04-17 18:02:37 -05:00
registry.py fix: tighten AST check to module-level only 2026-04-14 21:12:29 -07:00
rl_training_tool.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
send_message_tool.py fix(discord): forum channel media + polish 2026-04-17 20:25:48 -07:00
session_search_tool.py fix(session_search): coerce limit to int to prevent TypeError with non-int values (#10522) 2026-04-15 14:11:05 -07:00
skill_manager_tool.py fix: five HERMES_HOME profile-isolation leaks (#10570) 2026-04-15 17:09:41 -07:00
skills_guard.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
skills_hub.py feat(skills): centralized skills index — eliminate GitHub API calls for search/install 2026-04-12 16:39:04 -07:00
skills_sync.py feat(skills): add 'hermes skills reset' to un-stick bundled skills (#11468) 2026-04-17 00:41:31 -07:00
skills_tool.py fix: use absolute skill_dir for external skills (#10313) (#10587) 2026-04-15 17:22:55 -07:00
terminal_tool.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
tirith_security.py fix: handle cross-device shutil.move failure in tirith auto-install (#10127) (#10524) 2026-04-15 14:50:07 -07:00
todo_tool.py fix(tools): enforce ID uniqueness in TODO store during replace operations 2026-04-11 16:22:50 -07:00
tool_backend_helpers.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
tool_result_storage.py fix(tools): neutralize shell injection in _write_to_sandbox via path quoting (#7940) 2026-04-11 14:26:11 -07:00
transcription_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
tts_tool.py feat(tts): add Google Gemini TTS provider (#11229) 2026-04-16 14:23:16 -07:00
url_safety.py fix: allow trusted QQ CDN benchmark IP resolution 2026-04-17 04:22:40 -07:00
vision_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
voice_mode.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
web_tools.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
website_policy.py refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821) 2026-04-07 10:25:31 -07:00
xai_http.py feat(xai): upgrade to Responses API, add TTS provider 2026-04-16 02:24:08 -07:00