hermes-agent/tests/hermes_cli
Teknium af22421e87
feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.

Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)

None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.

Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
  (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
  grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" />
  as the first child of its outer wrapper and
    <PluginSlot name="<page>:bottom" />
  as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
  sessions:top via registerSlot(), with matching slots entry in
  the manifest -- so freshly-setup users can see what page-scoped
  slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
  authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
  colon-bearing slot names (sessions:top, analytics:bottom, ...)

Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
2026-04-25 06:55:35 -07:00
..
__init__.py
test_ai_gateway_models.py refactor(ai-gateway): single source of truth for model catalog (#13304) 2026-04-20 22:21:21 -07:00
test_anthropic_model_flow_stale_oauth.py fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain 2026-04-24 07:14:00 -07:00
test_anthropic_oauth_flow.py
test_anthropic_provider_persistence.py
test_api_key_providers.py feat: add Step Plan provider support (salvage #6005) 2026-04-22 02:59:58 -07:00
test_arcee_provider.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_argparse_flag_propagation.py feat: shell hooks — wire shell scripts as Hermes hook callbacks 2026-04-20 20:53:51 -07:00
test_at_context_completion_filter.py fix(tui): @folder: only yields directories, @file: only yields files 2026-04-21 14:31:48 -05:00
test_atomic_json_write.py
test_atomic_yaml_write.py
test_auth_codex_provider.py fix(model): let Codex setup reuse or reauthenticate 2026-04-24 04:53:32 -07:00
test_auth_commands.py fix: clarify auth retry guidance 2026-04-24 05:20:05 -07:00
test_auth_nous_provider.py fix: validate nous auth status against runtime credentials 2026-04-24 05:20:05 -07:00
test_auth_provider_gate.py
test_auth_qwen_provider.py
test_auth_ssl_macos.py fix(auth): honor SSL CA env vars across httpx + requests callsites 2026-04-24 03:00:33 -07:00
test_aux_config.py fix(aux): add session_search extra_body and concurrency controls 2026-04-20 00:47:39 -07:00
test_backup.py fix(backup): handle files with pre-1980 timestamps 2026-04-20 00:47:40 -07:00
test_banner_git_state.py
test_banner_skills.py
test_banner.py feat(banner): hyperlink startup banner title to latest GitHub release (#14945) 2026-04-23 23:28:34 -07:00
test_chat_skills_flag.py
test_claw.py
test_clear_stale_base_url.py
test_cmd_update.py test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
test_coalesce_session_args.py
test_codex_cli_model_picker.py fix(tests): unstick CI — sweep stale tests from recent merges (#12670) 2026-04-19 12:39:58 -07:00
test_codex_models.py fix(model): let Codex setup reuse or reauthenticate 2026-04-24 04:53:32 -07:00
test_commands.py refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047) 2026-04-24 03:10:52 -07:00
test_completion.py
test_config_drift.py feat(delegate): orchestrator role and configurable spawn depth (default flat) 2026-04-21 14:23:45 -07:00
test_config_env_expansion.py
test_config_env_refs.py fix(config): preserve env refs when save_config rewrites config (#11892) 2026-04-17 19:03:26 -07:00
test_config_validation.py fix(gemini): tighten native routing and streaming replay 2026-04-19 12:40:08 -07:00
test_config.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_container_aware_cli.py
test_copilot_auth.py
test_copilot_context.py fix(copilot): wire live /models max_prompt_tokens into context-window resolver 2026-04-24 05:09:08 -07:00
test_copilot_in_model_list.py fix(model): repair Discord Copilot /model flow 2026-04-24 03:33:29 -07:00
test_copilot_token_exchange.py fix(copilot): exchange raw GitHub token for Copilot API JWT 2026-04-24 05:09:08 -07:00
test_cron.py feat(skills): consolidate find-nearby into maps as a single location skill 2026-04-19 05:19:22 -07:00
test_custom_provider_model_switch.py
test_debug.py fix(debug): distinguish empty-log from missing-log in report placeholder 2026-04-22 15:27:54 -05:00
test_deprecated_cwd_warning.py fix: enforce config.yaml as sole CWD source + deprecate .env CWD vars + add hermes memory reset (#11029) 2026-04-16 06:48:33 -07:00
test_detect_api_mode_for_url.py fix: restrict provider URL detection to exact hostname matches 2026-04-20 22:14:29 -07:00
test_determine_api_mode_hostname.py fix: extend hostname-match provider detection across remaining call sites 2026-04-20 22:14:29 -07:00
test_dingtalk_auth.py test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure 2026-04-17 05:08:07 -07:00
test_doctor_command_install.py feat(doctor): add Command Installation check for hermes bin symlink 2026-04-14 23:13:11 -07:00
test_doctor.py fix(cli): validate user-defined providers consistently 2026-04-24 04:48:56 -07:00
test_env_loader.py fix(cli): ensure project .env is sanitized before loading 2026-04-22 05:51:44 -07:00
test_env_sanitize_on_load.py
test_gateway_linger.py
test_gateway_runtime_health.py
test_gateway_service.py fix(gateway): kill tool subprocesses before adapter disconnect on drain timeout (#14728) 2026-04-23 13:59:29 -07:00
test_gateway_wsl.py
test_gateway.py fix(gateway): recover stale pid and planned restart state 2026-04-22 16:33:46 -07:00
test_gemini_free_tier_setup_block.py feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100) 2026-04-24 04:46:17 -07:00
test_gemini_provider.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_hooks_cli.py feat: shell hooks — wire shell scripts as Hermes hook callbacks 2026-04-20 20:53:51 -07:00
test_ignore_user_config_flags.py feat(cli): add --ignore-user-config and --ignore-rules flags 2026-04-22 19:58:42 -07:00
test_image_gen_picker.py fix(image-gen): persist plugin provider on reconfigure 2026-04-23 01:56:09 -07:00
test_launcher.py
test_logs.py
test_managed_installs.py
test_mcp_config.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_tools_config.py
test_memory_reset.py fix: enforce config.yaml as sole CWD source + deprecate .env CWD vars + add hermes memory reset (#11029) 2026-04-16 06:48:33 -07:00
test_model_normalize.py fix(model-normalize): pass DeepSeek V-series IDs through instead of folding to deepseek-chat 2026-04-24 05:24:54 -07:00
test_model_picker_viewport.py refactor(cli): align model picker viewport with PR #11260 vocabulary 2026-04-17 06:33:21 -07:00
test_model_provider_persistence.py feat: add Step Plan provider support (salvage #6005) 2026-04-22 02:59:58 -07:00
test_model_switch_context_display.py fix(/model): show provider-enforced context length, not raw models.dev (#15438) 2026-04-24 17:21:38 -07:00
test_model_switch_copilot_api_mode.py fix: recompute Copilot api_mode after model switch 2026-04-16 01:16:14 -07:00
test_model_switch_custom_providers.py fix(opencode): derive api_mode from target model, not stale config default (#15106) 2026-04-24 04:58:46 -07:00
test_model_switch_opencode_anthropic.py fix(opencode): derive api_mode from target model, not stale config default (#15106) 2026-04-24 04:58:46 -07:00
test_model_switch_variant_tags.py
test_model_validation.py fix(cli): support model validation for anthropic_messages and cloudflare-protected endpoints 2026-04-24 05:48:15 -07:00
test_models_dev_preferred_merge.py feat(/model): merge models.dev entries for lesser-loved providers (#14221) 2026-04-22 17:33:42 -07:00
test_models.py feat(aux): use Portal /api/nous/recommended-models for auxiliary models 2026-04-21 20:35:16 -07:00
test_non_ascii_credential.py fix(env_loader): warn when non-ASCII stripped from credential env vars (#13300) 2026-04-20 22:14:03 -07:00
test_nous_hermes_non_agentic.py
test_nous_subscription.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_ollama_cloud_auth.py fix(opencode): derive api_mode from target model, not stale config default (#15106) 2026-04-24 04:58:46 -07:00
test_ollama_cloud_provider.py fix: wire up Ollama Cloud dynamic model discovery in /model TUI picker 2026-04-16 07:17:45 -07:00
test_opencode_go_in_model_list.py feat(/model): merge models.dev entries for lesser-loved providers (#14221) 2026-04-22 17:33:42 -07:00
test_opencode_go_validation_fallback.py fix(/model): accept provider switches when /models is unreachable 2026-04-21 05:19:43 -07:00
test_overlay_slug_resolution.py fix(model_picker): detect mapped-provider auth-store credentials 2026-04-24 05:20:05 -07:00
test_path_completion.py
test_placeholder_usage.py
test_plugin_cli_registration.py test: remove 169 change-detector tests across 21 files (#11472) 2026-04-17 01:05:09 -07:00
test_plugin_scanner_recursion.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
test_plugins_cmd.py test: remove 169 change-detector tests across 21 files (#11472) 2026-04-17 01:05:09 -07:00
test_plugins.py refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174) 2026-04-24 07:06:11 -07:00
test_profile_export_credentials.py
test_profiles.py fix(profiles): stage profile imports to prevent directory clobbering 2026-04-23 03:02:34 -07:00
test_provider_config_validation.py fix(config): preserve list-format models in custom_providers normalize 2026-04-23 02:37:07 -07:00
test_pty_bridge.py feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard 2026-04-24 10:51:49 -04:00
test_reasoning_effort_menu.py
test_redact_config_bridge.py fix(redact): honor security.redact_secrets from config.yaml (#15109) 2026-04-24 05:03:26 -07:00
test_runtime_provider_resolution.py fix(runtime): resolve bare custom provider to loopback or CUSTOM_BASE_URL 2026-04-24 04:54:16 -07:00
test_session_browse.py
test_sessions_delete.py
test_set_config_value.py
test_setup_agent_settings.py fix(setup): stop hardcoding max-iterations copy 2026-04-19 00:28:25 -07:00
test_setup_hermes_script.py
test_setup_matrix_e2ee.py
test_setup_model_provider.py
test_setup_noninteractive.py
test_setup_openclaw_migration.py Update env vars for openclaw migration 2026-04-20 14:56:04 -07:00
test_setup_prompt_menus.py fix(ci): resolve 4 pre-existing main failures (docs lint + 3 stale tests) (#11373) 2026-04-16 20:43:41 -07:00
test_setup.py fix(cli): validate user-defined providers consistently 2026-04-24 04:48:56 -07:00
test_skills_config.py fix(tests): resolve 17 persistent CI test failures (#15084) 2026-04-24 03:46:46 -07:00
test_skills_hub.py
test_skills_install_flags.py
test_skills_skip_confirm.py
test_skills_subparser.py
test_skin_engine.py fix: align status bar skin tests with upstream main 2026-04-22 13:20:02 -07:00
test_spotify_auth.py feat(spotify): interactive setup wizard + docs page (#15130) 2026-04-24 05:30:05 -07:00
test_status_model_provider.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_status.py fix: validate nous auth status against runtime credentials 2026-04-24 05:20:05 -07:00
test_subparser_routing_fallback.py test: remove 169 change-detector tests across 21 files (#11472) 2026-04-17 01:05:09 -07:00
test_subprocess_timeouts.py
test_terminal_menu_fallbacks.py
test_timeouts.py fix(config): add stale timeout settings 2026-04-20 00:52:50 -07:00
test_tips.py
test_tool_token_estimation.py
test_tools_config.py fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634) 2026-04-25 05:53:08 -07:00
test_tools_disable_enable.py
test_tui_npm_install.py chore: uptick 2026-04-15 10:23:15 -05:00
test_tui_resume_flow.py chore(tui): strip noise comments 2026-04-16 19:14:05 -05:00
test_update_autostash.py
test_update_check.py test: remove 8 flaky tests that fail under parallel xdist scheduling (#12784) 2026-04-19 19:38:02 -07:00
test_update_config_clears_custom_fields.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_update_gateway_restart.py fix(gateway): drain-aware hermes update + faster still-working pings (#14736) 2026-04-23 14:01:57 -07:00
test_update_hangup_protection.py fix(update): survive mid-update terminal disconnect (#11960) 2026-04-17 21:29:24 -07:00
test_user_providers_model_switch.py fix(cli): non-zero /model counts for native OpenAI and direct API rows 2026-04-24 05:48:15 -07:00
test_voice_wrapper.py feat(tui): match CLI's voice slash + VAD-continuous recording model 2026-04-23 16:18:15 -07:00
test_web_server_host_header.py fix(web_server,whatsapp-bridge): validate Host header against bound interface (#13530) 2026-04-21 06:26:35 -07:00
test_web_server.py feat(dashboard): page-scoped plugin slots for built-in pages (#15658) 2026-04-25 06:55:35 -07:00
test_webhook_cli.py
test_xiaomi_provider.py fix(normalize): lowercase Xiaomi model IDs for case-insensitive config (#15066) 2026-04-24 03:33:05 -07:00