hermes-agent

Teknium 1dce908930 fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761)	2026-05-02 02:08:06 -07:00

* fix(gateway): config.yaml wins over .env for agent/display/timezone settings

  Regression from the silent config→env bridge. The bridge at module import time is correct for max_turns (unconditional overwrite), but every other agent.*, display.*, timezone, and security.* bridge key was guarded by 'if X not in os.environ', so a stale .env entry from an old 'hermes setup' run would shadow the user's current config.yaml indefinitely.

  Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60 in .env from an old setup, and the gateway silently capped at 60 iterations per turn. Gateway logs confirmed api_calls never exceeded 60.

  Three changes:

  1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*, display.*, timezone, and security.* bridge keys. config.yaml is now authoritative for these settings, with the same semantics already in place for max_turns, terminal.*, and auxiliary.*. Also surface the bridge failure (previously 'except Exception: pass') to stderr so operators see bridge errors instead of silently falling back to .env.
  2. gateway/run.py: INFO-log the resolved max_iterations at gateway start so operators can verify the config→env bridge did the right thing instead of chasing a phantom budget ceiling.
  3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in the setup wizard. config.yaml is the single source of truth. Also clean up any stale .env entry left behind by pre-fix setups.

  Regression tests in tests/gateway/test_config_env_bridge_authority.py guard each config→env key against the 'stale .env shadows config' bug.

* fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log)

  Three issues observed in production gateway.log during a rapid restart chain on 2026-05-02, all fixed here.

  1. _send_restart_notification logged unconditional success

     adapter.send() catches provider errors (e.g. Telegram 'Chat not found') and returns SendResult(success=False); it never raises. The caller ignored the return value and always logged 'Sent restart notification to <chat>' at INFO, producing a misleading success line directly below the 'Failed to send Telegram message' traceback on every boot. Now inspects result.success and logs WARNING with the error otherwise.

  2. WhatsApp bridge SIGTERM on shutdown classified as fatal error

     _check_managed_bridge_exit() saw the bridge's returncode -15 (our own SIGTERM from disconnect()) and fired the full fatal-error path, producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus 'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every planned shutdown, immediately before the normal '✓ whatsapp disconnected'. Adds a _shutting_down flag that disconnect() sets before the terminate, and _check_managed_bridge_exit() returns None for returncode in {0, -2, -15} while shutting down. OOM-kill (137) and other non-signal exits still hit the fatal path.

  3. restart_drain_timeout default 60s → 180s

     On 2026-05-02 01:43:27 a user /restart fired while three agents were mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget expired and all three were force-interrupted. 180s covers realistic in-flight agent turns; users on very-long-reasoning models can still raise it further via agent.restart_drain_timeout in config.yaml. Existing explicit user values are preserved by deep-merge.

  Tests

  - tests/gateway/test_restart_notification.py: two new tests assert INFO is only logged on SendResult(success=True) and WARNING with the error string is logged on SendResult(success=False).
  - tests/gateway/test_whatsapp_connect.py: parametrized test for returncode in {0, -2, -15} proves shutdown-time exits are suppressed; separate test proves returncode 137 (SIGKILL/OOM) still surfaces as fatal even when _shutting_down is set.
  - _check_managed_bridge_exit() reads _shutting_down via getattr-with-default so existing _make_adapter() test helpers that bypass __init__ (pitfall #17 in AGENTS.md) keep working unmodified.
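The two shutdown-hygiene behaviours described in the commit message can be sketched roughly as follows. This is an illustrative stand-in, not the code from the repository: the class BridgeExitChecker, its check_exit() method, the restart_notification_log() helper, and the fields of this local SendResult are all hypothetical names chosen for the example. Only the returncode set {0, -2, -15}, the _shutting_down flag, the getattr-with-default read, and the INFO/WARNING split come from the commit message.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Tuple

# Return codes the commit message treats as expected during a planned
# shutdown: 0 (clean exit), -2 (SIGINT), -15 (SIGTERM).
EXPECTED_SHUTDOWN_RETURNCODES = frozenset({0, -2, -15})


class BridgeExitChecker:
    """Hypothetical stand-in for the adapter that owns the bridge process."""

    def __init__(self) -> None:
        self._shutting_down = False

    def disconnect(self) -> None:
        # Set the flag BEFORE terminating the child, so the exit check
        # can tell our own SIGTERM apart from an unexpected crash.
        self._shutting_down = True
        # (A real adapter would terminate the bridge process here.)

    def check_exit(self, returncode: Optional[int]) -> Optional[str]:
        """Return an error string for a fatal exit, or None otherwise."""
        if returncode is None:
            return None  # process is still running
        # getattr with a default keeps this working on objects that were
        # built without running __init__ (e.g. some test helpers).
        shutting_down = getattr(self, "_shutting_down", False)
        if shutting_down and returncode in EXPECTED_SHUTDOWN_RETURNCODES:
            return None  # planned shutdown, not an error
        return f"whatsapp_bridge_exited (returncode={returncode})"


@dataclass
class SendResult:
    """Assumed shape: a success flag plus an optional error string."""
    success: bool
    error: Optional[str] = None


def restart_notification_log(result: SendResult, chat: str) -> Tuple[int, str]:
    """Pick the log level and message from the send result."""
    if result.success:
        return logging.INFO, f"Sent restart notification to {chat}"
    return logging.WARNING, (
        f"Failed to send restart notification to {chat}: {result.error}"
    )
```

With this sketch, check_exit(-15) reports a fatal exit before disconnect() is called and returns None afterwards, while check_exit(137) stays fatal in both cases.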
..
__init__.py
test_ai_gateway_models.py
test_anthropic_model_flow_stale_oauth.py
test_anthropic_oauth_flow.py
test_anthropic_provider_persistence.py
test_api_key_providers.py	chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664	2026-04-29 21:56:51 -07:00
test_apply_model_switch_result_context.py
test_arcee_provider.py	feat(providers): add tencent-tokenhub provider support	2026-04-28 03:45:52 -07:00
test_argparse_flag_propagation.py
test_at_context_completion_filter.py
test_atomic_json_write.py
test_atomic_yaml_write.py
test_auth_codex_provider.py
test_auth_commands.py	fix(auth): make provider config writes atomic	2026-04-30 20:39:41 -07:00
test_auth_nous_provider.py	auth: coerce tls insecure flag safely instead of using Python truthiness	2026-04-30 19:55:48 -07:00
test_auth_provider_gate.py
test_auth_qwen_provider.py
test_auth_ssl_macos.py
test_aux_config.py
test_azure_detect.py
test_backup.py	feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup (#16911)	2026-04-28 01:50:23 -07:00
test_banner_git_state.py
test_banner_skills.py
test_banner.py
test_bedrock_model_picker.py	test(bedrock): add model picker and region routing tests	2026-04-28 03:53:11 -07:00
test_chat_skills_flag.py
test_claw.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_clear_stale_base_url.py
test_cmd_update.py	fix(docker): materialize bundled TUI Ink package (#16690)	2026-04-28 15:11:47 -05:00
test_coalesce_session_args.py
test_codex_cli_model_picker.py
test_codex_models.py
test_commands.py	fix(discord): complete #18741 for /skill autocomplete and drop legacy 25x25 caps (#18745)	2026-05-02 02:00:06 -07:00
test_completion.py
test_config_drift.py
test_config_env_expansion.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_config_env_refs.py
test_config_validation.py	fix(config): accept fallback_model list (chain) in validator + save	2026-04-28 01:40:25 -07:00
test_config.py	fix(cli): prevent .env sanitizer from splitting GLM_API_KEY by LM_API_KEY suffix	2026-04-28 22:22:45 -07:00
test_container_aware_cli.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_copilot_auth.py
test_copilot_catalog_oauth_fallback.py	fix(copilot): require successful exchange when walking credential_pool catalog tokens	2026-04-28 01:18:09 -07:00
test_copilot_context.py
test_copilot_in_model_list.py
test_copilot_token_exchange.py
test_cron.py
test_curator_status.py	feat(curator): show most-used and least-used skills in `hermes curator status` (#18033)	2026-04-30 10:37:33 -07:00
test_custom_provider_context_length.py
test_custom_provider_model_switch.py	fix(model): avoid persisting key_env-resolved secrets to providers entry (#16372)	2026-04-26 21:52:12 -07:00
test_dashboard_browser_safe_imports.py	Merge upstream/main and address Copilot review feedback	2026-04-30 06:43:22 -04:00
test_dashboard_lifecycle_flags.py	feat(dashboard): add --stop and --status flags (#17840)	2026-04-30 02:30:20 -07:00
test_dashboard_profiles_nav_label.py	fix(dashboard): keep profiles list resilient	2026-04-29 01:39:52 -04:00
test_debug.py
test_deprecated_cwd_warning.py
test_detect_api_mode_for_url.py
test_determine_api_mode_hostname.py
test_dingtalk_auth.py
test_discord_skill_clamp_warning.py	fix(discord): warn on 32-char clamp collisions in the /skill collector (#18759)	2026-05-02 02:05:01 -07:00
test_doctor_command_install.py
test_doctor.py	fix(cli): decode .env as UTF-8 to avoid GBK crash on Windows	2026-05-02 01:40:31 -07:00
test_env_loader.py
test_env_sanitize_on_load.py
test_fallback_cmd.py
test_gateway_linger.py
test_gateway_runtime_health.py
test_gateway_service.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_gateway_wsl.py
test_gateway.py	fix: handle gateway Ctrl+C shutdown cleanly	2026-04-30 03:29:57 -07:00
test_gemini_free_tier_setup_block.py
test_gemini_provider.py
test_gmi_provider.py	fix(providers/gmi): post-salvage review fixes	2026-04-27 11:17:59 -07:00
test_goals.py	feat: /goal, persistent cross-turn goals (Ralph loop) (#18262)	2026-04-30 23:10:20 -07:00
test_hooks_cli.py
test_ignore_user_config_flags.py	refactor(cli): derive relaunch flag table from argparse introspection	2026-04-29 20:33:29 -07:00
test_image_gen_picker.py
test_kanban_cli.py	feat(kanban): durable multi-profile collaboration board (#17805)	2026-04-30 13:36:47 -07:00
test_kanban_core_functionality.py	feat(kanban): durable multi-profile collaboration board (#17805)	2026-04-30 13:36:47 -07:00
test_kanban_db.py	feat(kanban): durable multi-profile collaboration board (#17805)	2026-04-30 13:36:47 -07:00
test_launcher.py
test_logs.py
test_managed_installs.py
test_mcp_config.py
test_mcp_reload_confirm_gate.py	feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation	2026-04-29 21:56:47 -07:00
test_mcp_tools_config.py
test_memory_reset.py
test_model_catalog.py
test_model_normalize.py
test_model_picker_viewport.py
test_model_provider_persistence.py	fix(auth): make provider config writes atomic	2026-04-30 20:39:41 -07:00
test_model_switch_context_display.py
test_model_switch_copilot_api_mode.py
test_model_switch_custom_providers.py	test(model_switch): update regression to reflect bare-custom guard	2026-04-30 04:32:11 -07:00
test_model_switch_opencode_anthropic.py
test_model_switch_variant_tags.py
test_model_validation.py	chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664	2026-04-29 21:56:51 -07:00
test_models_dev_preferred_merge.py
test_models.py
test_non_ascii_credential.py
test_nous_hermes_non_agentic.py
test_nous_subscription.py	fix(cli): coerce use_gateway config flags in tool routing	2026-04-26 19:02:55 -07:00
test_ollama_cloud_auth.py
test_ollama_cloud_provider.py
test_opencode_go_in_model_list.py
test_opencode_go_validation_fallback.py
test_overlay_slug_resolution.py
test_path_completion.py
test_placeholder_usage.py
test_plugin_cli_registration.py
test_plugin_scanner_recursion.py
test_plugins_cmd.py
test_plugins.py	fix(plugins): bound async plugin command await with 30s timeout	2026-04-30 19:56:18 -07:00
test_profile_export_credentials.py
test_profiles.py	Merge upstream/main and address Copilot review feedback	2026-04-30 06:43:22 -04:00
test_provider_config_validation.py	fix(config): add request_timeout_seconds and stale_timeout_seconds to provider _KNOWN_KEYS	2026-04-28 01:28:25 -07:00
test_pty_bridge.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_reasoning_effort_menu.py
test_redact_config_bridge.py	feat(security): make secret redaction off by default (#16794)	2026-04-27 21:24:08 -07:00
test_regression_16767.py	test(cli): regression coverage for user-provider routing fix (#16767)	2026-04-28 01:47:20 -07:00
test_relaunch.py	remove relaunch_chat	2026-04-29 20:33:29 -07:00
test_resolve_last_session.py	fix(cli): tighten MRU lookup and session DB cleanup	2026-04-27 08:52:12 -07:00
test_runtime_provider_resolution.py	fix(fallback): let custom_providers shadow built-in aliases	2026-04-30 20:18:44 -07:00
test_session_browse.py	fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296)	2026-04-26 18:49:48 -07:00
test_sessions_delete.py	test(sessions): wire sessions_dir through auto-prune + file-cleanup regression tests	2026-04-26 18:31:07 -07:00
test_set_config_value.py	fix(config): preserve YAML lists in hermes config set (#17876)	2026-04-30 04:32:17 -07:00
test_setup_agent_settings.py	fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761)	2026-05-02 02:08:06 -07:00
test_setup_hermes_script.py
test_setup_irc.py	feat(plugins): bundled platform plugins auto-load by default	2026-04-29 21:56:51 -07:00
test_setup_matrix_e2ee.py
test_setup_model_provider.py
test_setup_noninteractive.py
test_setup_ollama_cloud_force_refresh.py	fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296)	2026-04-26 18:49:48 -07:00
test_setup_openclaw_migration.py	feat(plugins): bundled platform plugins auto-load by default	2026-04-29 21:56:51 -07:00
test_setup_prompt_menus.py
test_setup_reconfigure.py
test_setup.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_skills_config.py
test_skills_hub.py	feat(skills): install skills from a direct HTTP(S) URL (#16323)	2026-04-26 20:57:10 -07:00
test_skills_install_flags.py
test_skills_skip_confirm.py
test_skills_subparser.py
test_skin_engine.py	fix(tui): restore macOS copy behavior and theme polish (#17131)	2026-04-28 18:47:14 -05:00
test_spotify_auth.py
test_status_model_provider.py	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
test_status.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
test_subparser_routing_fallback.py
test_subprocess_timeouts.py
test_suppress_eio_on_interrupt.py
test_tencent_tokenhub_provider.py	feat(providers): add tencent-tokenhub provider support	2026-04-28 03:45:52 -07:00
test_terminal_menu_fallbacks.py
test_timeouts.py
test_tips.py
test_tool_token_estimation.py
test_tools_config.py	test(toolsets): include kanban in expected post-#17805 toolset assertions	2026-04-30 19:43:03 -07:00
test_tools_disable_enable.py
test_tui_npm_install.py	fix(tui): mouse + keyboard text selection in the composer (#16732)	2026-04-27 16:43:48 -07:00
test_tui_resume_flow.py	fix(tui): honor launch toolsets (#17623)	2026-04-29 16:55:27 -07:00
test_update_autostash.py	fix(ci): recover 38 failing tests on main (#17642)	2026-04-29 20:05:32 -07:00
test_update_check.py
test_update_config_clears_custom_fields.py
test_update_gateway_restart.py	fix(gateway): drain manual profile gateways via SIGUSR1 before respawn	2026-04-30 20:00:31 -07:00
test_update_hangup_protection.py
test_update_stale_dashboard.py	fix(tests): make test_update_stale_dashboard immune to hermes_cli.main reload (#17881)	2026-04-30 02:46:56 -07:00
test_update_yes_flag.py	feat(update): add --yes/-y flag to skip interactive prompts (#18261)	2026-04-30 23:06:32 -07:00
test_user_providers_model_switch.py	test(model_switch): cover private user_providers override	2026-04-30 19:44:26 -07:00
test_voice_wrapper.py
test_web_server_host_header.py
test_web_server.py	Merge upstream/main and address Copilot review feedback	2026-04-30 06:43:22 -04:00
test_web_ui_build.py	fix(cli): check hermes_cli/web_dist/ not web/dist/ for build staleness	2026-04-26 18:43:57 -07:00
test_webhook_cli.py
test_xiaomi_provider.py	feat(providers): add tencent-tokenhub provider support	2026-04-28 03:45:52 -07:00