molecule-ai/hermes-agent

test+fix: snapshot-drift batch + pending_title regression #4

Open

claude-ceo-assistant wants to merge 10 commits from fix/snapshot-drift-batch into main

Author	SHA1	Message	Date
claude-ceo-assistant	4d71bcb47d	chore(release): map claude-ceo-assistant email for AUTHOR_MAP Some checks failed Tests / e2e (pull_request) Failing after 1m56s Details Tests / test (pull_request) Failing after 1m55s Details Contributor Attribution Check / check-attribution (pull_request) Successful in 25s Details Supply Chain Audit / Scan PR for critical supply chain risks (pull_request) Successful in 28s Details Nix / nix (ubuntu-latest) (pull_request) Failing after 6m41s Details Nix / nix (macos-latest) (pull_request) Has been cancelled Details The contributor-check.yml workflow requires every commit author email to have an entry in scripts/release.py:AUTHOR_MAP. claude-ceo-assistant is the Gitea-only bot identity used by Claude-Code-driven PRs in the molecule-ai fork (introduced post-2026-05-06 GitHub suspension; no upstream/GitHub equivalent). Register it so PRs from that identity pass the attribution check. Pattern matches recent same-shape commits: `73bcd83`, `50f9f38`, `9c626ef`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 04:06:53 +00:00
claude-ceo-assistant	c7ec523a21	test(env-isolation): mock is_container + _preflight_user_systemd in tests asserting non-container behaviour Some checks failed Supply Chain Audit / Scan PR for critical supply chain risks (pull_request) Successful in 11s Details Tests / e2e (pull_request) Successful in 33s Details Contributor Attribution Check / check-attribution (pull_request) Failing after 11s Details Tests / test (pull_request) Failing after 7m59s Details Nix / nix (ubuntu-latest) (pull_request) Failing after 8m14s Details Nix / nix (macos-latest) (pull_request) Has been cancelled Details Production code grew two environment-detection chains that test mocks didn't anticipate, causing 10 tests to fail on container CI runners without testing what their assertions claimed: 1. supports_systemd_services() routes container hosts through _container_systemd_operational(), which returns False inside the act_runner Docker. Two tests (test_native_linux, test_supports_systemd_services_returns_true_when_systemctl_present) mocked is_linux/is_termux/is_wsl but not is_container, so they hit the container branch and got False instead of True. Add the missing monkeypatch. 2. systemd_start / systemd_restart now call _preflight_user_systemd() which probes the user D-Bus socket and raises UserSystemdUnavailableError on its absence (common in containers and fresh SSH sessions without linger). Four tests (test_systemd_start_refreshes_outdated_unit, test_systemd_restart_refreshes_outdated_unit, test_systemd_restart_self_requests_graceful_restart_and_waits, test_systemd_restart_recovers_failed_planned_restart) exercise the unit-refresh / self-restart routing logic and shouldn't care about D-Bus availability. Mock _preflight_user_systemd as a no-op. 3. detect_audio_environment() routes container hosts through is_container() and emits a hard-fail "no audio devices" warning. Four tests in TestDetectAudioEnvironment assert per-environment detection (clean Linux, WSL with PulseAudio, Termux) and expect `available is True`; the container check overrode them. Add a class-level autouse fixture that mocks is_container=False so the per-environment logic runs unobstructed. These are deliberate "isolate the unit under test from environmental concerns" patches — production code is not changed. Tests that want to exercise the container/D-Bus branches can opt in by overriding the mocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 04:00:44 +00:00
claude-ceo-assistant	c04e05f05c	fix(tui_gateway): drop pending_title on ValueError, retain on transient errors Some checks failed Contributor Attribution Check / check-attribution (pull_request) Failing after 9s Details Supply Chain Audit / Scan PR for critical supply chain risks (pull_request) Successful in 9s Details Tests / e2e (pull_request) Successful in 1m10s Details Nix / nix (ubuntu-latest) (pull_request) Failing after 5m24s Details Tests / test (pull_request) Failing after 7m44s Details Nix / nix (macos-latest) (pull_request) Has been cancelled Details Production bug + missing test coverage. `c5b4c48` ("fix: lazy session creation — defer DB row until first message (#18370)") moved pending_title application from the eager _start_agent_build path to a post-message-complete handler. The original block had: except ValueError as e: current["pending_title"] = None logger.info("Dropping pending title for session %s: %s", sid, e) except Exception: logger.warning("Failed to apply pending title ...", exc_info=True) …differentiating "title is invalid / duplicate, retrying won't help" (ValueError, drop) from "transient DB failure, retry on next message" (other Exception, keep + log). The replacement block collapsed both into: except Exception: pass # Best effort — auto-title will handle it below …so a duplicate-title session keeps the same dud pending_title forever, hitting set_session_title with the same losing argument on every message-complete. Auto-title can't kick in because pending_title still shadows it. Fix: extract a documented _apply_pending_session_title helper that restores the three-branch semantics (success → clear, ValueError → drop, other Exception → retain). Call it from the message-complete handler instead of the inline try/except. Test rewrite: the previous test_session_create_drops_pending_title_on_valueerror exercised an obsolete code path (eager apply during session.create) that no longer existed after `c5b4c48`. Replace with four focused tests against the helper: - drops_on_valueerror — invariant from the original test name - clears_on_success — happy path - retains_on_transient_exception — guards the new "don't lose title on a flaky DB" behaviour - no_op_without_pending — most calls hit this path Mutation-tested mentally: deleting the `session["pending_title"] = None` in the ValueError branch fails drops_on_valueerror; deleting the same in the success branch fails clears_on_success; widening except ValueError to except Exception fails retains_on_transient_exception. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:50:39 +00:00
claude-ceo-assistant	f88e022229	test(teams): patch adapter's TypingActivityInput binding for test_send_typing The teams adapter imports TypingActivityInput at module load time: try: from microsoft_teams.api.activities.typing import TypingActivityInput except ImportError: TypingActivityInput = None When the real microsoft_teams package isn't installed (CI runner image doesn't bundle Microsoft Teams SDK), the import fails and the local binding stays None — even though the test file's _ensure_teams_mock fixture registers a MockTypingActivityInput in sys.modules. The test-time mock-in-sys.modules trick only fixes future imports; a binding captured before the mock was registered remains stale. send_typing() calls TypingActivityInput() and the resulting TypeError ('NoneType' object is not callable) is swallowed by `except Exception: pass`, so self._app.send is never reached and the test's assert_awaited_once fails with "Awaited 0 times" — invisibly, because the swallowed error hid the real cause. Fix: monkey-patch the adapter module's local TypingActivityInput binding in test_send_typing only — narrowest possible patch since no other test exercises send_typing. Document the import-time-vs-mock-time gap inline so a future reader doesn't fall into the same trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:46:57 +00:00
claude-ceo-assistant	d70d6b1ab3	test(update-restart): model successful primary-kill in find_gateway_pids mocks cmd_update gained a post-restart survivor sweep in hermes_cli/main.py (around line 7568, for issue #17648): after the primary kill attempts, sleep 3s, re-poll find_gateway_pids, and SIGKILL any PIDs from killed_pids that are still alive. This catches gateways that ignore SIGTERM (stuck drain, blocked I/O, zombies). Three tests patched find_gateway_pids with a static return_value (or a fake that ignored call count), so the sweep always saw the killed PIDs as still alive and fired an extra SIGKILL — contaminating the kill-call assertions: - test_update_restarts_profile_manual_gateways saw kill called 1× (SIGKILL from sweep) when expecting 0 - test_update_profile_manual_gateway_falls_back_to_sigterm saw kill called 2× (SIGTERM + SIGKILL from sweep) when expecting 1 - test_update_kills_manual_pid_but_not_service_pid saw 2 kills on the manual PID instead of 1 Real behaviour: after SIGTERM lands, the process exits and the next find_gateway_pids() returns []. The static mocks didn't model that. Make each find_gateway_pids mock call-count-aware: first call returns the live PID(s), subsequent calls return empty / service-only — modelling a successful primary kill. The survivor sweep then sees nothing to escalate, so the original assertions hold. Tests that want to exercise the sweep path can opt in by keeping the PID alive across calls; none of the existing tests need that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:43:43 +00:00
claude-ceo-assistant	5f179d6d35	test(run_agent): keep concurrent-interrupt stub in sync with AIAgent surface The stub in tests/run_agent/test_concurrent_interrupt.py mirrors a subset of AIAgent attributes/methods deliberately ("Avoid full AIAgent init — just import the class and build a stub"). Two recent additions to the real AIAgent broke it: 1. Tool-call guardrails: AIAgent gained _tool_guardrails (line 1160) + _append_guardrail_observation / _guardrail_block_result / _set_tool_guardrail_halt that the concurrent-execution path now invokes during result collection. Add a MagicMock guardrail controller whose decision objects mirror the ToolGuardrailDecision shape (action="allow", allows_execution=True, should_halt=False) — the bound methods read sensible defaults instead of truthy MagicMock children. Bind the three methods the same way as other AIAgent methods. 2. _invoke_tool gained kwargs: messages= and pre_tool_block_checked= are now forwarded by the concurrent path. The two test fakes (slow_tool, polling_tool) didn't accept them and raised TypeError on every call. Add **kwargs so future kwargs additions don't break these fakes. These are pure stub-drift fixes — production behaviour is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:41:31 +00:00
claude-ceo-assistant	74d5e5a899	test(dockerfile): rewrite tui-resolution invariants for post-a49f4c6 design Two contract tests guarded properties of the now-retired @hermes/ink materialization dance (--prefix node_modules/@hermes/ink + omit=dev + nested react cleanup + await-import smoke). The Dockerfile was restructured in `a49f4c6` ("fix: prevent tui rebuilding assets") to instead: 1. Copy the full ui-tui/packages/hermes-ink/ tree (not just manifests) 2. Set ENV npm_config_install_links=false to force symlink resolution despite Debian's bundled npm 9.x defaulting to install-links=true The new mechanism solves the same workspace-resolution problem with fewer build steps (no rebuild on container start). Update the tests to guard the new invariants, both designed to catch a regression that would break @hermes/ink at runtime: - test_dockerfile_installs_tui_dependencies now asserts the full packages/hermes-ink/ COPY (catches manifest-only revert) - new test_dockerfile_forces_npm_install_links_false_for_workspace_resolution asserts the env var (catches accidental removal that re-introduces the npm 9 install-as-copy bug) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:38:13 +00:00
claude-ceo-assistant	ed0957256b	test(gateway-service): compute expected TimeoutStopSec from SSOT, not literal The literal "TimeoutStopSec=90" assertion was correct when default restart_drain_timeout was 60s (max(60,60)+30=90). The default was bumped to 180s in hermes_cli/config.py, making the unit emit TimeoutStopSec=210 — still satisfying issue #8202 ("timeout must exceed drain so systemd doesn't SIGKILL pre-cleanup") but breaking the literal assertion. Refactor: compute the expected minimum from gateway.restart.DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT and assert >=. The test now expresses the invariant directly and survives future drain bumps. Mutation-tested mentally: deleting the +30 headroom in hermes_cli/gateway.py:1636 still fails this test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:35:28 +00:00
claude-ceo-assistant	7672bd9b4f	test(credential-pool): invert obsolete os.environ-wins test for #18254 fix The stale invariant "os.environ wins over .env" was deliberately inverted in `2ef1ad2` ("fix: prefer ~/.hermes/.env over os.environ when seeding credential pool"). The fix targets the case where a parent shell (Codex CLI, harness scripts) exports a stale OPENROUTER_API_KEY, the user updates ~/.hermes/.env with a fresh value, and Hermes silently 401s because auth.json cached the stale env-var. Rename + invert this test to assert the new invariant (.env wins). The positive load_pool coverage already exists in tests/agent/test_credential_pool.py::test_load_pool_prefers_dotenv_over_stale_os_environ (added in `0a6865b` alongside the fix); this case still serves a purpose because it exercises _seed_from_env directly, which is a separate code path from load_pool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:34:17 +00:00
claude-ceo-assistant	447801cbcb	test(acp): update commands list snapshot for steer + queue acp_adapter/server.py:_ADVERTISED_COMMANDS now includes "steer" (inject guidance into active turn) and "queue" (run prompt after current turn finishes) between "compact" and "version". Production code is the source of truth; this test was the last reader still on the pre-feature snapshot. The substantive features were added in PRs that introduced steer/queue themselves; this is purely test-snapshot follow-through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 03:33:40 +00:00