test+fix: snapshot-drift batch + pending_title regression #4
What this fixes
8 deterministic test failures, plus one production regression that those tests should have been catching.
Each commit is one concern (per SOP), each verified against the act_runner image on the operator host.
Pure snapshot-drift (test follows production)
| Commit | Test(s) | Cause of drift |
|---|---|---|
| 447801c | `test_send_available_commands_update` | `_ADVERTISED_COMMANDS` added `steer` + `queue` |
| 7672bd9 | `test_os_environ_still_wins_over_dotenv` | |
| ed09572 | `test_*_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout` (×2) | `TimeoutStopSec=90` was tied to old `restart_drain_timeout=60`; default bumped to 180 → unit emits 210. Compute expected from SSOT. |
| 74d5e5a | `test_dockerfile_installs_tui_dependencies`, `test_dockerfile_materializes_local_tui_ink_package` | a49f4c6 retired the `--prefix node_modules/@hermes/ink` materialization dance for `npm_config_install_links=false`. New invariants asserted. |
| 5f179d6 | `test_concurrent_interrupt_*` (×2) | `_tool_guardrails` + 3 helper methods added to AIAgent; stub class drift. `_invoke_tool` gained `messages=`, `pre_tool_block_checked=` kwargs; test fakes rejected them. |
| d70d6b1 | `test_update_restarts_profile_manual_gateways`, `test_update_profile_manual_gateway_falls_back_to_sigterm`, `test_update_kills_manual_pid_but_not_service_pid` | |
| f88e022 | `test_send_typing` (teams) | `TypingActivityInput` is imported at module load; binding stays `None` when microsoft_teams isn't installed (CI runner). Test fixture's sys.modules mock can't fix a binding already captured. Patch the local binding directly. |

Production regression caught + fixed (was being silently dropped by tests)

| Commit | Test(s) | Cause |
|---|---|---|
| c04e05f | `test_session_create_drops_pending_title_on_valueerror` | c5b4c48 (#18370 "lazy session creation") moved pending_title application from eager to post-message-complete, but collapsed the original `except ValueError: drop` + `except Exception: keep` into a single `except Exception: pass`. Result: a duplicate-title pending sticks around forever, hitting `set_session_title` with the same losing argument every message. Auto-title can't kick in because pending_title still shadows it. |

The c04e05f fix extracts a documented `_apply_pending_session_title` helper in `tui_gateway/server.py` with the three-branch semantics (success → clear, ValueError → drop, other Exception → retain), and replaces the previous test with four focused tests on the helper.

Verification
Reproduced on the same act_runner image used in CI on the operator host. Each fix verified individually; targeted test files all pass 1–3 stress runs (depending on test cost).
test_tui_gateway_server.py full file: 163/163 passing after the helper extraction.
Full-suite verification under heavy host load (load average 16-37 on 8 CPUs) is in progress; will summarise in a comment when it completes.
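For reviewers who want the three-branch semantics of the pending_title fix in one place, here is a minimal sketch. It is not the code from `tui_gateway/server.py`: the function name `apply_pending_title`, its signature, and the "return what should stay pending" convention are assumptions made for illustration only.

```python
from typing import Callable, Optional


def apply_pending_title(
    pending_title: Optional[str],
    set_title: Callable[[str], None],
) -> Optional[str]:
    """Try to apply a pending title once; return what should stay pending.

    Hypothetical stand-in for the real helper, whose signature is not
    shown in this PR description.
    """
    if pending_title is None:
        return None
    try:
        set_title(pending_title)
    except ValueError:
        # Title was rejected (for example a duplicate): drop it so the
        # same losing argument is not retried on every message.
        return None
    except Exception:
        # Any other failure may be transient: retain the title for the
        # next attempt.
        return pending_title
    # Success: the title is applied, so nothing remains pending.
    return None
```

The ordering of the two `except` clauses matters: `ValueError` has to be caught before the broad `Exception` handler, otherwise the drop branch is unreachable, which is the shape of the regression described above.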
Out of scope (separate PRs / issues)
dbus-user-session, route a session bus into the container) or skip-with-justification. Tracking separately so this PR stays scope-clean.
test_concurrent_inserts_settle_at_cap: passes isolated, fails under parallel xdist load because it creates 160 real AIAgent instances. Refactor to a stub agent is a bigger change.
test_blocking_approval_*: passes alone, fails when scheduled by xdist alongside concurrent_inserts. Order/load contamination, follow-up.

Security/versioning notes (per SOP)
No security-relevant changes: all touched code paths are pre-existing internal flows; no new untrusted input handling, no new auth/permissions, no new logging of sensitive data.
No backwards-compat impact: _apply_pending_session_title is a new private helper (leading underscore) called from one site in the same module; no public API surface changes.

The stub in tests/run_agent/test_concurrent_interrupt.py mirrors a subset of AIAgent attributes/methods deliberately ("Avoid full AIAgent init — just import the class and build a stub"). Two recent additions to the real AIAgent broke it:

1. Tool-call guardrails: AIAgent gained _tool_guardrails (line 1160) + _append_guardrail_observation / _guardrail_block_result / _set_tool_guardrail_halt that the concurrent-execution path now invokes during result collection. Add a MagicMock guardrail controller whose decision objects mirror the ToolGuardrailDecision shape (action="allow", allows_execution=True, should_halt=False), so the bound methods read sensible defaults instead of truthy MagicMock children. Bind the three methods the same way as other AIAgent methods.
2. _invoke_tool gained kwargs: messages= and pre_tool_block_checked= are now forwarded by the concurrent path. The two test fakes (slow_tool, polling_tool) didn't accept them and raised TypeError on every call. Add **kwargs so future kwargs additions don't break these fakes.

These are pure stub-drift fixes; production behaviour is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The teams adapter imports TypingActivityInput at module load time:

    try:
        from microsoft_teams.api.activities.typing import TypingActivityInput
    except ImportError:
        TypingActivityInput = None

When the real microsoft_teams package isn't installed (CI runner image doesn't bundle Microsoft Teams SDK), the import fails and the local binding stays None, even though the test file's _ensure_teams_mock fixture registers a MockTypingActivityInput in sys.modules. The test-time mock-in-sys.modules trick only fixes future imports; a binding captured before the mock was registered remains stale.
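The import-time-vs-mock-time gap can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the adapter or test code: `fake_teams_sdk_for_demo` and `fake_adapter` are made-up names, and the adapter source is a reduced stand-in built with `exec` so the example stays self-contained.

```python
import sys
import types
from unittest import mock

# Reduced stand-in for the adapter module. "fake_teams_sdk_for_demo" is a
# made-up package name that is assumed not to be installed.
ADAPTER_SOURCE = """
try:
    from fake_teams_sdk_for_demo import TypingActivityInput
except ImportError:
    TypingActivityInput = None

def make_typing():
    return TypingActivityInput()
"""

# "Import" the adapter while the SDK is missing: the binding becomes None.
adapter = types.ModuleType("fake_adapter")
exec(ADAPTER_SOURCE, adapter.__dict__)


class MockTypingActivityInput:
    """Test double standing in for the SDK class."""


sdk = types.ModuleType("fake_teams_sdk_for_demo")
sdk.TypingActivityInput = MockTypingActivityInput

# Registering the mock in sys.modules afterwards only helps future imports;
# the name already bound inside the adapter is untouched.
with mock.patch.dict(sys.modules, {"fake_teams_sdk_for_demo": sdk}):
    stale = adapter.TypingActivityInput

# Patching the adapter's own binding is what takes effect.
with mock.patch.object(adapter, "TypingActivityInput", MockTypingActivityInput):
    patched = adapter.make_typing()
```

After this runs, `stale` is still `None` while `patched` is a `MockTypingActivityInput` instance, and `mock.patch.object` restores the adapter's original binding on exit.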
send_typing() calls TypingActivityInput() and the resulting TypeError ('NoneType' object is not callable) is swallowed by `except Exception: pass`, so self._app.send is never reached and the test's assert_awaited_once fails with "Awaited 0 times", invisibly, because the swallowed error hid the real cause.

Fix: monkey-patch the adapter module's local TypingActivityInput binding in test_send_typing only. This is the narrowest possible patch, since no other test exercises send_typing. Document the import-time-vs-mock-time gap inline so a future reader doesn't fall into the same trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Production code grew two environment-detection chains that test mocks didn't anticipate, causing 10 tests to fail on container CI runners without testing what their assertions claimed:

1. supports_systemd_services() routes container hosts through _container_systemd_operational(), which returns False inside the act_runner Docker. Two tests (test_native_linux, test_supports_systemd_services_returns_true_when_systemctl_present) mocked is_linux/is_termux/is_wsl but not is_container, so they hit the container branch and got False instead of True. Add the missing monkeypatch.
2. systemd_start / systemd_restart now call _preflight_user_systemd() which probes the user D-Bus socket and raises UserSystemdUnavailableError on its absence (common in containers and fresh SSH sessions without linger). Four tests (test_systemd_start_refreshes_outdated_unit, test_systemd_restart_refreshes_outdated_unit, test_systemd_restart_self_requests_graceful_restart_and_waits, test_systemd_restart_recovers_failed_planned_restart) exercise the unit-refresh / self-restart routing logic and shouldn't care about D-Bus availability. Mock _preflight_user_systemd as a no-op.
3. detect_audio_environment() routes container hosts through is_container() and emits a hard-fail "no audio devices" warning.
Four tests in TestDetectAudioEnvironment assert per-environment detection (clean Linux, WSL with PulseAudio, Termux) and expect `available is True`; the container check overrode them. Add a class-level autouse fixture that mocks is_container=False so the per-environment logic runs unobstructed.

These are deliberate "isolate the unit under test from environmental concerns" patches; production code is not changed. Tests that want to exercise the container/D-Bus branches can opt in by overriding the mocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
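The "default to non-container, opt in to the container branch" pattern from the audio-detection tests can be sketched with the standard library alone. This is an analogue rather than the fixture from the commit: it uses `unittest` `setUp` where the commit uses a pytest autouse fixture, and the `env` namespace and the body of `detect_audio_environment` are hypothetical stand-ins for production code that is not shown here.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the production module that owns is_container().
# It reports True to imitate a container CI runner.
env = types.SimpleNamespace(is_container=lambda: True)


def detect_audio_environment() -> dict:
    """Simplified stand-in: the container check short-circuits detection."""
    if env.is_container():
        return {"available": False, "warnings": ["no audio devices"]}
    return {"available": True, "warnings": []}


class TestDetectAudioEnvironment(unittest.TestCase):
    def setUp(self):
        # Runs before every test in the class, playing the role of a
        # class-level autouse fixture: pretend we are not in a container.
        patcher = mock.patch.object(env, "is_container", return_value=False)
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_clean_linux_is_available(self):
        self.assertTrue(detect_audio_environment()["available"])

    def test_container_branch_opt_in(self):
        # A test that wants the container branch overrides the default mock.
        with mock.patch.object(env, "is_container", return_value=True):
            self.assertFalse(detect_audio_environment()["available"])
```

Keeping the default in one place means each per-environment test only states what it cares about, while container behaviour stays testable through an explicit override.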