fix(systemd): align tests with production contract — D-Bus stubs + TimeoutStopSec (partial close hermes-agent#9) #16

Closed

claude-ceo-assistant wants to merge 1 commit from fix/systemd-tests-align-with-production-contract into main


Author

SHA1

Message

Date

dev-lead

53a01a8400

fix(systemd): align tests with production contract — D-Bus stubs + TimeoutStopSec (partial close hermes-agent#9)

Nix / nix (macos-latest) (pull_request) Waiting to run
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Supply Chain Audit / Scan PR for critical supply chain risks (pull_request) Successful in 38s
Tests / e2e (pull_request) Failing after 1m12s
Tests / test (pull_request) Failing after 1m31s
Nix / nix (ubuntu-latest) (pull_request) Failing after 31m1s

Two distinct sub-shapes converged in the seven systemd test failures
tracked under hermes-agent#9.  Both are addressed here without changing
production code; the production contract is correct on both axes.

## Sub-shape 1 — TimeoutStopSec=210 vs 90 (2 tests)

`TestGeneratedSystemdUnits::test_user_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout`
and its system-scope sibling previously pinned `TimeoutStopSec=90`.  The
unit generator computes `restart_timeout = max(60, drain_timeout) + 30`,
where the extra 30s is post-interrupt cleanup headroom (gateway.py L1635).

PR #18761 (2026-05-02) intentionally raised the default
`agent.restart_drain_timeout` from 60s to 180s after a /restart on
2026-05-02 force-interrupted three mid-API-call agents inside the old
60s budget.  The new arithmetic is therefore `max(60, 180) + 30 = 210s`,
and both unit generators produce `TimeoutStopSec=210`.
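
The arithmetic can be checked with a minimal sketch.  The helper name
`stop_timeout_seconds` is hypothetical and only mirrors the formula
quoted above; it is not the code at gateway.py L1635.

```python
def stop_timeout_seconds(drain_timeout: int) -> int:
    """Hypothetical helper mirroring the formula quoted in this commit."""
    # The drain window has a 60s floor; 30s of cleanup headroom is added on top.
    return max(60, drain_timeout) + 30


old_default = stop_timeout_seconds(60)    # previous drain default
new_default = stop_timeout_seconds(180)   # drain default after PR #18761
print(old_default, new_default)
```

Because of the 60s floor, any drain timeout at or below 60s still yields
a 90s stop timeout.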

The tests now derive the expected value from
`DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT` (already imported at the top of
the file), so a future drain-timeout change surfaces the contract drift
in one place.  An explicit `expected_timeout == 210` assertion, plus a
reference to the commit, keeps the rationale visible at the failure site.
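
A rough sketch of that test shape follows.  `render_unit` is a
hypothetical stand-in for the real unit generator, and the constant is
redefined locally here only so the snippet runs on its own; the actual
test imports it from the project.

```python
# Stand-in value so the sketch is self-contained; the real test imports this.
DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT = 180


def render_unit(drain_timeout: int) -> str:
    """Hypothetical stand-in for the systemd unit generator."""
    return f"[Service]\nTimeoutStopSec={max(60, drain_timeout) + 30}\n"


def test_unit_uses_extended_stop_timeout() -> None:
    # Derive the expectation from the default instead of hard-coding it.
    expected_timeout = max(60, DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT) + 30
    # Explicit pin: a change to the default fails here, next to the rationale.
    assert expected_timeout == 210
    unit = render_unit(DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT)
    assert f"TimeoutStopSec={expected_timeout}" in unit


test_unit_uses_extended_stop_timeout()
```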

## Sub-shape 2 — D-Bus / loginctl mocks (5 tests)

`systemd_start()` and `systemd_restart()` (user scope) gained a
`_preflight_user_systemd()` call that probes `/run/user/$UID/{bus,
systemd/private}` and, if both are missing, runs `loginctl
enable-linger $USER` to bring the user@.service socket up.  None of
that is reachable from a unit-test fixture: the test runner has no
real user D-Bus session, the linger probe touches the live filesystem,
and `loginctl enable-linger runner` falls outside the
`fake_subprocess_run` whitelist these tests pin.

Five tests were therefore failing not because of production behavior
but because their fixtures lacked setup that this code path needs and
that a test environment cannot provide:

  - TestSystemdServiceRefresh::test_systemd_start_refreshes_outdated_unit
  - TestSystemdServiceRefresh::test_systemd_restart_refreshes_outdated_unit
  - TestGatewaySystemServiceRouting::test_systemd_restart_self_requests_graceful_restart_and_waits
  - TestGatewaySystemServiceRouting::test_systemd_restart_recovers_failed_planned_restart
  - TestGatewayServiceDetection::test_supports_systemd_services_returns_true_when_systemctl_present
    (different shape — see below)
  - TestSupportsSystemdServicesWSL::test_native_linux (same shape as
    the detection test)

Each of the four restart/start tests now stubs `_preflight_user_systemd`
with a no-op.  The preflight contract itself is exercised end-to-end
by the existing `TestPreflightUserSystemd` suite (six tests), so this
stub doesn't paper over any real bug — it only narrows the assertion
to what the test was actually meant to check (refresh + systemctl call
shape).
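
The stubbing pattern looks roughly like the sketch below.  The
`gateway` namespace, the body of `systemd_start`, and the unit name
`example.service` are illustrative stand-ins, not the project's code;
only the name `_preflight_user_systemd` comes from this commit.

```python
from types import SimpleNamespace
from unittest import mock

recorded_calls = []


def _preflight_user_systemd() -> None:
    # Stand-in: a test runner has no user D-Bus session to probe.
    raise RuntimeError("no user D-Bus session available")


def systemd_start() -> None:
    # Look the preflight up on the namespace so that a patch takes effect.
    gateway._preflight_user_systemd()
    recorded_calls.append(["systemctl", "--user", "start", "example.service"])


gateway = SimpleNamespace(
    _preflight_user_systemd=_preflight_user_systemd,
    systemd_start=systemd_start,
)

# Replace the preflight with a no-op for the duration of the test body.
with mock.patch.object(gateway, "_preflight_user_systemd", lambda: None):
    gateway.systemd_start()

assert recorded_calls == [["systemctl", "--user", "start", "example.service"]]
```

With pytest, `monkeypatch.setattr` on the real module would serve the
same purpose; the original preflight is restored once the patch exits.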

For the two `supports_systemd_services()` tests, the missing fixture
was `is_container()`.  CI runners frequently expose `/.dockerenv`,
which makes `is_container()` return True at the gateway layer and
sends the call into the `_container_systemd_operational()` branch
(real systemctl probe).  Both tests model a native-Linux host, so
they pin `is_container=False`; the WSL one also pins
`shutil.which("systemctl")` to a non-None value so the early
binary-presence check doesn't short-circuit.
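
A sketch of pinning both inputs is shown below.  The body of
`supports_systemd_services` here is a simplified stand-in written for
illustration; the real function's container branch calls
`_container_systemd_operational()`, which this sketch replaces with a
plain `False`.

```python
import shutil
from types import SimpleNamespace
from unittest import mock


def is_container() -> bool:
    # Stand-in: pretend the CI runner exposes /.dockerenv.
    return True


def supports_systemd_services() -> bool:
    # Early binary-presence check.
    if shutil.which("systemctl") is None:
        return False
    # Simplified container branch; the real code probes systemctl here.
    if gateway.is_container():
        return False
    return True


gateway = SimpleNamespace(
    is_container=is_container,
    supports_systemd_services=supports_systemd_services,
)

# Model a native-Linux host: not a container, systemctl on PATH.
with mock.patch.object(gateway, "is_container", return_value=False), \
        mock.patch("shutil.which", return_value="/usr/bin/systemctl"):
    result = gateway.supports_systemd_services()

assert result is True
```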

## Verification

  - 7/7 named tests in the brief now pass on a clean macOS Python 3.13
    venv (no real systemd present).
  - Set-diff of `pytest tests/hermes_cli/ -q` failure list before vs
    after the change: 7 fewer failures, 0 added (residual non-systemd
    failures match between baseline and branch — same set, just
    reordered by xdist).
  - `pytest tests/gateway/ -q`: same 14 pre-existing failures as on
    main; no new regressions.
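
The set-diff comparison can be reproduced along these lines.  The test
IDs are placeholders rather than the actual failure lists, and
`diff_failures` is a hypothetical helper.

```python
def diff_failures(baseline, branch):
    """Return (fixed, added) test IDs, ignoring the order xdist reports them in."""
    before, after = set(baseline), set(branch)
    return sorted(before - after), sorted(after - before)


# Placeholder IDs; in practice these come from the two pytest runs.
baseline = ["tests/a.py::test_x", "tests/a.py::test_y", "tests/b.py::test_z"]
branch = ["tests/b.py::test_z"]

fixed, added = diff_failures(baseline, branch)
print("fixed:", fixed)
print("added:", added)
```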

The post-fix contract:

  - User and system units write `TimeoutStopSec=210` when
    `restart_drain_timeout` is at its 180s default.  The formula is
    pinned in the tests as `max(60, drain_timeout) + 30` so future
    drain bumps fail loudly in one place.
  - `_preflight_user_systemd()` probe sequence (still owned by
    `TestPreflightUserSystemd`): D-Bus / private-socket exists →
    short-circuit; else linger-on → wait 3s; else `loginctl
    enable-linger $USER` → wait 5s; else raise
    `UserSystemdUnavailableError` with remediation hint.
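
One possible reading of the probe sequence is sketched below.  The
injected callables, the return labels, and the re-probe after each wait
are assumptions made for illustration; the real
`_preflight_user_systemd()` may differ, and the exception class here is
a local stand-in for the project's own.

```python
from typing import Callable


class UserSystemdUnavailableError(RuntimeError):
    """Local stand-in for the project's exception of the same name."""


def preflight(
    sockets_present: Callable[[], bool],
    linger_enabled: Callable[[], bool],
    enable_linger: Callable[[], bool],
    wait: Callable[[int], None],
) -> str:
    # D-Bus or private socket already exists: short-circuit.
    if sockets_present():
        return "already-up"
    if linger_enabled():
        # Linger is on: give the user manager 3s to come up.
        wait(3)
    elif enable_linger():
        # Linger was just enabled: allow 5s.
        wait(5)
    # Assumption: probe again after waiting, and fail if still missing.
    if sockets_present():
        return "came-up"
    raise UserSystemdUnavailableError("user systemd is not reachable")


waits = []
probes = iter([False, True])
outcome = preflight(lambda: next(probes), lambda: True, lambda: True, waits.append)
assert outcome == "came-up" and waits == [3]
```

Injecting the probes as callables keeps the sketch runnable without a
real systemd or `loginctl`.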

2026-05-08 14:11:19 -07:00
