Tests/test: ~28 pre-existing real test failures (env errors masked these — they surfaced after 2026-05-08 disk-pressure fix) #9

Closed
opened 2026-05-08 16:21:36 +00:00 by claude-ceo-assistant · 0 comments

Context

2026-05-08 hermes-agent CI investigation: every recent run had been failing with a mix of:

  1. OSError: could not create numbered dir... after 10 tries (164 such errors per run)
  2. mkdir: ... No space left on device during Nix builds
  3. API rate limit exceeded for 5.78.80.188 during Install uv
  4. ~28 actual test-code failures

The first three were masking the fourth. After today's fixes:

  • internal#89/#91 + cron-based docker GC + runner-recycle → disk pressure relieved (88% → 79%, more after weekly cron)
  • hermes-agent#8 → setup-uv version pin bypasses anon GitHub API rate limit

…the actual test bugs are now visible. Latest run on main HEAD (after #8 merged):

  • Tests/e2e: PASS
  • Nix/ubuntu-latest: PASS
  • Nix/macos-latest: in flight
  • Tests/test: 27 real failures

Categorized failures (real bugs)

Grouped by root-cause shape from the log at actions_log/molecule-ai/hermes-agent/a3/8099.log (full log on operator host):

Signal handling / process orphan (4)

  • tests/tools/test_local_interrupt_cleanup.py::test_wait_for_process_kills_subprocess_on_keyboardinterrupt — subprocess group X is STILL ALIVE after worker received KeyboardInterrupt — orphan bug regressed. This is the sleep-300-survives-SIGTERM scenario from Physikal's Apr 2026 report. See tools/environments/base.py _wait_for_process except-block.
  • tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways — Expected 'kill' to NOT be called, was called 1×.
  • tests/hermes_cli/test_update_gateway_restart.py::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm — Expected 'kill' to be called 1×, called 2× (SIGTERM + SIGKILL).
  • tests/hermes_cli/test_update_gateway_restart.py::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid — assert 2 == 1.

Likely a regression in the SIGTERM/SIGKILL escalation logic in tools/environments/base.py _wait_for_process. Per feedback_no_such_thing_as_flakes, this is a real bug.
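For orientation, the SIGTERM-then-SIGKILL escalation that these tests exercise can be sketched as below. This is a minimal illustration, not the actual `_wait_for_process` code: the names `terminate_process_group` and `run_and_wait` and the `grace_seconds` parameter are hypothetical, and the sketch assumes a POSIX host and a child started with `start_new_session=True`.

```python
import os
import signal
import subprocess
import sys


def terminate_process_group(proc: subprocess.Popen, grace_seconds: float = 2.0) -> str:
    """Stop proc's process group and report which step ended it.

    Assumes proc was started with start_new_session=True, so its pid is
    also its process-group id (hypothetical helper, not the project's code).
    """
    if proc.poll() is not None:
        return "already-exited"
    try:
        os.killpg(proc.pid, signal.SIGTERM)  # polite request to the whole group
    except ProcessLookupError:
        proc.wait()
        return "already-exited"
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The group leader ignored SIGTERM: escalate so nothing is orphaned.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "SIGKILL"


def run_and_wait(argv: list[str]) -> int:
    """Run argv in its own session; clean up the group on KeyboardInterrupt."""
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        terminate_process_group(proc)
        raise
```

Because the child runs in its own session, a terminal Ctrl-C does not reach it, so the `except KeyboardInterrupt` branch is the only thing standing between an interrupt and an orphaned process. The sketch waits only on the group leader; a descendant that ignores SIGTERM after the leader exits would need an additional unconditional SIGKILL pass.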

Systemd/loginctl detection (2)

  • tests/hermes_cli/test_gateway_service.py::TestGatewaySystemServiceRouting::test_systemd_restart_recovers_failed_planned_restart — UserSystemdUnavailableError: loginctl enable-linger failed.
  • tests/hermes_cli/test_gateway_wsl.py::TestSupportsSystemdServicesWSL::test_native_linux — assert False is True on gateway.supports_systemd_services().

Likely environment-detection logic doesn't handle the act_runner container correctly.

Concurrent interrupt (2)

  • tests/run_agent/test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pending — AttributeError: '_Stub' object has no attribute '_tool_guardrails'.
  • tests/run_agent/test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interrupted — same.

The _Stub test fixture is missing the _tool_guardrails attribute that production code now references. Fixture out of date with code.

Update yes flag (2)

  • tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty — mock input not called.
  • tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting — mock _restore_stashed_changes not called.

Either the mock setup or the call site changed, and the tests were not updated to match.

Dockerfile (2)

  • tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies — assertion that Dockerfile content contains ui-tui/packages/hermes-ink/package-lock.json. Current Dockerfile uses ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie and doesn't include that file.
  • tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package — assert False.

Dockerfile content drifted from what tests assert.

Voice mode environment detection (5)

  • tests/tools/test_voice_mode.py::TestDetectAudioEnvironment::test_clean_environment_is_available — assert False is True.
  • test_wsl_with_pulse_allows_voice
  • test_wsl_device_query_fails_with_pulse_continues
  • test_termux_api_microphone_allows_voice_without_sounddevice

Audio-environment detection behaves differently on the act_runner container: possibly a real bug, or possibly the tests need to skip when no audio devices are present.

Test isolation / timeouts (2)

  • tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers — TimeoutError 30s.
  • tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap — TimeoutError 30s.

Either the CI host is slow (timeouts not yet bumped, vitest-style) or there is a genuine async race.

Misc (5)

  • tests/test_tui_gateway_server.py::test_session_create_drops_pending_title_on_valueerror — assert 'duplicate title' is None.
  • tests/gateway/test_teams.py::TestTeamsSend::test_send_typing — Expected send awaited 1×, awaited 0×.
  • tests/plugins/test_achievements_plugin.py::test_evaluate_all_stale_cache_serves_stale_and_refreshes_in_background — assert 1778256884 == 1778256704 (the two timestamps differ by 180 s).
  • tests/tools/test_credential_pool_env_fallback.py::TestCredentialPoolSeedsFromDotEnv::test_os_environ_still_wins_over_dotenv — assert 'sk-dotenv-stale' == 'sk-env-fresh-xyz' (dotenv leaking past os.environ priority).
  • tests/gateway/test_approve_deny_commands.py::TestBlockingApprovalE2E::test_blocking_approval_approve_once — assert 0 == 1 (count assertion).
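
The precedence that test_os_environ_still_wins_over_dotenv expects (process environment first, .env values only as a fallback) can be sketched as follows. `resolve_credential` is a hypothetical helper, not the credential pool's real API, and treating an empty environment value as unset is an assumption made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    name: str,
    dotenv_values: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the credential for name, preferring the process environment.

    Hypothetical helper: dotenv_values stands for the already-parsed .env file.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value:  # assumption: an empty string counts as "not set"
        return value
    return dotenv_values.get(name)
```

With the values from the failing assertion, `resolve_credential("API_KEY", {"API_KEY": "sk-dotenv-stale"}, {"API_KEY": "sk-env-fresh-xyz"})` returns the environment value, which is the ordering the test asserts. The key name `API_KEY` is illustrative only.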

Suggested triage order

  1. Signal-handling cluster (4 failures, Physikal regression cited explicitly) — most likely a real subprocess-management bug; user-visible.
  2. _tool_guardrails fixture drift (2) — easy win.
  3. Dockerfile assertions (2) — tests vs Dockerfile drift; fix one or the other.
  4. Voice mode env detection (5) — verify whether tests should skip in containerized CI.
  5. The rest — handle individually.

Why this issue exists

All 27 are pre-existing on hermes-agent main; today's CI investigation surfaced them after the env-induced 164 errors got cleared. Filing for visibility. Each cluster is its own investigation; no single PR closes them.

🤖 Generated with Claude Code

Reference: molecule-ai/hermes-agent#9