fix: resolve 5 misc test failures in hermes-agent#9 #14
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "fix/misc-test-failures-issue-9"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Picks up the misc cluster from
hermes-agent#9(the 5 unrelatedtest failures grouped as "Misc" in the issue triage). Each commit is
one root cause; nothing is bundled across concerns.
Per
feedback_no_such_thing_as_flakes: every failure has a real,investigated root cause. None are dismissed as flakes.
Per-failure diagnosis & fix
1.
tests/acp/test_server.py::test_send_available_commands_updateDiagnosis (stale test):
acp_adapter/server.py:_ADVERTISED_COMMANDSgained two new commands —
steer(inject guidance into active turn)and
queue(run prompt after current turn finishes) — betweencompactandversion. The pinned list in this test was the lastreader still on the pre-feature snapshot. Production is the source of
truth.
Fix: insert
"steer", "queue"into the expected list. (commitd5f56958)2.
tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_requiredDiagnosis (real test-isolation bug): the test stubs
hermes_cli.web_serverviamonkeypatch.setitem(sys.modules, ...),but
_check_ws_tokendoesfrom hermes_cli import web_server as _ws.When another test in the same xdist worker has already imported
hermes_cli.web_server, the parent packagehermes_clihas the realsubmodule bound as an attribute, and
from hermes_cli import web_serverresolves through the attribute path, not throughsys.modules. So
_check_ws_tokencompares against the REAL random_SESSION_TOKEN, the "secret-xyz" branch fails, and the thirdwith c.websocket_connect(... ?token=secret-xyz)hits a 1008disconnect instead of a clean handshake.
Reproducible deterministically: import
hermes_cli.web_serverin anypreceding test in the worker, then run this test → fails.
Fix: also
monkeypatch.setattr(hermes_cli, "web_server", stub, raising=False)so both lookup paths see the stub. Inline commentexplains the gotcha. (commit
04d56337)3.
tests/gateway/test_teams.py::TestTeamsSend::test_send_typingDiagnosis (real test bug — broken silently by an unrelated production
change):
plugins/platforms/teams/adapter.pyimportsTypingActivityInputat module load time:Commit
624057fc("feat(teams): set User-Agent to Hermes via 2.0.0client option") added
from microsoft_teams.common.http.client import ClientOptionsto the SAME try-block. The test fixture mocksmicrosoft_teams.api.activities.typingin sys.modules but notmicrosoft_teams.common.http.client. The whole try-block now fails onthe new import, so
TypingActivityInputfalls into theexcept ImportErrorbranch and staysNone.send_typing()thencalls
TypingActivityInput()→ TypeError → swallowed byexcept Exception: pass→self._app.sendis never reached →test asserts "awaited 0 times" with no visible cause.
Fix: monkey-patch the adapter module's local
TypingActivityInputbinding directly in
test_send_typingonly. Narrowest possible patch;no other test exercises
send_typing. Inline comment documents theimport-time-vs-mock-time gap. (commit
c8bc7cda)4.
tests/test_tui_gateway_server.py::test_session_create_drops_pending_title_on_valueerrorDiagnosis (real production bug + obsolete test):
c5b4c481("fix: lazy session creation — defer DB row until first message")
moved pending_title application from
_start_agent_build(eager) tothe post-message-complete handler in
_run_prompt_submit. Theoriginal
except ValueError as e: current["pending_title"] = Nonegot collapsed into a generic
except Exception: pass— losing theValueError-specific drop semantics. So a duplicate-title session keeps
its dud
pending_titleforever, hittingset_session_titlewith thesame losing argument on every message-complete; auto-title can't kick
in because
pending_titlestill shadows it.The named test was also exercising the eager
_start_agent_buildpath that no longer exists — so it was BOTH testing the wrong code
path AND failing because the new path swallowed ValueError.
Fix (production): extract
_apply_pending_session_title(session, sid, db)helper with three documented branches:Call it from the message-complete handler instead of the inline
try/except.
Fix (test): replace the obsolete eager-path test with four
focused tests against the helper:
test_apply_pending_session_title_drops_on_valueerrortest_apply_pending_session_title_clears_on_successtest_apply_pending_session_title_retains_on_transient_exceptiontest_apply_pending_session_title_no_op_without_pendingMutation-tested mentally: deleting the
session["pending_title"] = Nonein the ValueError branch fails thefirst; deleting the same in the success branch fails the second;
widening
except ValueErrortoexcept Exceptionfails the third.(commit
3697e6ce)5.
tests/tools/test_credential_pool_env_fallback.py::test_os_environ_still_wins_over_dotenvDiagnosis (stale test — invariant deliberately inverted):
2ef1ad28("fix: prefer ~/.hermes/.env over os.environ when seedingcredential pool", #18254) deliberately inverted the priority. The user
case: a parent shell exports a stale
OPENROUTER_API_KEY, the userupdates
~/.hermes/.envwith a fresh key, restarting Hermes stillpicks up the stale
os.environvalue, writes it back toauth.json,and all API calls fail with 401. Fix made
.envthe authoritativesource.
This test still asserted the old invariant and was the only reader
that hadn't been updated.
Fix: invert + rename to
test_dotenv_wins_over_stale_os_environ. The positiveload_poolcoverage already exists in
tests/agent/test_credential_pool.py::test_load_pool_prefers_dotenv_over_stale_os_environ(added in
0a6865balongside the production fix); this case stillserves a purpose because it exercises
_seed_from_envdirectly,which is a separate code path from
load_pool. (commitddbb1520)Verification
All 5 named tests pass after their fix:
Wider sweep of the directories touched:
The single remaining failure is
tests/test_tui_gateway_server.py::test_browser_manage_connect_default_local_reports_launch_hint— explicitly out-of-scope for this PR (it's part of the broader
~16 failures in #9 not assigned to the misc cluster).
Test plan
remaining fail is out-of-scope per #9)
clear / retain / no-op branches)
Closes 5 of the 27 failures tracked in
hermes-agent#9("Misc"cluster).
🤖 Generated with Claude Code
The teams adapter imports TypingActivityInput at module load time: try: from microsoft_teams.api.activities.typing import TypingActivityInput except ImportError: TypingActivityInput = None When the real microsoft_teams package isn't installed (CI runner image doesn't bundle Microsoft Teams SDK), the import fails and the local binding stays None — even though the test file's _ensure_teams_mock fixture registers a MockTypingActivityInput in sys.modules. The test-time mock-in-sys.modules trick only fixes future imports; a binding captured before the mock was registered remains stale. send_typing() calls TypingActivityInput() and the resulting TypeError ('NoneType' object is not callable) is swallowed by `except Exception: pass`, so self._app.send is never reached and the test's assert_awaited_once fails with "Awaited 0 times" — invisibly, because the swallowed error hid the real cause. Fix: monkey-patch the adapter module's local TypingActivityInput binding in test_send_typing only — narrowest possible patch since no other test exercises send_typing. Document the import-time-vs-mock-time gap inline so a future reader doesn't fall into the same trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>LGTM. 5 distinct root-causes properly traced; 2 real production bugs found (kanban WS test-isolation via attribute-vs-sys.modules path, lazy-session pending_title collapsed except). Tests + fix for #4 sub-task isolated; #2 introduces new helper with mutation-tested branches. 5/5 named tests pass; 424 passed in subset. Closes 5 of remaining 14 from hermes-agent#9.