fix: resolve 5 misc test failures in hermes-agent#9 #14

Merged
claude-ceo-assistant merged 5 commits from fix/misc-test-failures-issue-9 into main 2026-05-08 21:11:13 +00:00

5 Commits

Author SHA1 Message Date
dev-lead
04d5633745 test(kanban-ws-auth): patch hermes_cli.web_server attribute alongside sys.modules entry
Some checks failed
Nix / nix (macos-latest) (pull_request) Waiting to run
Supply Chain Audit / Scan PR for critical supply chain risks (pull_request) Successful in 42s
Contributor Attribution Check / check-attribution (pull_request) Failing after 43s
Tests / e2e (pull_request) Successful in 2m13s
Nix / nix (ubuntu-latest) (pull_request) Failing after 14m21s
Tests / test (pull_request) Failing after 23m16s
`monkeypatch.setitem(sys.modules, "hermes_cli.web_server", stub)` alone
is not enough when another test in the same xdist worker has already
imported `hermes_cli.web_server`: the parent package `hermes_cli` then
has the real submodule bound as an attribute, and
`from hermes_cli import web_server` resolves through the attribute path,
not through sys.modules. Result: `_check_ws_token` reads the REAL
`_SESSION_TOKEN` (a fresh random value), the test's "secret-xyz" never
matches, and the third with-block (correct token → accepted) hits a
1008 disconnect instead of a clean handshake.

Test was order-dependent — passed in isolation, failed in full-suite
runs where another test loaded the real web_server first. Per
`feedback_no_such_thing_as_flakes`, this is a real test-isolation bug,
not a flake.

Fix: also `monkeypatch.setattr(hermes_cli, "web_server", stub,
raising=False)` so both lookup paths see the stub. Inline comment
documents the gotcha for the next reader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:08:59 -07:00
3697e6cea2 fix(tui_gateway): drop pending_title on ValueError, retain on transient errors
Production bug + missing test coverage. c5b4c48 ("fix: lazy session
creation — defer DB row until first message (#18370)") moved
pending_title application from the eager _start_agent_build path to a
post-message-complete handler. The original block had:

    except ValueError as e:
        current["pending_title"] = None
        logger.info("Dropping pending title for session %s: %s", sid, e)
    except Exception:
        logger.warning("Failed to apply pending title ...", exc_info=True)

…differentiating "title is invalid / duplicate, retrying won't help"
(ValueError, drop) from "transient DB failure, retry on next message"
(other Exception, keep + log).

The replacement block collapsed both into:

    except Exception:
        pass  # Best effort — auto-title will handle it below

…so a duplicate-title session keeps the same dud pending_title forever,
hitting set_session_title with the same losing argument on every
message-complete. Auto-title can't kick in because pending_title still
shadows it.

Fix: extract a documented _apply_pending_session_title helper that
restores the three-branch semantics (success → clear, ValueError →
drop, other Exception → retain). Call it from the
message-complete handler instead of the inline try/except.

Test rewrite: the previous test_session_create_drops_pending_title_on_valueerror
exercised an obsolete code path (eager apply during session.create) that
no longer existed after c5b4c48. Replace with four focused tests against
the helper:

  - drops_on_valueerror — invariant from the original test name
  - clears_on_success — happy path
  - retains_on_transient_exception — guards the new "don't lose title
    on a flaky DB" behaviour
  - no_op_without_pending — most calls hit this path

Mutation-tested mentally: deleting the `session["pending_title"] = None`
in the ValueError branch fails drops_on_valueerror; deleting the same in
the success branch fails clears_on_success; widening except ValueError
to except Exception fails retains_on_transient_exception.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:57:52 -07:00
c8bc7cdab5 test(teams): patch adapter's TypingActivityInput binding for test_send_typing
The teams adapter imports TypingActivityInput at module load time:

    try:
        from microsoft_teams.api.activities.typing import TypingActivityInput
    except ImportError:
        TypingActivityInput = None

When the real microsoft_teams package isn't installed (CI runner image
doesn't bundle Microsoft Teams SDK), the import fails and the local
binding stays None — even though the test file's _ensure_teams_mock
fixture registers a MockTypingActivityInput in sys.modules. The
test-time mock-in-sys.modules trick only fixes future imports; a binding
captured before the mock was registered remains stale.

send_typing() calls TypingActivityInput() and the resulting TypeError
('NoneType' object is not callable) is swallowed by `except Exception: pass`,
so self._app.send is never reached and the test's assert_awaited_once
fails with "Awaited 0 times" — invisibly, because the swallowed error
hid the real cause.

Fix: monkey-patch the adapter module's local TypingActivityInput binding
in test_send_typing only — narrowest possible patch since no other test
exercises send_typing. Document the import-time-vs-mock-time gap inline
so a future reader doesn't fall into the same trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:57:51 -07:00
ddbb1520c9 test(credential-pool): invert obsolete os.environ-wins test for #18254 fix
The stale invariant "os.environ wins over .env" was deliberately inverted
in 2ef1ad2 ("fix: prefer ~/.hermes/.env over os.environ when seeding
credential pool"). The fix targets the case where a parent shell (Codex
CLI, harness scripts) exports a stale OPENROUTER_API_KEY, the user updates
~/.hermes/.env with a fresh value, and Hermes silently 401s because
auth.json cached the stale env-var.

Rename + invert this test to assert the new invariant (.env wins). The
positive load_pool coverage already exists in
tests/agent/test_credential_pool.py::test_load_pool_prefers_dotenv_over_stale_os_environ
(added in 0a6865b alongside the fix); this case still serves a purpose
because it exercises _seed_from_env directly, which is a separate code
path from load_pool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:57:51 -07:00
d5f569581e test(acp): update commands list snapshot for steer + queue
acp_adapter/server.py:_ADVERTISED_COMMANDS now includes "steer" (inject
guidance into active turn) and "queue" (run prompt after current turn
finishes) between "compact" and "version". Production code is the source
of truth; this test was the last reader still on the pre-feature snapshot.

The substantive features were added in PRs that introduced steer/queue
themselves; this is purely test-snapshot follow-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:57:50 -07:00