RCA: hermes-agent PR-only Tests/test failures under xdist runner load #36
MECHANISM: The `pull_request` failures are xdist/load flakes in the full unit suite, not a hermes#35 regression and not caused by the `enable-cache: false` workflow setting. `.gitea/workflows/tests.yml:97-101` runs `python -m pytest ... -n auto`; `pyproject.toml:151` also sets `-n auto`; and `tests/conftest.py:483-489` arms a hard 30s per-test timeout, so slow SQLite commits/FTS schema creation and thread-notification races become hard failures under PR runner load even when main push and rerun passes succeed.

EVIDENCE: hermes#35 head `3b2f9861d476226b7b84151cbb2c2bb966ad1909` currently has `Tests / test (pull_request)` success in run 520 after 10m4s, with log summary `19112 passed, 63 skipped`. The stale failing PR #3 run 515 on head `374d388a753c3d58c583d7becf3b7340b427138c` failed after 12m5s with `3 failed, 19107 passed`, including `TimeoutError: Test exceeded 30 second timeout` from `hermes_state.py:230`/`tests/conftest.py:489`, `messages_fts_trigram` initialization at `hermes_state.py:507-509`, the approval race at `tests/gateway/test_approve_deny_commands.py:405-410`, and the stale-cache timing assertion at `tests/plugins/test_achievements_plugin.py:247-252`. Main HEAD `b52ae4dad0832f3905095ec201e054d467a98111` has `Tests / test (push)` success in run 470 after 43s, showing the repo/workflow can pass outside the loaded PR execution shape.

RECOMMENDED FIX SHAPE: Treat this as hermes-agent test isolation/CI-shape debt, not as a code regression in hermes#35. Responsible files are `tests/conftest.py`, `pyproject.toml`, `.gitea/workflows/tests.yml`, and the specific flaky tests named above: remove the duplicate/implicit xdist amplification, make the 30s timeout opt-in or substantially higher for SQLite-heavy tests, isolate gateway approval globals from parallel workers, and freeze/mock time in the achievements stale-cache test. Re-run stale PR heads after those guardrails; hermes#35 itself is green and should be considered a flake victim rather than the source.

MECHANISM: Follow-up on the event-context hypothesis: the current
`.gitea/workflows/tests.yml` does not define separate push vs pull_request job bodies, env, permissions, runner labels, or setup-uv options. Both events enter the same `test` job at `.gitea/workflows/tests.yml:18-101` with `runs-on: ubuntu-latest`, `permissions: contents: read`, `enable-cache: false`, and the same pytest command, so I do not see workflow-file divergence or token-scope-specific behavior as the primary cause.

EVIDENCE: The PR logs show actions/checkout is checking out the PR head ref, not a synthetic `refs/pull/N/merge`: PR #3 fetched `374d388a...:refs/remotes/pull/3/head` and checked out `refs/remotes/pull/3/head`, while PR #35 fetched `3b2f9861...:refs/remotes/pull/35/head` and checked out `refs/remotes/pull/35/head`. Current statuses are also asymmetric by run freshness: #35 `Tests / test` is successful in 10m4s and #34 is successful in 16s, while #3/#1 still carry stale failing statuses from older heads/runs.

RECOMMENDED FIX SHAPE: Keep the RCA direction on test/runner-load hardening rather than PR event permissions. If engineers want to eliminate the remaining ambiguity, add a temporary diagnostic step to `molecule-ai/hermes-agent:.gitea/workflows/tests.yml` that prints `github.event_name`, `github.ref`, `github.sha`, `github.event.pull_request.head.sha`, runner name, CPU count, and pytest worker count before the test step; the responsible stabilization files remain `tests/conftest.py`, `pyproject.toml`, `.gitea/workflows/tests.yml`, and the named flaky tests.

MECHANISM: PR #3 is failing because the
`pull_request` path runs the full xdist unit suite on the PR head, while the cited main `push` success did not run pytest at all. In run 470 the workflow took `.gitea/workflows/tests.yml:58-60` and printed `No Python/package/test changes`, so it is not evidence that the same workload passes on push. The failing workload is `.gitea/workflows/tests.yml:97-101` (`python -m pytest ... -n auto`) plus the global 30s SIGALRM in `tests/conftest.py:524-535`, which turns loaded xdist stalls/races into hard failures.

EVIDENCE: PR #3 run 515 checked out `refs/remotes/pull/3/head` at `374d388a`, then failed after 9m52s with `3 failed, 19107 passed, 63 skipped, 2 errors`. Specific failures: `tests/cli/test_cli_new_session.py:122-137` timed out inside `SessionDB.create_session`/`hermes_state.py:208-230`; `tests/agent/test_insights.py:43-50` timed out appending SQLite messages; `tests/gateway/test_session.py:532-540` timed out creating `messages_fts_trigram` via `hermes_state.py:499-511`; `tests/gateway/test_approve_deny_commands.py:405-410` missed its 2.5s notify window after `check_all_command_guards` performs tirith resolution/scanning in `tools/approval.py:962-1073`; and `tests/plugins/test_achievements_plugin.py:231-252` races a stale timestamp against a background refresh path in `plugins/hermes-achievements/dashboard/plugin_api.py:920-975`. PR #35 is the better comparator: it did run the full pytest command and passed with `19112 passed` in run 520.

RECOMMENDED FIX SHAPE: Do not classify this as pull_request token/permission drift. Responsible surfaces are `molecule-ai/hermes-agent:.gitea/workflows/tests.yml`, `pyproject.toml`, `tests/conftest.py`, `hermes_state.py`, `tests/gateway/test_approve_deny_commands.py`, and `tests/plugins/test_achievements_plugin.py`. Make the push/PR comparison honest by reporting whether pytest was skipped; cap/declare xdist worker count for the 4GB act_runner shape; replace the global 30s SIGALRM with per-test marks or longer budgets for SQLite/FTS-heavy tests; pre-disable/mock tirith in approval tests; and freeze/mock time or block background refresh in the achievements stale-cache test.
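
One way to cap/declare the xdist worker count instead of relying on `-n auto` is to compute it from both CPU and memory. The sketch below is illustrative only: `pick_worker_count` is a hypothetical helper, and the 1 GB-per-worker budget is an assumption, not a figure measured on the act_runner. Only the 4GB runner size comes from the notes above.

```python
import os


def pick_worker_count(cpu_count: int, mem_gb: float, per_worker_gb: float = 1.0) -> int:
    """Return an explicit xdist worker count bounded by CPU and memory.

    per_worker_gb is an assumed budget for illustration, not a measured value.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    by_memory = int(mem_gb // per_worker_gb)
    # Never go below one worker, never above the CPU count.
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # 4 is the runner memory size (GB) named in the RCA.
    print(pick_worker_count(cpus, mem_gb=4))
```

The printed number could then be passed to pytest as `-n <count>` in the workflow, which also makes the worker count visible in the job log.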
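
A minimal sketch of replacing the global 30s SIGALRM with per-test budgets, under stated assumptions: the marker names `sqlite_heavy` and `fts_schema`, their budgets, and the helpers `budget_for` and `time_limit` are all hypothetical and do not reflect what `tests/conftest.py` currently contains. The alarm-based limiter works only on Unix and only in the main thread.

```python
import signal
from contextlib import contextmanager

DEFAULT_BUDGET_S = 30  # the existing global limit described in the RCA
# Hypothetical marker names and budgets, for illustration only.
MARKER_BUDGETS_S = {"sqlite_heavy": 120, "fts_schema": 180}


def budget_for(marker_names, default=DEFAULT_BUDGET_S, budgets=MARKER_BUDGETS_S):
    """Return the largest budget among a test's markers, or the default."""
    matched = [budgets[name] for name in marker_names if name in budgets]
    return max(matched) if matched else default


@contextmanager
def time_limit(seconds: int):
    """Raise TimeoutError if the body runs longer than `seconds`."""

    def _on_alarm(signum, frame):
        raise TimeoutError(f"Test exceeded {seconds} second timeout")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

In an autouse fixture, the marker names could plausibly be collected with `[m.name for m in request.node.iter_markers()]` and the test body wrapped in `time_limit(budget_for(names))`, so only tests that declare a heavier workload get a longer budget.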
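
The freeze/mock-time recommendation for the achievements stale-cache test can be illustrated with an injected clock. Everything named here (`FakeClock`, `CachedValue`, the 60s TTL) is hypothetical and is not taken from `plugin_api.py`; the point is only that a staleness check driven by a manually advanced clock cannot race wall-clock time under runner load.

```python
class FakeClock:
    """Manually advanced clock that stands in for time.time in tests."""

    def __init__(self, start: float = 1_000_000.0):
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class CachedValue:
    """Hypothetical cache entry that is stale once older than ttl_s seconds."""

    def __init__(self, value, ttl_s: float, clock):
        self.value = value
        self.ttl_s = ttl_s
        self._clock = clock
        self.stored_at = clock()

    def is_stale(self) -> bool:
        return self._clock() - self.stored_at > self.ttl_s


clock = FakeClock()
entry = CachedValue("snapshot", ttl_s=60, clock=clock)
clock.advance(61)  # deterministic: no sleep, no dependence on scheduler timing
print(entry.is_stale())
```

If the production code reads the clock directly rather than accepting one, patching that call in the test would achieve the same effect; which approach fits depends on how the plugin obtains its timestamps.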