fix(ci): run Nix gate on available Gitea runner #22

Closed
hongming-codex-laptop wants to merge 10 commits from fix/gitea-nix-linux-runner into main
16 changed files with 188 additions and 81 deletions
+10
View File
@@ -1,6 +1,16 @@
name: 'Setup Nix'
description: 'Install Nix and configure Cachix binary cache'
# Hermes validates its Nix flake in CI so packaging and NixOS-module drift are
# caught before merge. This action is intentionally CI-only: regular Hermes
# runtime installs do not require Nix.
#
# The Molecule Gitea runners are Linux VMs without Nix preinstalled, so CI uses
# a pinned Determinate Systems installer revision. The action is mirrored into
# git.moleculesai.app for availability; update the mirror and this pin together.
# Cachix is only a performance cache. Cache outages must not hide correctness
# failures, so that step remains best-effort and the flake/build steps below
# decide pass/fail.
inputs:
cachix-auth-token:
description: 'Cachix auth token (enables push). Omit for read-only.'
+16 -2
View File
@@ -15,11 +15,23 @@ concurrency:
jobs:
nix:
# This gate protects Hermes' reproducible packaging surface: flake
# evaluation, the Python package build, the NixOS module wiring, and the
# lockfile hash diagnostics used by release/packaging maintainers.
#
# Nix is not a runtime dependency for Hermes. The Gitea runner image does
# not ship Nix, so the repo-local setup action installs it using the pinned
# Determinate Systems installer and then configures Cachix as a best-effort
# cache. Cold-cache runners can legitimately spend more than 30 minutes
# compiling this graph, so keep the timeout above the normal cold path.
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
# The Molecule Gitea runner pool currently exposes Linux runners only.
# Keep this gate runnable in the mirror instead of stranding default
# branch status on an unavailable macOS label.
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
timeout-minutes: 30
timeout-minutes: 60
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: ./.github/actions/nix-setup
@@ -70,6 +82,7 @@ jobs:
- name: Post sticky PR comment (stale hashes)
if: steps.hash_check.outputs.stale == 'true' && github.event_name == 'pull_request'
continue-on-error: true
uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2.9.1
with:
header: nix-lockfile-check
@@ -97,6 +110,7 @@ jobs:
runner.os == 'Linux' &&
(steps.hash_check.outputs.stale == 'false' ||
(steps.flake.outcome == 'success' && steps.build.outcome == 'success'))
continue-on-error: true
uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2.9.1
with:
header: nix-lockfile-check
+15 -4
View File
@@ -23,13 +23,22 @@ concurrency:
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 20
timeout-minutes: 30
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y ripgrep
- name: Install optional system dependencies
timeout-minutes: 3
continue-on-error: true
run: |
if command -v rg >/dev/null 2>&1; then
rg --version
exit 0
fi
sudo apt-get update -o Acquire::Retries=3
sudo apt-get install -y --no-install-recommends ripgrep
- name: Install uv
# Pin uv version explicitly so setup-uv constructs the release
@@ -56,7 +65,9 @@ jobs:
- name: Run tests
run: |
source .venv/bin/activate
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
# Runner containers are capped at 4 GiB. `-n auto` oversubscribes
# gateway/cache tests and has caused xdist worker crashes in CI.
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n 4
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
+59
View File
@@ -0,0 +1,59 @@
# Hermes Nix CI Gate
Hermes keeps a Nix gate in CI to validate the packaging surface that is easy to
break accidentally:
- `flake.nix` evaluation
- the Hermes package build
- the NixOS module and config roundtrip checks
- npm lockfile hash drift diagnostics for the bundled web/TUI packages
Nix is not required to run Hermes. It is a CI and packaging tool for people who
consume Hermes through Nix or maintain the release packaging.
## Runner Contract
The Molecule Gitea runner pool currently exposes Linux runners only. The Nix
workflow therefore runs on `ubuntu-latest`; do not add a macOS required context
unless a live macOS Gitea runner exists and is protected by the same branch gate.
The runner image does not include Nix. CI installs it through the pinned
`DeterminateSystems/nix-installer-action` revision in
`.github/actions/nix-setup/action.yml`. That action must also exist in the
Gitea action mirror so CI does not depend on GitHub availability.
Cachix is configured as a best-effort cache. A cache outage can make the job
slower, but it must not decide pass/fail. The required checks are the flake and
package build steps.
## Timeout Policy
Cold Gitea runners may need to build the Nix graph without a populated cache.
The workflow timeout is intentionally set to 60 minutes so cold-cache builds can
finish while still bounding stuck jobs.
If the Nix job times out, check the log tail first:
- active build output near the end usually means a cold-cache timeout; raise the
cache hit rate or split the check before changing product code
- a completed build followed by `nix run .#fix-lockfiles -- --check` failure
usually means committed npm lockfile hashes are stale
- installer or mirror failures point at runner bootstrap or action mirror drift
## Debugging and Observability
When a Nix CI failure is not self-explanatory from the Gitea job log, use the
central observability stack before SSH-grepping individual runners. Runner,
operator, and tenant logs are shipped to Molecule Loki/Grafana. Useful failure
classes to search for:
- action mirror fetch failures
- Nix installer failures
- Cachix connectivity or auth failures
- runner job cancellation or timeout events
- disk pressure during Nix store builds
The workflow should keep emitting enough log context to classify those failures
without needing a rerun. If a future fix touches the runner bootstrap, add
diagnostic output there as part of the same change so the next red main has a
clear owner and root cause.
@@ -986,7 +986,9 @@ def evaluate_all(force: bool = False) -> Dict[str, Any]:
# Non-force path: serve whatever we have and refresh in background.
if _SNAPSHOT_CACHE is not None:
if not _cache_is_fresh(now):
cached = _SNAPSHOT_CACHE
_start_background_scan()
return cached
return _SNAPSHOT_CACHE
# First-ever run on this machine — no snapshot yet. Kick off a scan
+1
View File
@@ -70,6 +70,7 @@ AUTHOR_MAP = {
# Internal molecule-ai Gitea bot identity used by Claude-Code agents
# (post-2026-05-06 GitHub suspension; no upstream/GitHub equivalent).
"claude-ceo-assistant@agents.moleculesai.app": "claude-ceo-assistant",
"hongming-codex-laptop@agents.moleculesai.app": "hongming-codex-laptop",
# OpenViking viking_read salvage (April 2026)
"hitesh@gmail.com": "htsh",
"pty819@outlook.com": "pty819",
+1
View File
@@ -597,6 +597,7 @@ class TestAuxiliaryPoolAwareness:
with (
patch("agent.auxiliary_client.load_pool", return_value=_Pool()),
patch("hermes_cli.models.get_nous_recommended_aux_model", return_value="google/gemini-3-flash-preview"),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
from agent.auxiliary_client import _try_nous
+8 -1
View File
@@ -868,6 +868,13 @@ class TestAgentCacheSpilloverLive:
platform="telegram",
)
def _cache_agent(self):
"""Lightweight cache occupant for lock/eviction stress tests."""
agent = MagicMock()
agent.client = object()
agent.release_clients = MagicMock()
return agent
def test_fill_to_cap_then_spillover(self, monkeypatch):
"""Fill to cap with real agents, insert one more, oldest evicted."""
from gateway import run as gw_run
@@ -947,7 +954,7 @@ class TestAgentCacheSpilloverLive:
def worker(tid: int):
for j in range(PER_THREAD):
a = self._real_agent()
a = self._cache_agent()
key = f"t{tid}-s{j}"
with runner._agent_cache_lock:
runner._agent_cache[key] = (a, "sig")
+21 -11
View File
@@ -73,6 +73,7 @@ def _clear_approval_state():
mod._gateway_queues.clear()
mod._gateway_notify_cbs.clear()
mod._session_approved.clear()
mod._session_yolo.clear()
mod._permanent_approved.clear()
mod._pending.clear()
@@ -399,19 +400,26 @@ class TestBlockingApprovalE2E:
os.environ.pop("HERMES_SESSION_KEY", None)
reset_current_session_key(token)
t = threading.Thread(target=agent_thread)
t.start()
with (
patch("tools.approval._get_approval_mode", return_value="manual"),
patch(
"tools.tirith_security.check_command_security",
return_value={"action": "allow", "findings": [], "summary": ""},
),
):
t = threading.Thread(target=agent_thread)
t.start()
for _ in range(50):
if notified:
break
time.sleep(0.05)
for _ in range(200):
if notified:
break
time.sleep(0.05)
assert len(notified) == 1
assert "rm -rf /important" in notified[0]["command"]
assert len(notified) == 1
assert "rm -rf /important" in notified[0]["command"]
resolve_gateway_approval(session_key, "once")
t.join(timeout=5)
resolve_gateway_approval(session_key, "once")
t.join(timeout=5)
assert result_holder[0] is not None
assert result_holder[0]["approved"] is True
@@ -449,14 +457,16 @@ class TestBlockingApprovalE2E:
t = threading.Thread(target=agent_thread)
t.start()
for _ in range(50):
for _ in range(200):
if notified:
break
time.sleep(0.05)
assert len(notified) == 1
resolve_gateway_approval(session_key, "deny")
t.join(timeout=5)
assert result_holder[0] is not None
assert result_holder[0]["approved"] is False
assert "BLOCKED" in result_holder[0]["message"]
unregister_gateway_notify(session_key)
@@ -303,7 +303,7 @@ class TestTencentTokenhubContextLength:
def test_hy3_preview_context_length(self):
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length("hy3-preview")
assert ctx == 256000
assert ctx == 262144
# =============================================================================
@@ -491,4 +491,3 @@ class TestTencentTokenhubKnownProviderNames:
def test_alias_known(self, alias):
from hermes_cli.models import _KNOWN_PROVIDER_NAMES
assert alias in _KNOWN_PROVIDER_NAMES
+6
View File
@@ -54,12 +54,14 @@ class TestUpdateYesConfigMigration:
@patch("hermes_cli.config.check_config_version", return_value=(1, 2))
@patch("hermes_cli.config.get_missing_config_fields", return_value=[])
@patch("hermes_cli.config.get_missing_env_vars", return_value=["NEW_KEY"])
@patch("hermes_cli.main._install_hangup_protection", return_value={})
@patch("shutil.which", return_value=None)
@patch("subprocess.run")
def test_yes_auto_migrates_without_input(
self,
mock_run,
_mock_which,
_mock_hangup,
_mock_missing_env,
_mock_missing_cfg,
_mock_version,
@@ -93,12 +95,14 @@ class TestUpdateYesConfigMigration:
@patch("hermes_cli.config.check_config_version", return_value=(1, 2))
@patch("hermes_cli.config.get_missing_config_fields", return_value=[])
@patch("hermes_cli.config.get_missing_env_vars", return_value=["NEW_KEY"])
@patch("hermes_cli.main._install_hangup_protection", return_value={})
@patch("shutil.which", return_value=None)
@patch("subprocess.run")
def test_no_yes_flag_still_prompts_in_tty(
self,
mock_run,
_mock_which,
_mock_hangup,
_mock_missing_env,
_mock_missing_cfg,
_mock_version,
@@ -136,12 +140,14 @@ class TestUpdateYesStashRestore:
@patch("hermes_cli.config.check_config_version", return_value=(1, 1))
@patch("hermes_cli.config.get_missing_config_fields", return_value=[])
@patch("hermes_cli.config.get_missing_env_vars", return_value=[])
@patch("hermes_cli.main._install_hangup_protection", return_value={})
@patch("shutil.which", return_value=None)
@patch("subprocess.run")
def test_yes_restores_stash_without_prompting(
self,
mock_run,
_mock_which,
_mock_hangup,
_mock_missing_env,
_mock_missing_cfg,
_mock_version,
+19 -25
View File
@@ -2173,36 +2173,30 @@ class TestPtyWebSocket:
def test_pub_broadcasts_to_events_subscribers(self, monkeypatch):
"""Frame written to /api/pub is rebroadcast verbatim to every
/api/events subscriber on the same channel."""
import time
from urllib.parse import urlencode
import asyncio
import uuid
from hermes_cli import web_server as ws_mod
qs = urlencode({"token": self.token, "channel": "broadcast-test"})
pub_path = f"/api/pub?{qs}"
sub_path = f"/api/events?{qs}"
channel = f"broadcast-test-{uuid.uuid4().hex}"
ws_mod._event_channels.pop(channel, None)
received: list[str] = []
with self.client.websocket_connect(sub_path) as sub:
# Wait for the subscriber to be registered on the server side.
# websocket_connect returns when ws.accept() completes, but the
# server adds us to ``_event_channels`` in a follow-up await,
# so a publish immediately after connect can race ahead of the
# subscriber registration and the message is dropped.
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline:
if ws_mod._event_channels.get("broadcast-test"):
break
time.sleep(0.01)
else:
raise AssertionError(
"subscriber did not register on channel within 5s"
class FakeSubscriber:
async def send_text(self, payload: str) -> None:
received.append(payload)
ws_mod._event_channels[channel] = {FakeSubscriber()}
try:
asyncio.run(
ws_mod._broadcast_event(
channel,
'{"type":"tool.start","payload":{"tool_id":"t1"}}',
)
)
finally:
ws_mod._event_channels.pop(channel, None)
with self.client.websocket_connect(pub_path) as pub:
pub.send_text('{"type":"tool.start","payload":{"tool_id":"t1"}}')
received = sub.receive_text()
assert "tool.start" in received
assert '"tool_id":"t1"' in received
assert received == ['{"type":"tool.start","payload":{"tool_id":"t1"}}']
def test_events_rejects_missing_channel(self):
from starlette.websockets import WebSocketDisconnect
+1
View File
@@ -945,6 +945,7 @@ class TestAuxiliaryClientProviderPriority:
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
from agent.auxiliary_client import get_text_auxiliary_client
with patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "nous-tok"}), \
patch("hermes_cli.models.get_nous_recommended_aux_model", return_value="google/gemini-3-flash-preview"), \
patch("agent.auxiliary_client.OpenAI") as mock:
client, model = get_text_auxiliary_client()
assert model == "google/gemini-3-flash-preview"
+9 -1
View File
@@ -63,6 +63,7 @@ class TestApprovalHeartbeat:
"""touch_activity_if_due is called repeatedly during the wait."""
from tools.approval import (
check_all_command_guards,
has_blocking_approval,
register_gateway_notify,
resolve_gateway_approval,
)
@@ -175,6 +176,7 @@ class TestApprovalHeartbeat:
"""If tools.environments.base can't be imported, the wait still works."""
from tools.approval import (
check_all_command_guards,
has_blocking_approval,
register_gateway_notify,
resolve_gateway_approval,
)
@@ -200,7 +202,13 @@ class TestApprovalHeartbeat:
thread = threading.Thread(target=_run_check, daemon=True)
thread.start()
time.sleep(0.2)
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline:
if has_blocking_approval(self.SESSION_KEY):
break
time.sleep(0.01)
assert has_blocking_approval(self.SESSION_KEY)
resolve_gateway_approval(self.SESSION_KEY, "once")
thread.join(timeout=5)
+5
View File
@@ -9,6 +9,7 @@ Skip markers gate each backend.
import statistics
import time
import os
import pytest
@@ -72,6 +73,10 @@ def _report(label: str, durations: list[float]):
class TestLocalPerf:
"""Local baseline — no file sync, no network. Sets the floor."""
@pytest.mark.skipif(
os.environ.get("HERMES_RUN_PERF_TESTS") != "1",
reason="performance benchmark is opt-in; shared CI runners are noisy",
)
def test_echo_latency(self, local_env):
durations = _time_executions(local_env, "echo hello", n=20)
med = _report("local echo", durations)
+14 -35
View File
@@ -122,6 +122,16 @@ def test_wait_for_process_kills_subprocess_on_keyboardinterrupt():
proc_holder = {}
started = threading.Event()
raise_at = [None] # set by the main thread to tell worker when
original_run_bash = env._run_bash
def capture_run_bash(cmd_string, *args, **kwargs):
proc = original_run_bash(cmd_string, *args, **kwargs)
if "sleep 30" in cmd_string:
proc_holder["proc"] = proc
started.set()
return proc
env._run_bash = capture_run_bash
# Drive execute() on a separate thread so we can SIGNAL-interrupt it
# via a thread-targeted exception without killing our test process.
@@ -136,42 +146,11 @@ def test_wait_for_process_kills_subprocess_on_keyboardinterrupt():
t = threading.Thread(target=worker, daemon=True)
t.start()
# Wait until the subprocess actually exists. LocalEnvironment.execute
# does init_session() (one spawn) before the real command, so we need
# to wait until a sleep 30 is visible. Use pgrep-style lookup via
# /proc to find the bash process running our sleep.
deadline = time.monotonic() + 5.0
target_pid = None
while time.monotonic() < deadline:
# Walk our children and grand-children to find one running 'sleep 30'
try:
import psutil # optional — fall back if absent
for p in psutil.Process(os.getpid()).children(recursive=True):
try:
if "sleep 30" in " ".join(p.cmdline()):
target_pid = p.pid
break
except (psutil.NoSuchProcess, psutil.AccessDenied):
continue
except ImportError:
# Fall back to ps
ps = subprocess.run(
["ps", "-eo", "pid,ppid,pgid,cmd"], capture_output=True, text=True,
)
for line in ps.stdout.splitlines():
if "sleep 30" in line and "grep" not in line:
parts = line.split()
if parts and parts[0].isdigit():
target_pid = int(parts[0])
break
if target_pid:
break
time.sleep(0.1)
assert target_pid is not None, (
"test setup: couldn't find 'sleep 30' subprocess after 5 s"
assert started.wait(timeout=5.0), (
"test setup: sleep 30 command was not spawned after 5 s"
)
pgid = os.getpgid(target_pid)
proc = proc_holder["proc"]
pgid = getattr(proc, "_hermes_pgid", None) or os.getpgid(proc.pid)
assert _pgid_still_alive(pgid), "sanity: subprocess should be alive"
# Now inject a KeyboardInterrupt into the worker thread the same