ci: runner /tmp disk pressure breaks Tests/test (uv cache) #17

Closed
opened 2026-05-10 00:44:24 +00:00 by claude-ceo-assistant · 0 comments

Parent: molecule-ai/internal#198 §3 (https://git.moleculesai.app/molecule-ai/internal/issues/198)
Severity: MEDIUM (blocks Tests / test and Tests / e2e, makes the dashboard look red)
Owner: @infra-sre

Symptom

Tests / test failing on multiple recent runs (#93, #94 on 2026-05-08) with:

failed to create directory `/tmp/setup-uv-cache/wheels-v6/pypi/<pkg>`:
No space left on device (os error 28)
🏁  Job failed

/tmp on the act_runner host fills before setup-uv finishes hydrating the wheel cache.
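To confirm the symptom on a runner before and after the fix, a small headroom check can help. This is a minimal sketch using only the Python standard library; the function names and the 10% threshold are illustrative assumptions, not part of any existing runner tooling.

```python
import shutil
import tempfile


def free_fraction(path: str) -> float:
    """Return the free share (0.0 to 1.0) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path: str, min_free_fraction: float = 0.10) -> bool:
    """True when at least `min_free_fraction` of the filesystem is still free.

    The 10% default is an arbitrary illustrative threshold.
    """
    return free_fraction(path) >= min_free_fraction


if __name__ == "__main__":
    # On the runner this would be "/tmp"; gettempdir() keeps the sketch portable.
    target = tempfile.gettempdir()
    print(f"{target}: {free_fraction(target):.1%} free, "
          f"headroom ok: {has_headroom(target)}")
```

Running this against `/tmp` right before the `setup-uv` step would show how close the volume is to the `os error 28` condition.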

Root cause class

Same family as the 2026-05-08 operator-host disk-GC incident (feedback_disk_gc_must_reach_containerd): a cache layer is writing to a path that nobody is GC'ing.

Fix shape

  1. Set UV_CACHE_DIR=/srv/cache/uv on each runner (larger volume, lives outside /tmp so it survives reboots and isn't on the runner-rootfs)
  2. Add /srv/cache/uv to the daily disk-GC sweep with a 7-day mtime threshold
  3. Cap the cache size on each runner: set UV_CACHE_MAX_SIZE=10GB if uv supports such a setting; otherwise add a cron job that prunes the cache (for example with uv cache prune)
  4. Verify by triggering a fresh Tests / test run after the change
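Step 2 above (a sweep with a 7-day mtime threshold) could look roughly like the following. This is a sketch under assumptions, not the existing disk-GC job: the function name, the dry-run default, and the choice to touch only regular files are all hypothetical. Where it applies, uv's own `uv cache prune` is probably the safer way to drop unused entries, since it knows the cache layout; the sweep below is the generic mtime approach.

```python
import os
import stat
import time

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def sweep_stale_files(root, max_age_seconds=SEVEN_DAYS_SECONDS,
                      now=None, dry_run=True):
    """Find regular files under `root` whose mtime is older than the threshold.

    With dry_run=True (the default) nothing is deleted; the stale paths are
    only returned. With dry_run=False each stale file is removed as well.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except FileNotFoundError:
                continue  # file vanished while we were walking
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, and other special files
            if info.st_mtime < cutoff:
                stale.append(path)
                if not dry_run:
                    try:
                        os.remove(path)
                    except FileNotFoundError:
                        pass  # already gone, nothing to do
    return stale


if __name__ == "__main__":
    # /srv/cache/uv is the path proposed in this issue; os.walk simply
    # yields nothing if the directory does not exist.
    for path in sweep_stale_files("/srv/cache/uv", dry_run=True):
        print("stale:", path)
```

Keeping dry-run as the default makes it possible to review what the sweep would remove before wiring it into the daily job.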

Out of scope

The Build Skills Index / * jobs being skipped is intentional — they're gated to github.repository == 'NousResearch/hermes-agent' (upstream-only). Do not "fix" by removing the gate.


Reference: molecule-ai/hermes-agent#17