CI gate RED on all core PRs: 4 conductor-snapshot tests use a frozen literal ts (10-min freshness time-bomb) -> /repos/// crash #2550

Closed
opened 2026-06-10 18:41:58 +00:00 by core-be · 1 comment
Member

Summary

The required commit-status context Ops Scripts Tests / Ops scripts (unittest) (pull_request) is RED on every core PR, including identically on origin/main. This blocks the core merge gate for all PRs that touch scripts/** or .gitea/scripts/** (and the gate-check sees the red context regardless).

Reproduced on a fresh origin/main checkout:

4 failed, 379 passed, 2 skipped, 75 subtests passed
FAILED .gitea/scripts/tests/test_gitea_merge_queue.py::test_list_candidate_issues_uses_snapshot_when_present
FAILED .gitea/scripts/tests/test_gitea_merge_queue.py::test_list_queued_issues_uses_snapshot_label_filter
FAILED .gitea/scripts/tests/test_gitea_merge_queue.py::test_get_combined_status_uses_snapshot_when_sha_matches
FAILED .gitea/scripts/tests/test_status_reaper_api.py::test_get_combined_status_uses_snapshot_when_sha_matches

Note on the green-workflow trap: the job carries continue-on-error: true (the mc#1982 mask), so the workflow shows green — but the per-step pytest failure still posts the required commit-status context as failure, and that is what the gate keys off. Don't be misled by the green check on the Actions tab.

Failing tests (exact)

  • test_gitea_merge_queue.py::test_list_candidate_issues_uses_snapshot_when_present
  • test_gitea_merge_queue.py::test_list_queued_issues_uses_snapshot_label_filter
  • test_gitea_merge_queue.py::test_get_combined_status_uses_snapshot_when_sha_matches
  • test_status_reaper_api.py::test_get_combined_status_uses_snapshot_when_sha_matches

All four crash with:

ValueError / ApiError: GET /repos///commits/<sha>/status ...

and captured stdout:

::notice::conductor snapshot stale (NNNNNs); self-fetching

Root cause (verified — this is a time-bomb test, not a flake)

The snapshot fixtures hardcode a frozen literal timestamp:

  • test_gitea_merge_queue.py: def _make_snapshot(prs, ts="2026-06-10T12:00:00Z")
  • test_status_reaper_api.py: two inline "ts": "2026-06-10T12:00:00Z" snapshot dicts

load_conductor_snapshot() (in .gitea/scripts/gitea-merge-queue.py lines ~247-264, mirrored in status-reaper.py ~185-202) only honors a snapshot within a 10-minute freshness window:

age_sec = (datetime.now(timezone.utc) - ts).total_seconds()
if age_sec > 600:  # 10 minutes
    print(f"::notice::conductor snapshot stale ({int(age_sec)}s); self-fetching")
    return None

These tests were authored on 2026-06-10 and only passed during the ~10-minute window after 2026-06-10T12:00:00Z. Once wall-clock passes 12:10:00Z, the loader treats the frozen snapshot as stale → returns None → the code falls through to a real API self-fetch → with empty GITEA_OWNER/GITEA_REPO env in the test harness this builds a malformed /repos///... URL → crash.
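The failure chain above can be reproduced in isolation. What follows is a minimal sketch, not the actual loader: `is_stale`, `status_url`, and the URL template are hypothetical names inferred from the quoted freshness check and the error message, and `now` is passed in explicitly so the result does not depend on when the sketch is run.

```python
from datetime import datetime, timezone

FRESHNESS_WINDOW_SEC = 600  # 10 minutes, as in the quoted check


def is_stale(ts_iso, now):
    """Return True when the snapshot is older than the freshness window.

    Hypothetical stand-in for the age check inside load_conductor_snapshot();
    how the real loader parses the timestamp is an assumption.
    """
    ts = datetime.fromisoformat(ts_iso.replace("Z", "+00:00"))
    return (now - ts).total_seconds() > FRESHNESS_WINDOW_SEC


def status_url(owner, repo, sha):
    # Assumed URL template, inferred from the error string in this issue.
    return f"/repos/{owner}/{repo}/commits/{sha}/status"


FROZEN_TS = "2026-06-10T12:00:00Z"
inside_window = datetime(2026, 6, 10, 12, 5, tzinfo=timezone.utc)
issue_opened = datetime(2026, 6, 10, 18, 41, 58, tzinfo=timezone.utc)

print(is_stale(FROZEN_TS, inside_window))  # False: snapshot is honored
print(is_stale(FROZEN_TS, issue_opened))   # True: loader falls back to self-fetch
print(status_url("", "", "<sha>"))         # /repos///commits/<sha>/status
```

The same frozen literal flips from fresh to stale purely because the clock moved, which is why the failure is deterministic rather than flaky.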

Mechanism is named, not "environmental": frozen-literal snapshot timestamp vs. a relative-age freshness check. The sibling test test_load_conductor_snapshot_ignores_stale_snapshot already does the correct thing — it builds ts as now - 15min — so the fix is to make the "uses_snapshot" fixtures equally relative.

Verified fix direction

Make _make_snapshot / the inline snapshot dicts default to a fresh now()-based timestamp (a _fresh_ts() helper), so the snapshot is always inside the freshness window when the suite runs. Tests that intentionally want a STALE snapshot pass ts= explicitly (the existing stale test is unchanged). Test-only change; no production-script change required.

Verified locally on a clean origin/main checkout: after patching both fixtures to a relative now() ts, the suite goes from 4 failed, 379 passed to 383 passed, 2 skipped with zero failures.

(Alternative considered and rejected: monkeypatch/inject the staleness check per-test — more invasive, and the fixture being time-relative is the more honest fix since it mirrors real conductor behavior. Bare-disabling the tests is explicitly NOT acceptable — it would re-mask the snapshot-consumption coverage.)

Impact

This required red context blocks the core merge gate for every affected PR, including the in-flight CI-hygiene stack:

  • #2548 — fix(merge-queue): silent skip for non-main base PRs (the durable skip-comment-flood fix)
  • #2539 — ci(lint): guard actions/setup-go cache on self-hosted fleet
  • #2541 — ci(lint): forbid continue-on-error on required-context jobs

Until this is fixed, those PRs cannot ride a clean gate.

A fix PR is being opened alongside this issue (test-only).

core-be added the area/ci, kind/infrastructure, and release-blocker labels 2026-06-10 18:41:58 +00:00
Author
Member

Fix PR opened: #2551 (test-only, +24/-4). Verified in live CI on the PR head: Ops Scripts Tests / Ops scripts (unittest) is now success (was failure), gate-check-v3 green, qa-review + security-review approved, mergeable. Root cause confirmed: frozen-literal snapshot ts vs. the 10-minute freshness window in load_conductor_snapshot().


Reference: molecule-ai/molecule-core#2550