Compare commits

...

9 Commits

Author SHA1 Message Date
core-devops 8399e8b525 fix(queue): correct status deduplication order so newest entry wins
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
gate-check-v3 / gate-check (pull_request) Successful in 3s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
CI / Platform (Go) (pull_request) Successful in 4m21s
CI / Canvas (Next.js) (pull_request) Successful in 5m43s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m19s
CI / all-required (pull_request) Successful in 6m22s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 5/7 — missing: root-cause, no-backwards-compat
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request_review) Successful in 4s
The queue was incorrectly seeing main's CI/all-required (push) as
"pending" instead of "success". Two bugs interacting:

1. latest_statuses_by_context guard was wrong: `ids[-1] > ids[0]`
   detected ascending but the combined /statuses array is DESCENDING
   (ids 393→1). Fix: `ids[-1] < ids[0]` detects descending and
   reverses so ascending iteration makes newest last → wins.

2. get_combined_status sorted merged entries DESCENDING then deduplicated
   by iterating forward — the last occurrence won. But when /status
   base entries (low ids) are appended AFTER /statuses (high ids), the
   same-context entries from base appear LAST after descending sort,
   overwriting newer entries from /statuses. Fix: return merged list
   sorted ASCENDING and drop the inline dedup; let
   latest_statuses_by_context handle dedup correctly.

Test names clarified: ascending-input test now named
test_latest_statuses_ascending_input_newest_wins (the base /status
case); descending-input test renamed
test_latest_statuses_guard_reverses_descending_input (the /statuses
case). Both verify newest (largest id) wins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:29:57 +00:00
core-devops 5e47d2e385 fix(queue): query merge-queue label by name not resolved ID
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
gate-check-v3 / gate-check (pull_request) Successful in 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 53s
qa-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 3s
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4m34s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 6m14s
E2E Chat / E2E Chat (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 6m28s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 5m52s
Gitea orders /issues?labels=<id> by PR number ascending with limit
applied before PR #1233 appears — the 50-result page starts at PR #1309
and misses #1233 entirely. Querying by label name returns #1233
correctly. Drop the _ensure_label_ids() startup call (one less API
round-trip per tick) and the now-dead _QUEUE_LABEL_ID/_HOLD_LABEL_ID
globals. Resolves the queue label query bug root-causing SEV-1 #487.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:14:36 +00:00
core-devops 6c06227871 fix(queue): correct latest_statuses_by_context guard for descending input
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
gate-check-v3 / gate-check (pull_request) Successful in 6s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 4m18s
CI / Canvas (Next.js) (pull_request) Successful in 5m33s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m23s
CI / all-required (pull_request) Successful in 6m36s
Gitea /statuses returns newest-first (desc id order). After
get_combined_status sorts by id descending, the combined list is also
descending. The old guard `ids[-1] > ids[0]` detected ascending input
but NOT descending — for main (130+ statuses) the guard did not fire,
causing forward iteration to grab the newest entry instead of the oldest
(which is the correct authoritative status when iterating a descending
list). The fix inverts the comparison to `ids[-1] < ids[0]`, so that
descending input triggers reversal and the oldest (authoritative) entry
per context wins. Ascending test fixtures work unchanged.

Also adds explicit-id test fixture for the ascending-guard case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 08:41:36 +00:00
core-devops f6abdb9dc1 fix(queue): proper merge of base + extended statuses by id sort
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Platform (Go) (pull_request) Successful in 4m4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 5m25s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Failing after 2s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 56s
CI / Python Lint & Test (pull_request) Successful in 6m29s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
CI / all-required (pull_request) Successful in 6m37s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
gate-check-v3 / gate-check (pull_request) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 13s
sop-tier-check / tier-check (pull_request) Successful in 4s
sop-checklist / all-items-acked (pull_request) acked: 5/7 — missing: root-cause, no-backwards-compat
sop-checklist / na-declarations (pull_request) N/A: (none)
The previous supplement logic only added contexts MISSING from base, but didn't
overwrite base entries with newer statuses from /statuses. Result: stale
"failure" entries from base (id=27) overwrote newer "pending" entries from
/statuses (id=25) because supplement only filled gaps.

Fix: collect all entries from both /status (base) and /statuses (extended),
sort by id descending (highest = newest), and iterate in that order so the
newest entry for each context wins regardless of source.

The combined statuses[] is now correct for all cases:
- Newest in base only: wins (from sorted iteration)
- Newest in extended only: wins (supplements base)
- Newest in base, older in extended: wins (base entry processed later in sort)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 08:11:58 +00:00
core-devops ec79a6bb20 fix(queue): supplement statuses overwrite base, not just fill gaps
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 4m4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 3s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 50s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 5m25s
gate-check-v3 / gate-check (pull_request) Successful in 2s
qa-review / approved (pull_request) Failing after 2s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 3s
sop-checklist / all-items-acked (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 56s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 6m27s
CI / all-required (pull_request) Successful in 6m22s
The base /status endpoint returns only 26-30 entries; newer statuses for
the same context may not be in the base array. The supplement logic
was only adding contexts MISSING from base, but the base already contained
an old "pending" entry for CI/all-required while the newer "success" entry
was beyond the base array's cutoff. Now the supplement OVERWRITES base
entries for the same context so newer statuses always win.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 08:02:52 +00:00
core-devops f8d4512e1f fix(queue): correct status ordering and supplement missing contexts
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 4m42s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request) Successful in 2s
qa-review / approved (pull_request) Failing after 4s
security-review / approved (pull_request) Failing after 2s
sop-checklist / all-items-acked (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 52s
CI / Canvas (Next.js) (pull_request) Successful in 6m13s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 53s
CI / Python Lint & Test (pull_request) Successful in 6m27s
CI / all-required (pull_request) Successful in 6m22s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Two related fixes to get_combined_status() + latest_statuses_by_context():

1. Ordering: Gitea /statuses returns entries in DESCENDING id order
   (newest first). The script was reversing, treating it as ascending,
   which made the OLDEST entry win instead of the newest. Now iterate
   forward so newer entries overwrite older ones (newest wins).

2. Context gaps: The /status endpoint returns only 30 statuses in its
   statuses[] array. The /statuses endpoint (limit=100) may not include
   all contexts from /status. Now merge: start with /status's statuses[]
   (authoritative, ascending), supplement missing contexts from
   /statuses (descending, reversed for correct iteration order).

Also fixes test_latest_statuses_dedupes_by_context_newest_first to
assert the correct "newest wins" semantics.

PR #1403 now correctly shows ready=True action=merge with this fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:59:39 +00:00
core-devops b0ec931595 fix(queue): resolve merge-queue label by ID not name
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 5m10s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 6m34s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 6m54s
CI / all-required (pull_request) Successful in 6m9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
Gitea allows multiple repo labels with the same name but different
colours. The /issues endpoint with labels=<name> matches at most one
of them — not reliably the canonical colour. This caused
list_queued_issues() to miss PRs that only had the canonical
merge-queue label (id=27, colour 1f883d) when duplicates with a
different colour existed in the repo.

Fix: _resolve_label_id() looks up the label's numeric id at startup
and list_queued_issues() queries by that id instead of the name.
This is stable regardless of how many duplicate labels exist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 07:40:42 +00:00
core-devops 8ccf3a844c fix(queue): surface merge API errors instead of silent catch
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 3s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 3s
sop-tier-check / tier-check (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4m11s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Canvas (Next.js) (pull_request) Successful in 5m33s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m33s
CI / all-required (pull_request) Successful in 5m9s
audit-force-merge / audit (pull_request) Has been skipped
When the merge API returns a non-transient error (HTTP 405 permission
denied, HTTP 422 pre-receive hook block, etc.), the queue was catching
ApiError in the generic main-loop handler and exiting 0 — indistinguishable
from a successful-no-op tick.

Fix: catch ApiError specifically around merge_pull(), post a PR comment
with the error detail and a reference to SEV-1 internal#487, and return
exit code 2 so the workflow run is marked failed.

Exit codes:
  0 — success (merged, updated, or nothing to do)
  2 — merge API error (permission/hook issue, non-transient)

Fixes: SEV-1 internal#487 — queue silently failing to merge while
reporting success; merge permission error invisible without workflow
log inspection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 06:24:11 +00:00
core-devops 9ede993f3d fix(sop-checklist): probe() KeyError for gate names in compute_na_state
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 3s
security-review / approved (pull_request) Failing after 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m0s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4m52s
CI / Canvas (Next.js) (pull_request) Successful in 6m35s
CI / Python Lint & Test (pull_request) Successful in 6m38s
CI / all-required (pull_request) Successful in 6m39s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 2/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +2 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
audit-force-merge / audit (pull_request) Has been skipped
compute_na_state() calls probe(gate_name, [user]) where gate_name is a gate
name like 'qa-review' or 'security-review' — these are not checklist item
slugs and are not in items_by_slug. probe() was doing:

    item = items_by_slug[slug]   # KeyError for 'qa-review'

This caused the sop-checklist workflow to crash on any PR that has N/A gates
configured (all 7 checklist items with /sop-n/a), producing a 30-minute
Failing status before Gitea kills the job.

Fix: add _required_teams_for() helper that falls back to na_gates lookup
when slug is not in items_by_slug. Gate names resolve to their
required_teams from the n/a_gates config section.

Adds TestProbeNaGateFallback regression test (58/58 passing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 05:09:35 +00:00
4 changed files with 205 additions and 31 deletions
+68 -24
View File
@@ -137,14 +137,25 @@ def status_state(status: dict) -> str:
def latest_statuses_by_context(statuses: list[dict]) -> dict[str, dict]:
# Gitea /statuses endpoint returns entries in ascending id order (oldest
# first). We need the LAST occurrence of each context, so iterate in
# reverse to prefer newer entries.
# Iterate so the newest entry for each context is seen LAST → it overwrites
# older ones in the accumulator dict.
# - Ascending input (oldest first, e.g. Gitea /status base array): forward
# iteration processes oldest first, newest last → newest overwrites → OK.
# - Descending input (newest first, e.g. Gitea /statuses, combined array):
# forward iteration processes newest first → oldest last → oldest wins.
# Must REVERSE so iteration is oldest→newest → newest wins.
# Guard: detect ascending by checking last_id > first_id.
if not statuses:
return {}
ids = [s.get("id", 0) for s in statuses if isinstance(s.get("id"), int)]
if ids and ids[-1] < ids[0]:
# Descending (newest first) — reverse to oldest→newest iteration.
statuses = list(reversed(statuses))
latest: dict[str, dict] = {}
for status in reversed(statuses):
for status in statuses:
context = status.get("context")
if isinstance(context, str):
latest[context] = status # overwrite: reverse order → newest wins
latest[context] = status
return latest
@@ -246,37 +257,54 @@ def get_branch_head(branch: str) -> str:
def get_combined_status(sha: str) -> dict:
"""Combined status + all individual statuses for `sha`.
The /status endpoint caps the `statuses` array at 30 entries (Gitea
default page size), so we fetch the full list via /statuses with a
higher limit. The combined `state` still comes from /status.
The /status endpoint returns a `statuses` array capped at 30 entries.
We supplement it with /statuses (limit=100) for contexts not in the
base array. The combined `state` always comes from /status.
Returns the merged list sorted ASCENDING by id. Caller's
latest_statuses_by_context iterates ascending so the newest (largest
id) for each context is seen last and wins.
"""
_, combined = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
if not isinstance(combined, dict):
raise ApiError(f"status for {sha} response not object")
# Fetch full statuses list; 200 covers >99% of real-world runs.
# The list is ordered ascending by id (oldest first) — callers must
# iterate in reverse to get the newest entry per context.
# Best-effort: large repos (main with 550+ statuses) may time out.
# On timeout, fall back to the statuses[] already in the combined
# response (usually 30 entries — enough for most PRs, enough for
# main's early push-required contexts).
base_statuses: list[dict] = combined.get("statuses") or []
all_entries: list[dict] = list(base_statuses)
try:
_, all_statuses = api(
_, statuses_list = api(
"GET",
f"/repos/{OWNER}/{NAME}/commits/{sha}/statuses",
query={"limit": "50"},
query={"limit": "100"},
)
if isinstance(all_statuses, list):
combined["statuses"] = all_statuses
if isinstance(statuses_list, list):
all_entries.extend(statuses_list)
except (ApiError, urllib.error.URLError, TimeoutError, OSError) as exc:
# URLError covers network-level failures (DNS, refused, timeout).
# TimeoutError and OSError cover socket-level timeouts.
sys.stderr.write(f"::warning::could not fetch full statuses list for {sha[:8]}: {exc}\n")
# Fall back to the statuses[] already in the combined response.
pass
# Sort ascending by id. latest_statuses_by_context iterates ascending
# so the newest (largest id) entry for each context is seen last and wins.
all_entries.sort(key=lambda s: s.get("id") or 0)
combined["statuses"] = all_entries
return combined
def _resolve_label_id(name: str) -> str | None:
"""Return the repo label ID for `name`, or None if not found.
Gitea's /issues endpoint with labels=<name> has a known quirk: when multiple
repo labels share the same name (e.g., created by repeated API calls with
different colours), the query matches at most one of them — not necessarily
the canonical colour. Resolving to ID sidesteps the ambiguity.
"""
_, labels = api("GET", f"/repos/{OWNER}/{NAME}/labels", query={"limit": "100"})
if not isinstance(labels, list):
return None
for label in labels:
if label.get("name") == name:
return str(label["id"])
return None
def list_queued_issues() -> list[dict]:
_, body = api(
"GET",
@@ -407,7 +435,23 @@ def process_once(*, dry_run: bool = False) -> int:
"deferring to next tick"
)
return 0
merge_pull(pr_number, dry_run=dry_run)
try:
merge_pull(pr_number, dry_run=dry_run)
except ApiError as exc:
# Merge API errors (405 permission denied, 422 hook block, etc.)
# are NOT transient — retrying will not help. Surface the error
# on the PR immediately so it is visible without digging into
# workflow logs, and fail the workflow so it is distinguishable
# from a successful-no-op tick.
post_comment(
pr_number,
f"merge-queue: MERGE FAILED — {exc}. "
"This is a non-transient error (permission or hook issue). "
"See SEV-1 internal#487.",
dry_run=dry_run,
)
sys.stderr.write(f"::error::PR #{pr_number} merge failed: {exc}\n")
return 2 # distinct exit code so workflow run shows failure
return 0
return 0
+11 -2
View File
@@ -830,9 +830,18 @@ def main(argv: list[str] | None = None) -> int:
# one membership lookup per team.
team_member_cache: dict[tuple[str, int], bool | None] = {}
def _required_teams_for(slug: str) -> list[str] | None:
"""Look up required_teams for a slug from checklist items OR N/A gates."""
if slug in items_by_slug:
return items_by_slug[slug]["required_teams"]
if slug in na_gates:
return na_gates[slug].get("required_teams", [])
return None
def probe(slug: str, users: list[str]) -> list[str]:
item = items_by_slug[slug]
team_names: list[str] = item["required_teams"]
team_names = _required_teams_for(slug)
if team_names is None:
raise KeyError(f"slug '{slug}' not found in items or N/A gates")
# Resolve names → ids. NOTE: orgs/{org}/teams/search may not be
# available — fall back to the list endpoint.
team_ids: list[int] = []
+78 -5
View File
@@ -1,6 +1,7 @@
import importlib.util
import sys
from pathlib import Path
from unittest.mock import patch
SCRIPT = Path(__file__).resolve().parents[1] / "gitea-merge-queue.py"
@@ -10,16 +11,37 @@ sys.modules[spec.name] = mq
spec.loader.exec_module(mq)
def test_latest_statuses_dedupes_by_context_newest_first():
def test_latest_statuses_ascending_input_newest_wins():
# Gitea /status (base array) returns ascending id order (oldest first).
# Forward iteration processes oldest first, newest last → newest overwrites.
statuses = [
{"context": "CI / all-required (pull_request)", "status": "failure"},
{"context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
{"context": "CI / all-required (pull_request)", "status": "success"},
{"id": 18, "context": "CI / all-required (pull_request)", "status": "failure"}, # oldest
{"id": 27, "context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
{"id": 54, "context": "CI / all-required (pull_request)", "status": "success"}, # newest
]
latest = mq.latest_statuses_by_context(statuses)
assert latest["CI / all-required (pull_request)"]["status"] == "failure"
assert latest["CI / all-required (pull_request)"]["status"] == "success"
assert latest["CI / all-required (pull_request)"]["id"] == 54
assert latest["sop-checklist / all-items-acked (pull_request)"]["state"] == "success"
def test_latest_statuses_guard_reverses_descending_input():
# Gitea /statuses returns descending id order (newest first: id=54 → id=1).
# Guard detects descending and reverses so we iterate ascending.
# Forward on reversed = newest (id=54) is last → overwrites oldest.
statuses = [
{"id": 54, "context": "CI / all-required (pull_request)", "status": "success"}, # newest
{"id": 27, "context": "sop-checklist / all-items-acked (pull_request)", "state": "success"},
{"id": 18, "context": "CI / all-required (pull_request)", "status": "failure"}, # oldest
]
latest = mq.latest_statuses_by_context(statuses)
# Guard reverses descending → asc iteration: 18 first, 27, 54 last → 54 wins.
assert latest["CI / all-required (pull_request)"]["status"] == "success"
assert latest["CI / all-required (pull_request)"]["id"] == 54
assert latest["sop-checklist / all-items-acked (pull_request)"]["state"] == "success"
@@ -118,3 +140,54 @@ def test_merge_decision_updates_stale_pr_before_merge():
assert decision.ready is False
assert decision.action == "update"
def test_merge_failure_returns_nonzero_and_posts_comment(monkeypatch):
"""When merge_pull raises ApiError (e.g. HTTP 405 permission denied),
process_once returns exit code 2 (non-zero) and posts a comment on the PR.
This distinguishes merge-permission errors from successful-no-op ticks."""
captured_comment = {}
def fake_post_comment(pr_number, body, *, dry_run):
captured_comment["pr_number"] = pr_number
captured_comment["body"] = body
# Replace functions directly on the module object so process_once()
# (which looks them up by name at call time) picks up the fakes.
mq.list_queued_issues = lambda: [{
"number": 42,
"created_at": "2026-05-17T00:00:00Z",
"labels": [{"name": "merge-queue"}],
"pull_request": {},
}]
mq.get_pull = lambda n: {
"state": "open",
"base": {"ref": "main", "repo_id": 1},
"head": {"sha": "headsha", "repo_id": 1},
"merge_base": "abc123def",
}
mq.get_pull_commits = lambda n: [{"sha": "headsha"}]
mq.get_branch_head = lambda branch: "abc123def"
mq.get_combined_status = lambda sha: {
"state": "success",
"statuses": [{"context": "CI / all-required (push)", "status": "success"}],
}
mq.latest_statuses_by_context = lambda s: {
"CI / all-required (pull_request)": {"status": "success"},
"sop-checklist / all-items-acked (pull_request)": {"status": "success"},
}
mq.required_contexts_green = lambda statuses, contexts: (True, [])
mq.post_comment = fake_post_comment
# Simulate merge failing with HTTP 405 (permission denied).
# The ApiError raised by api() is caught inside process_once().
merge_error = mq.ApiError(
"POST /repos/x/y/pulls/42/merge -> HTTP 405: User not allowed to merge PR"
)
with patch.object(mq, "merge_pull", side_effect=merge_error):
exit_code = mq.process_once(dry_run=False)
assert exit_code == 2, f"Expected exit code 2, got {exit_code}"
assert captured_comment["pr_number"] == 42
assert "MERGE FAILED" in captured_comment["body"]
assert "405" in captured_comment["body"]
@@ -603,3 +603,51 @@ class TestComputeNaState(unittest.TestCase):
self.assertEqual(na_directives[0][0], "sop-n/a")
self.assertEqual(na_directives[0][1], "qa-review")
self.assertIn("no surface", na_directives[0][2])
class TestProbeNaGateFallback(unittest.TestCase):
"""Regression test: probe() must handle gate names (qa-review, security-review)
from N/A gates without raising KeyError.
mc#1389: compute_na_state calls probe(gate_name, [user]) where gate_name is
a gate name like 'qa-review' — NOT a checklist item slug. The probe must
resolve the gate's required_teams from na_gates, not raise KeyError from
items_by_slug lookup.
"""
def test_probe_resolves_gate_name_from_na_gates(self):
cfg = sop.load_config(CONFIG_PATH)
items = cfg["items"]
items_by_slug = {it["slug"]: it for it in items}
na_gates = cfg.get("n/a_gates", {})
# Reconstruct the _required_teams_for helper from sop-checklist.py
def _required_teams_for(slug):
if slug in items_by_slug:
return items_by_slug[slug]["required_teams"]
if slug in na_gates:
return na_gates[slug].get("required_teams", [])
return None
# Gate names should resolve from na_gates
self.assertEqual(
_required_teams_for("qa-review"),
["qa", "security", "engineers"],
)
self.assertEqual(
_required_teams_for("security-review"),
["security", "managers", "ceo"],
)
# Checklist item slugs should still resolve from items_by_slug
self.assertEqual(
_required_teams_for("comprehensive-testing"),
["qa", "engineers"],
)
self.assertEqual(
_required_teams_for("root-cause"),
["managers", "ceo"],
)
# Unknown slug should return None (not raise KeyError)
self.assertIsNone(_required_teams_for("nonexistent-slug"))