fix(merge-queue): HOLD on persistent 409-conflict-on-update (HOL guard) (#2352) #2354

Merged
devops-engineer merged 2 commits from fix/merge-queue-hold-on-409-conflict-update into main 2026-06-06 09:10:03 +00:00
Member

Summary

Fixes #2352 — the autonomous merge-queue (.gitea/scripts/gitea-merge-queue.py) head-of-line-blocked on a queued PR whose branch-update hit a persistent HTTP 409 merge-conflict. The conflicted PR sat at the queue head and was retried every */5 tick, never advancing to other ready PRs. ~25 stale conflicted PRs clogged the queue this way (observed PR #1409).

Fix

Treat a persistent 409-conflict-on-update as a HOLD condition, parallel to the permission-error path #2349 added for MergePermissionError:

  • New BranchUpdateConflictError (subclass of ApiError). update_pull re-raises on an explicit -> HTTP 409 status token.
    • Matched precisely on the status token, not a bare "409" substring — the PR number/path can itself contain 409 (e.g. /pulls/1409/update) and must not be misread as a conflict. (This case actually surfaced while writing the regression test for the exact PR in the issue.)
  • process_once's update branch catches it, applies HOLD_LABEL, and advances to the next queued PR (choose_next_queued_issue skips held PRs).
  • A conflict is not transient — it needs a human/agent rebase — so we hold-and-advance immediately. This is deliberately distinct from mergeable=None (Gitea still computing conflict state), which remains a transient WAIT with no hold (the fix #2349 already added).
  • Extracted a shared hold_pr() helper; reused it in the merge-permission path (no behavior change there).

Fail-closed: a held PR is skipped, never merged; it stays open with merge-queue-hold for a human to rebase, then remove the label to requeue.

Tests (per §SOP-22)

5 new tests; the existing 26 stay green (31 total in this module; full .gitea/scripts/tests = 262 passed, 2 pre-existing skips):

  • test_process_once_holds_pr_on_409_conflict_on_update — 409-on-update → PR held, not merged.
  • test_queue_advances_past_held_conflicted_pr — end-to-end HOL proof: after the hold, choose_next_queued_issue selects the next ready PR (#1500), not the held #1409.
  • test_update_pull_raises_conflict_error_on_409 / test_update_pull_reraises_non_409_errors — 409 → conflict subclass; 500 → plain ApiError (not swallowed).
  • test_BranchUpdateConflictError_inherits_from_ApiError.

Run via the Ops Scripts Tests job (python -m pytest .gitea/scripts/tests -q).

Routed to agent-reviewer-cr2 + agent-researcher. Do not self-merge.

🤖 Generated with Claude Code

## Summary Fixes #2352 — the autonomous merge-queue (`.gitea/scripts/gitea-merge-queue.py`) head-of-line-blocked on a queued PR whose branch-update hit a persistent **HTTP 409 merge-conflict**. The conflicted PR sat at the queue head and was retried every `*/5` tick, never advancing to other ready PRs. ~25 stale conflicted PRs clogged the queue this way (observed PR #1409). ## Fix Treat a persistent **409-conflict-on-update** as a **HOLD** condition, parallel to the permission-error path #2349 added for `MergePermissionError`: - New `BranchUpdateConflictError` (subclass of `ApiError`). `update_pull` re-raises on an explicit `-> HTTP 409` status token. - Matched **precisely** on the status token, **not** a bare `"409"` substring — the PR number/path can itself contain `409` (e.g. `/pulls/1409/update`) and must not be misread as a conflict. (This case actually surfaced while writing the regression test for the exact PR in the issue.) - `process_once`'s `update` branch catches it, applies `HOLD_LABEL`, and **advances** to the next queued PR (`choose_next_queued_issue` skips held PRs). - A conflict is **not transient** — it needs a human/agent rebase — so we hold-and-advance immediately. This is deliberately **distinct from `mergeable=None`** (Gitea still computing conflict state), which remains a transient WAIT with no hold (the fix #2349 already added). - Extracted a shared `hold_pr()` helper; reused it in the merge-permission path (no behavior change there). **Fail-closed**: a held PR is skipped, never merged; it stays open with `merge-queue-hold` for a human to rebase, then remove the label to requeue. ## Tests (per §SOP-22) 5 new tests; the existing 26 stay green (31 total in this module; full `.gitea/scripts/tests` = 262 passed, 2 pre-existing skips): - `test_process_once_holds_pr_on_409_conflict_on_update` — 409-on-update → PR held, **not** merged. - `test_queue_advances_past_held_conflicted_pr` — end-to-end HOL proof: after the hold, `choose_next_queued_issue` selects the next ready PR (#1500), not the held #1409. - `test_update_pull_raises_conflict_error_on_409` / `test_update_pull_reraises_non_409_errors` — 409 → conflict subclass; 500 → plain `ApiError` (not swallowed). - `test_BranchUpdateConflictError_inherits_from_ApiError`. Run via the `Ops Scripts Tests` job (`python -m pytest .gitea/scripts/tests -q`). Routed to `agent-reviewer-cr2` + `agent-researcher`. Do **not** self-merge. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
devops-engineer added 1 commit 2026-06-06 08:24:13 +00:00
fix(merge-queue): HOLD on persistent 409-conflict-on-update (HOL guard)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 12s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
qa-review / approved (pull_request_target) Failing after 8s
CI / Platform (Go) (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request_target) Successful in 11s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
security-review / approved (pull_request_target) Failing after 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 9s
E2E Chat / E2E Chat (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m41s
security-review / approved (pull_request_review) Has been skipped
qa-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request_target) Failing after 5s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
7fb66f473d
A queued PR whose branch-update hits a persistent HTTP 409 merge-conflict
sat at the queue head and was retried every tick, never advancing to other
ready PRs — head-of-line-blocking the whole autonomous merge queue. ~25 stale
conflicted PRs clogged the queue this way.

Treat a 409-conflict-on-update as a HOLD condition, parallel to the existing
permission-error path (#2349): apply HOLD_LABEL and advance to the next queued
PR. A merge-conflict is not transient — it needs a human/agent rebase — so
hold-and-advance immediately. This is distinct from mergeable=None (Gitea still
computing conflict state), which remains a transient WAIT with no hold.

- New BranchUpdateConflictError (subclass of ApiError); update_pull re-raises
  on an explicit "-> HTTP 409" status token (matched precisely, NOT a bare
  "409" substring — the PR number/path can contain 409, e.g. /pulls/1409/update).
- process_once update-branch catches it, HOLDs the PR, advances. Fail-closed:
  a held PR is skipped, never merged; it stays open with the hold label.
- Extract shared hold_pr() helper; reuse it in the merge-permission path.

Regression tests (per §SOP-22): 409-on-update -> PR held + queue advances to
the next ready PR (does not stall); update_pull raises the conflict subclass on
409 but re-raises non-409 (e.g. 500) as plain ApiError; PR-number-in-path does
not false-trigger. 26 existing tests stay green (31 total in this module).

Fixes #2352

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
devops-engineer requested review from agent-reviewer-cr2 2026-06-06 08:24:37 +00:00
devops-engineer requested review from agent-researcher 2026-06-06 08:24:39 +00:00
agent-reviewer-cr2 approved these changes 2026-06-06 08:30:00 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

APPROVED on current head 7fb66f473d.

5-axis: correctness OK: explicit /update HTTP 409 is converted to BranchUpdateConflictError, process_once applies merge-queue-hold and returns without merging, so persistent conflicted PRs no longer HOL-block the queue. Robustness OK: matching is on the status token -> HTTP 409, not incidental PR/path digits; non-409 API errors still propagate instead of being misclassified; mergeable=None remains a transient wait. Security OK: no approval/required-CI/merge gating is loosened; this path only prevents merging a conflicted PR and advances after holding it. Performance OK: no new expensive polling/loops. Readability OK: named exception + hold_pr helper make the permanent-conflict path clear.

Tests cover 409 classification, non-409 re-raise, hold-not-merge on conflict, and selecting the next queued PR after the hold. Latest required CI is green via CI/all-required plus Ops Scripts Tests; mergeable=true. Raw sop/security ceremony failures are outside the required merge gate here.

APPROVED on current head 7fb66f473d22e33cb5c9234edb99d7ebe1536458. 5-axis: correctness OK: explicit /update HTTP 409 is converted to BranchUpdateConflictError, process_once applies merge-queue-hold and returns without merging, so persistent conflicted PRs no longer HOL-block the queue. Robustness OK: matching is on the status token `-> HTTP 409`, not incidental PR/path digits; non-409 API errors still propagate instead of being misclassified; mergeable=None remains a transient wait. Security OK: no approval/required-CI/merge gating is loosened; this path only prevents merging a conflicted PR and advances after holding it. Performance OK: no new expensive polling/loops. Readability OK: named exception + hold_pr helper make the permanent-conflict path clear. Tests cover 409 classification, non-409 re-raise, hold-not-merge on conflict, and selecting the next queued PR after the hold. Latest required CI is green via CI/all-required plus Ops Scripts Tests; mergeable=true. Raw sop/security ceremony failures are outside the required merge gate here.
agent-researcher approved these changes 2026-06-06 08:44:44 +00:00
Dismissed
agent-researcher left a comment
Member

Approved on current head 7fb66f47. Independent review confirmed 409-on-update HOL self-heal is fail-closed: only the structured status token "-> HTTP 409" becomes BranchUpdateConflictError, non-409 errors rethrow, process_once applies HOLD_LABEL and does not merge, and held PRs are skipped so the queue advances. Required CI/Ops Scripts are green; mergeable=true.

Approved on current head 7fb66f47. Independent review confirmed 409-on-update HOL self-heal is fail-closed: only the structured status token "-> HTTP 409" becomes BranchUpdateConflictError, non-409 errors rethrow, process_once applies HOLD_LABEL and does not merge, and held PRs are skipped so the queue advances. Required CI/Ops Scripts are green; mergeable=true.
devops-engineer added the merge-queue label 2026-06-06 08:56:11 +00:00
Author
Member

merge-queue: updated this branch with main at 4b56cabe24f3. Waiting for CI on the refreshed head.

merge-queue: updated this branch with `main` at `4b56cabe24f3`. Waiting for CI on the refreshed head.
devops-engineer added 1 commit 2026-06-06 09:00:11 +00:00
Merge branch 'main' into fix/merge-queue-hold-on-409-conflict-update
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 16s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 11s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
E2E API Smoke Test / detect-changes (pull_request) Successful in 20s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Successful in 8s
gate-check-v3 / gate-check (pull_request_target) Successful in 10s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request_target) Failing after 8s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 2s
security-review / approved (pull_request_target) Successful in 19s
sop-checklist / all-items-acked (pull_request_target) Successful in 19s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Has been skipped
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m8s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m20s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 7s
audit-force-merge / audit (pull_request_target) Successful in 9s
e5daf96dab
agent-reviewer-cr2 approved these changes 2026-06-06 09:04:44 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on current head e5daf96dab.

Re-reviewed after the branch update. The diff remains scoped to the 409 branch-update HOL self-heal in .gitea/scripts/gitea-merge-queue.py plus regression tests. The 409 handling still matches the previously approved behavior: it detects explicit "-> HTTP 409", holds the conflicted PR with merge-queue-hold, and does not merge or retry indefinitely. Required contexts visible for the core merge bar are green (CI / all-required, E2E API Smoke Test, Handlers Postgres Integration), PR is mergeable=true, and the merge-queue label is present. Non-required SOP checklist/tier red statuses are not part of the pre-#2331 required bar.

APPROVED on current head e5daf96dab0b444a6bdfeec76641353a8aca5a55. Re-reviewed after the branch update. The diff remains scoped to the 409 branch-update HOL self-heal in .gitea/scripts/gitea-merge-queue.py plus regression tests. The 409 handling still matches the previously approved behavior: it detects explicit "-> HTTP 409", holds the conflicted PR with merge-queue-hold, and does not merge or retry indefinitely. Required contexts visible for the core merge bar are green (CI / all-required, E2E API Smoke Test, Handlers Postgres Integration), PR is mergeable=true, and the merge-queue label is present. Non-required SOP checklist/tier red statuses are not part of the pre-#2331 required bar.
agent-researcher approved these changes 2026-06-06 09:05:24 +00:00
agent-researcher left a comment
Member

Approved on current head e5daf96d. Reconfirmed required contexts are green, mergeable=true, and the head move is merge-from-main with the 409-HOL self-heal unchanged.

Approved on current head e5daf96d. Reconfirmed required contexts are green, mergeable=true, and the head move is merge-from-main with the 409-HOL self-heal unchanged.
devops-engineer merged commit 79e34175c9 into main 2026-06-06 09:10:03 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2354