Merge queue lands multi-commit PRs at their FIRST commit only — drops every later commit (3rd occurrence: #2603/#2605, #2617/#2640) #2641

Open
opened 2026-06-12 11:24:10 +00:00 by core-devops · 1 comment
Member

Recurring, now-confirmed data-loss defect in the Gitea merge queue. A PR with N commits is merged to main carrying ONLY its first commit; commits 2..N are silently dropped. No failure, no warning — main just gets a partial PR.

Confirmed occurrences:

  • #2603 (platform-root restart guard) — the identity-prompt commit was dropped; re-raised as #2605.
  • #2617 (byok hard-fail) — landed at commit 1 (1bf1b2b9 hard-fail) but dropped commit 2 (1e782880 atomic-byok-create + the lifecycle e2e fix). Net effect: main went RED on the required Local Provision Lifecycle E2E (stub) gate because the hard-fail shipped without the atomic fix that makes create(byok) pass. Recovered as #2640.

Why it's dangerous: the dropped commit often contains the test/fix that makes the first commit safe (the e2e fix in #2617, the durable rule in #2603). CI on the PR branch was green (HEAD had all commits), so nothing flags the divergence — main silently ships a half-change and can go red or behave incorrectly.

Asks:

  1. Root-cause the queue's merge step — is it squashing-to-first, or merging a stale ref, or a needs/if interaction? (We standardized on merge-commits, not squash — feedback_pr_merge_conventions.)
  2. Until fixed, a guard: post-merge, assert main's tree matches the PR head tree; fail loud on divergence.
  3. Interim author rule: keep main-affecting PRs to a SINGLE commit (squash locally before merge) so there's no commit 2 to drop.

Refs #2603 #2605 #2617 #2640, feedback_pr_merge_conventions.

**Recurring, now-confirmed data-loss defect in the Gitea merge queue.** A PR with N commits is merged to main carrying ONLY its first commit; commits 2..N are silently dropped. No failure, no warning — main just gets a partial PR. **Confirmed occurrences:** - #2603 (platform-root restart guard) — the identity-prompt commit was dropped; re-raised as #2605. - #2617 (byok hard-fail) — landed at commit 1 (`1bf1b2b9` hard-fail) but dropped commit 2 (`1e782880` atomic-byok-create + the lifecycle e2e fix). Net effect: **main went RED on the required Local Provision Lifecycle E2E (stub) gate** because the hard-fail shipped without the atomic fix that makes `create(byok)` pass. Recovered as #2640. **Why it's dangerous:** the dropped commit often contains the test/fix that makes the first commit safe (the e2e fix in #2617, the durable rule in #2603). CI on the PR branch was green (HEAD had all commits), so nothing flags the divergence — main silently ships a half-change and can go red or behave incorrectly. **Asks:** 1. Root-cause the queue's merge step — is it squashing-to-first, or merging a stale ref, or a needs/if interaction? (We standardized on merge-commits, not squash — feedback_pr_merge_conventions.) 2. Until fixed, a guard: post-merge, assert main's tree matches the PR head tree; fail loud on divergence. 3. Interim author rule: keep main-affecting PRs to a SINGLE commit (squash locally before merge) so there's no commit 2 to drop. Refs #2603 #2605 #2617 #2640, feedback_pr_merge_conventions.
Author
Member

CORRECTION — I dug into the git/Gitea evidence and my original diagnosis above is WRONG. The merge queue does NOT drop commits. Here is what actually happened, with proof.

Evidence

For each PR, the merge commit's 2nd parent (what was merged) is an ancestor of the branch's eventual tip, and the tip commit's committer timestamp is AFTER the merge:

PR merged_at (UTC) merged 2nd-parent branch tip tip committed (UTC)
#2617 09:48:52 a6d67d7c 1e782880 (atomic-byok) 09:52:54 — 4 min AFTER merge
#2603 20:01:24 977502be 619445e0 (prompt-bake) 20:02:59 — ~1.5 min AFTER merge

a6d67d7c is an ancestor of 1e782880; 977502be is an ancestor of 619445e0. So the 'dropped' commit did not exist at merge time — it cannot have been dropped. I authored and pushed it to a PR the conductor had ALREADY merged on green. The push landed on a closed/merged branch and silently went nowhere.

Real root cause #1 — push-after-merge is silent (the actual mechanism)

The conductor (molecule-ci/.gitea/scripts/gitea-merge-queue.py) merges immediately when the required contexts go green. If I push a follow-up commit seconds-to-minutes later, it hits an already-merged PR branch with zero warning — the commit is simply not on main. This bit me twice; both 'lost' commits were follow-up fixes I pushed into actively-merging PRs.

  • Fix A (discipline, mine): never push to a PR in the merge queue. A needed change after enqueue = a NEW PR. Adopted.
  • Fix B (loud detection): a push-event workflow that fails/comments when the target PR is already merged, so a lost commit is visible immediately instead of discovered hours later when main misbehaves.
  • Fix C (optional, conductor): pin head_commit_id on the merge POST so Gitea refuses if the head moved between the green-check and the merge (defends the genuine seconds-window TOCTOU).

Real root cause #2 — the conductor gate omits the lifecycle e2e (separate, also real)

main's branch-protection required contexts are exactly: CI / all-required (pull_request), E2E API Smoke Test, Handlers Postgres Integration. Local Provision Lifecycle E2E (stub) is NOT required. So #2617's merged head (without the atomic-byok fix) passed the 3 gates and merged even though its lifecycle e2e was red — and main then showed a red lifecycle-stub that did NOT block subsequent merges (advisory). My earlier 'main is RED blocking every PR' was overstated: it was an advisory red, not a blocker. But a gate that's supposed to protect main and doesn't actually gate is the #2619 'gates that don't gate' problem — lifecycle e2e (and any other intended-required e2e) should be in branch protection / folded into CI / all-required.

Net

Close the 'queue drops commits' framing as misdiagnosed. Re-scope to: (1) push-after-merge silent loss → detector + author discipline; (2) lifecycle/other e2e are advisory-not-required → make them gate (ties to #2619). #2640 (recovering the atomic-byok commit) was still correct and necessary — it was genuinely absent from main.

**CORRECTION — I dug into the git/Gitea evidence and my original diagnosis above is WRONG. The merge queue does NOT drop commits. Here is what actually happened, with proof.** ### Evidence For each PR, the merge commit's 2nd parent (what was merged) is an *ancestor* of the branch's eventual tip, and the tip commit's committer timestamp is AFTER the merge: | PR | merged_at (UTC) | merged 2nd-parent | branch tip | tip committed (UTC) | |----|----|----|----|----| | #2617 | 09:48:52 | `a6d67d7c` | `1e782880` (atomic-byok) | **09:52:54 — 4 min AFTER merge** | | #2603 | 20:01:24 | `977502be` | `619445e0` (prompt-bake) | **20:02:59 — ~1.5 min AFTER merge** | `a6d67d7c` is an ancestor of `1e782880`; `977502be` is an ancestor of `619445e0`. So the 'dropped' commit **did not exist at merge time** — it cannot have been dropped. I authored and pushed it to a PR the conductor had ALREADY merged on green. The push landed on a closed/merged branch and silently went nowhere. ### Real root cause #1 — push-after-merge is silent (the actual mechanism) The conductor (`molecule-ci/.gitea/scripts/gitea-merge-queue.py`) merges immediately when the required contexts go green. If I push a follow-up commit seconds-to-minutes later, it hits an already-merged PR branch with **zero warning** — the commit is simply not on main. This bit me twice; both 'lost' commits were follow-up fixes I pushed into actively-merging PRs. - **Fix A (discipline, mine):** never push to a PR in the merge queue. A needed change after enqueue = a NEW PR. Adopted. - **Fix B (loud detection):** a push-event workflow that fails/comments when the target PR is already merged, so a lost commit is visible immediately instead of discovered hours later when main misbehaves. - **Fix C (optional, conductor):** pin `head_commit_id` on the merge POST so Gitea refuses if the head moved between the green-check and the merge (defends the genuine seconds-window TOCTOU). ### Real root cause #2 — the conductor gate omits the lifecycle e2e (separate, also real) main's branch-protection required contexts are exactly: `CI / all-required (pull_request)`, `E2E API Smoke Test`, `Handlers Postgres Integration`. **`Local Provision Lifecycle E2E (stub)` is NOT required.** So #2617's merged head (without the atomic-byok fix) passed the 3 gates and merged even though its lifecycle e2e was red — and main then showed a red lifecycle-stub that did NOT block subsequent merges (advisory). My earlier 'main is RED blocking every PR' was overstated: it was an advisory red, not a blocker. But a gate that's supposed to protect main and doesn't actually gate is the #2619 'gates that don't gate' problem — lifecycle e2e (and any other intended-required e2e) should be in branch protection / folded into `CI / all-required`. ### Net Close the 'queue drops commits' framing as misdiagnosed. Re-scope to: (1) push-after-merge silent loss → detector + author discipline; (2) lifecycle/other e2e are advisory-not-required → make them gate (ties to #2619). #2640 (recovering the atomic-byok commit) was still correct and necessary — it was genuinely absent from main.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2641