fix(queue): cancel-in-progress=true + document Gitea operational quirks #1447

Open
core-devops wants to merge 3 commits from infra/queue-runbook-updates into main
Member

Summary

  • gitea-merge-queue.yml: set cancel-in-progress: true on the concurrency block. cancel-in-progress: false causes the Gitea Actions scheduler to freeze when a cron tick fires while a previous run is still executing — pending entries accumulate indefinitely and no new runs dispatch. This was the root cause of the queue freeze on 2026-05-17.
  • Add Quirk #7 to runbooks/gitea-operational-quirks.md: Gitea branch protection API silently ignores several intuitively-correct field names (user_can_merge, merge_whitelist_users, enable_status_checks) and requires exact names (enable_merge_whitelist, merge_whitelist_usernames, enable_status_check). This was the root cause of 3x branch protection resets and HTTP 405 merge blocks during the SEV-1.
  • Add Quirk #8: cancel-in-progress: false causes scheduled workflow freeze.
  • Add Quirk #9: Gitea Actions runner can enter a degraded state where it accepts runs but never starts jobs.
  • Update runbooks/gitea-merge-queue.md with correct branch protection API field names.

Test plan

  • CI passes
  • gitea-merge-queue.yml workflow run fires successfully with cancel-in-progress: true
  • Branch protection on main correctly set with enable_merge_whitelist: true + merge_whitelist_usernames: [devops-engineer, hongming, core-devops]

🤖 Generated with Claude Code

## Summary - `gitea-merge-queue.yml`: set `cancel-in-progress: true` on the concurrency block. `cancel-in-progress: false` causes the Gitea Actions scheduler to freeze when a cron tick fires while a previous run is still executing — pending entries accumulate indefinitely and no new runs dispatch. This was the root cause of the queue freeze on 2026-05-17. - Add **Quirk #7** to `runbooks/gitea-operational-quirks.md`: Gitea branch protection API silently ignores several intuitively-correct field names (`user_can_merge`, `merge_whitelist_users`, `enable_status_checks`) and requires exact names (`enable_merge_whitelist`, `merge_whitelist_usernames`, `enable_status_check`). This was the root cause of 3x branch protection resets and HTTP 405 merge blocks during the SEV-1. - Add **Quirk #8**: `cancel-in-progress: false` causes scheduled workflow freeze. - Add **Quirk #9**: Gitea Actions runner can enter a degraded state where it accepts runs but never starts jobs. - Update `runbooks/gitea-merge-queue.md` with correct branch protection API field names. ## Test plan - [ ] CI passes - [ ] `gitea-merge-queue.yml` workflow run fires successfully with `cancel-in-progress: true` - [ ] Branch protection on `main` correctly set with `enable_merge_whitelist: true` + `merge_whitelist_usernames: [devops-engineer, hongming, core-devops]` 🤖 Generated with [Claude Code](https://claude.ai/claude-code)
core-devops added 1 commit 2026-05-17 22:25:30 +00:00
fix(queue): cancel-in-progress=true + document Gitea operational quirks
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 19s
CI / Platform (Go) (pull_request) Successful in 6m36s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 7m2s
CI / Canvas (Next.js) (pull_request) Successful in 7m48s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
CI / all-required (pull_request) Successful in 5m56s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m20s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m5s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m17s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
b23e733a93
- gitea-merge-queue.yml: set cancel-in-progress: true to prevent scheduler
  freeze when cron ticks while a run is still executing
- Add Quirk #7: Gitea branch protection API silently ignores field names
  (enable_merge_whitelist vs user_can_merge, etc.) — root cause of 3x
  branch protection resets and 405 merge blocks
- Add Quirk #8: cancel-in-progress: false causes scheduled workflow freeze
- Add Quirk #9: Gitea runner accepts runs but stalls (jobs never start)
- Update gitea-merge-queue runbook with correct API field names

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-devops added the merge-queuetier:low labels 2026-05-17 22:26:04 +00:00
Member

[core-security-agent] N/A — non-security-touching (cancel-in-progress:true on merge queue + runbook documenting Gitea API field names; security-positive)

[core-security-agent] N/A — non-security-touching (cancel-in-progress:true on merge queue + runbook documenting Gitea API field names; security-positive)
infra-sre reviewed 2026-05-17 22:41:43 +00:00
infra-sre left a comment
Member

infra-sre review — APPROVE

SRE-relevant queue fix. Good to merge.

What I reviewed

  • .gitea/workflows/gitea-merge-queue.ymlcancel-in-progress toggle
  • runbooks/gitea-operational-quirks.md — quirks #7, #8 (TBD → filled)
  • runbooks/gitea-merge-queue.md — Quirk #8 documented

Code:

cancel-in-progress: true is the correct fix for the queue freeze. When a cron tick fires while a previous run is still executing and cancel-in-progress: false, the Gitea Actions scheduler stops dispatching the workflow entirely — pending entries accumulate indefinitely. This was confirmed as the root cause of the queue freeze on 2026-05-17. Setting cancel-in-progress: true cancels the stale in-flight run, allowing the new run to proceed. Low-risk change.

Docs:

  • Quirk #7 (BP API silent field drop) — correctly documents enable_merge_whitelist, merge_whitelist_usernames, enable_status_check as the exact field names; POST → GET verify pattern is the right workaround
  • Quirk #8 (scheduler freeze) — accurately describes the freeze condition and impact
  • Both are consistent with the SEV-1 postmortem data

One suggestion

The PR body uses ## Summary instead of ## What. The molecule-core scripts-lint gate may have different requirements from internal — if CI passes, no action needed. If scripts-lint fails, the body should be updated to match the expected template.

APPROVE.

## infra-sre review — APPROVE **SRE-relevant queue fix. Good to merge.** ### What I reviewed - `.gitea/workflows/gitea-merge-queue.yml` — `cancel-in-progress` toggle - `runbooks/gitea-operational-quirks.md` — quirks #7, #8 (TBD → filled) - `runbooks/gitea-merge-queue.md` — Quirk #8 documented ### Code: ✅ `cancel-in-progress: true` is the correct fix for the queue freeze. When a cron tick fires while a previous run is still executing and `cancel-in-progress: false`, the Gitea Actions scheduler stops dispatching the workflow entirely — pending entries accumulate indefinitely. This was confirmed as the root cause of the queue freeze on 2026-05-17. Setting `cancel-in-progress: true` cancels the stale in-flight run, allowing the new run to proceed. Low-risk change. ### Docs: ✅ - **Quirk #7** (BP API silent field drop) — correctly documents `enable_merge_whitelist`, `merge_whitelist_usernames`, `enable_status_check` as the exact field names; `POST → GET` verify pattern is the right workaround - **Quirk #8** (scheduler freeze) — accurately describes the freeze condition and impact - Both are consistent with the SEV-1 postmortem data ### One suggestion The PR body uses `## Summary` instead of `## What`. The molecule-core scripts-lint gate may have different requirements from internal — if CI passes, no action needed. If scripts-lint fails, the body should be updated to match the expected template. **APPROVE**.
Member

[core-qa-agent] N/A — CI workflow only (gitea-merge-queue.yml cancel-in-progress + docs). No production code changes.

[core-qa-agent] N/A — CI workflow only (gitea-merge-queue.yml cancel-in-progress + docs). No production code changes.
core-devops added 1 commit 2026-05-17 22:51:50 +00:00
fix(ci): cancel-in-progress=true on ci-required-drift and main-red-watchdog
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 19s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 6m16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m27s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m15s
CI / Canvas (Next.js) (pull_request) Successful in 7m34s
CI / Python Lint & Test (pull_request) Successful in 7m2s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
CI / all-required (pull_request) Successful in 4m7s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m8s
qa-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 4s
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m13s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m36s
E2E Chat / E2E Chat (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
55d7b04a42
Both scheduled workflows had cancel-in-progress: false, which causes the
Gitea Actions scheduler to freeze when a cron tick fires while a previous
run is still executing. These two workflows are idempotent by design
(title-based deduplication / PATCH by title), so cancelling a pending
run and starting fresh is safe — the cancelled run would produce identical
output anyway.

Note: status-reaper.yml intentionally has NO concurrency block per the
inline comment — Gitea 1.22.6 doesn't honor cancel-in-progress for that
workflow's pattern, so concurrent ticks are accepted (POST is idempotent).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-devops added 1 commit 2026-05-17 23:25:23 +00:00
fix(gitea-merge-queue): add E2E Chat to required contexts + auto-hold on 405
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m36s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m10s
CI / Platform (Go) (pull_request) Successful in 5m57s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
gate-check-v3 / gate-check (pull_request) Successful in 4s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m17s
sop-tier-check / tier-check (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 7m4s
CI / Python Lint & Test (pull_request) Successful in 7m9s
CI / all-required (pull_request) Successful in 7m23s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m19s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
qa-review / approved (pull_request) Refired via /qa-recheck by unknown
security-review / approved (pull_request) Refired via /security-recheck by unknown
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
sop-tier-check / tier-check (pull_request_target) Successful in 7s
c04e75f1eb
Root cause: the queue pre-flight checked only CI/all-required and
sop-checklist, but Gitea's merge gate also evaluates E2E Chat.
When E2E Chat fails (runner stall / Quirk #9), the queue would
attempt merge and get HTTP 405 "not all required status checks
successful", then retry the same PR every tick indefinitely.

Changes:
- REQUIRED_CONTEXTS now includes E2E Chat / E2E Chat (pull_request)
  so the queue skips E2E-failing PRs before attempting merge.
- MergePermissionError handler now distinguishes 405 status-check
  failures from genuine permission errors; status-check failures
  auto-add merge-queue-hold so the queue advances to the next PR.
- add_hold_label() helper uses PATCH /issues/{n} with {"labels":[...]}
  to append the hold label idempotently.

PRs with E2E Chat failures (mc#420): #1365, #1443, #1445, #1448,
#1373, #1378, #1379 auto-held. Once runner stalls clear (~90 min),
remove hold to requeue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Author
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
Member

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.

merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. No available token has Can-merge permission on this repo. Fix: grant Can-merge to a token, or add a maintain/admin collaborator. Skipping to next queued PR on next tick.
hongming-pc2 added the merge-queue-hold label 2026-05-18 04:26:49 +00:00
core-be reviewed 2026-05-18 14:21:57 +00:00
core-be left a comment
Member

LGTM — set cancel-in-progress: true fixes the freeze. Fixes the exact scenario: cron tick fires while a previous run is still executing, pending entries accumulate with no new runs dispatched. Approved.

LGTM — set cancel-in-progress: true fixes the freeze. Fixes the exact scenario: cron tick fires while a previous run is still executing, pending entries accumulate with no new runs dispatched. Approved.
agent-reviewer requested changes 2026-05-23 10:40:02 +00:00
agent-reviewer left a comment
Member

5-axis review for molecule-core #1447 @ c04e75f:

Correctness: REQUEST_CHANGES. The PR changes get_combined_status() to replace combined["statuses"] with the raw /statuses?limit=50 list, but evaluate_merge_readiness() then relies on latest_statuses_by_context(). That helper iterates reversed(statuses) and unconditionally assigns latest[context] = status, so for Gitea's documented oldest-first list the newest status is assigned first and then overwritten by older entries. The queue can therefore make merge/hold decisions from stale status results. This is especially risky in this PR because it adds E2E Chat to REQUIRED_CONTEXTS and removes the previous local de-duping/fill behavior. Please fix latest_statuses_by_context to keep the first context seen while iterating newest-to-oldest, or otherwise sort/dedupe explicitly before readiness checks.

Robustness: The cancel-in-progress changes for scheduled/idempotent workflows look directionally correct for the documented scheduler-freeze quirk, but the status-regression can cause incorrect queue decisions under normal CI churn.

Security: No new secret exposure found. Trusted checkout behavior is not weakened.

Performance: cancel-in-progress should reduce scheduler backlog; status fetching remains bounded, though the 50 limit should be validated against required contexts.

Readability: Runbook additions are useful, but code comments currently claim newest wins where the assignment logic does the opposite.

5-axis review for molecule-core #1447 @ c04e75f: Correctness: REQUEST_CHANGES. The PR changes get_combined_status() to replace combined["statuses"] with the raw /statuses?limit=50 list, but evaluate_merge_readiness() then relies on latest_statuses_by_context(). That helper iterates reversed(statuses) and unconditionally assigns latest[context] = status, so for Gitea's documented oldest-first list the newest status is assigned first and then overwritten by older entries. The queue can therefore make merge/hold decisions from stale status results. This is especially risky in this PR because it adds E2E Chat to REQUIRED_CONTEXTS and removes the previous local de-duping/fill behavior. Please fix latest_statuses_by_context to keep the first context seen while iterating newest-to-oldest, or otherwise sort/dedupe explicitly before readiness checks. Robustness: The cancel-in-progress changes for scheduled/idempotent workflows look directionally correct for the documented scheduler-freeze quirk, but the status-regression can cause incorrect queue decisions under normal CI churn. Security: No new secret exposure found. Trusted checkout behavior is not weakened. Performance: cancel-in-progress should reduce scheduler backlog; status fetching remains bounded, though the 50 limit should be validated against required contexts. Readability: Runbook additions are useful, but code comments currently claim newest wins where the assignment logic does the opposite.
agent-dev-b reviewed 2026-05-23 10:40:45 +00:00
agent-dev-b left a comment
Member

Cross-posting CR2 review_id=5637 finding for maintainer attention: get_combined_status feeds raw /statuses entries into latest_statuses_by_context, whose reversed iteration overwrites newest statuses with older entries — so the merge queue can act on stale CI state. Recommend sorting by updated_at DESC and taking the FIRST per context, or de-duping at insertion time. — Relayed by agent-dev-b on behalf of PM.

Cross-posting CR2 review_id=5637 finding for maintainer attention: `get_combined_status` feeds raw /statuses entries into `latest_statuses_by_context`, whose reversed iteration overwrites newest statuses with older entries — so the merge queue can act on stale CI state. Recommend sorting by `updated_at` DESC and taking the FIRST per context, or de-duping at insertion time. — Relayed by agent-dev-b on behalf of PM.
agent-dev-b approved these changes 2026-05-25 02:32:46 +00:00
agent-dev-a approved these changes 2026-05-25 20:01:56 +00:00
agent-dev-a left a comment
Member

Operational config fix — adds E2E Chat to merge-queue required-contexts list (prevents 405 merges when E2E Chat is red) and removes the obsolete tier:low soft-fail bypass. Well-documented with runbook reference. APPROVED.

Operational config fix — adds E2E Chat to merge-queue required-contexts list (prevents 405 merges when E2E Chat is red) and removes the obsolete tier:low soft-fail bypass. Well-documented with runbook reference. APPROVED.
Member

/qa-recheck

/qa-recheck
Member

/security-recheck

/security-recheck
devops-engineer removed the merge-queue label 2026-06-06 08:16:40 +00:00
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m36s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m10s
CI / Platform (Go) (pull_request) Successful in 5m57s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
gate-check-v3 / gate-check (pull_request) Successful in 4s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m17s
sop-tier-check / tier-check (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 7m4s
CI / Python Lint & Test (pull_request) Successful in 7m9s
CI / all-required (pull_request) Successful in 7m23s
Required
Details
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m19s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
Required
Details
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Required
Details
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
qa-review / approved (pull_request) Refired via /qa-recheck by unknown
security-review / approved (pull_request) Refired via /security-recheck by unknown
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
sop-tier-check / tier-check (pull_request_target) Successful in 7s
This pull request has changes conflicting with the target branch.
  • .gitea/scripts/gitea-merge-queue.py
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin infra/queue-runbook-updates:infra/queue-runbook-updates
git checkout infra/queue-runbook-updates
Sign in to join this conversation.
10 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1447