Compare commits

2 Commits

Author SHA1 Message Date
infra-runtime-be 05bd6b3098 fix(queue): auto-hold PRs when required contexts not green
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 13s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m24s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m14s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
qa-review / approved (pull_request) Failing after 6s
gate-check-v3 / gate-check (pull_request) Successful in 7s
security-review / approved (pull_request) Failing after 8s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 6s
sop-tier-check / tier-check (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m26s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m50s
audit-force-merge / audit (pull_request) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m8s
CI / Platform (Go) (pull_request) Successful in 4m26s
CI / Python Lint & Test (pull_request) Successful in 7m32s
CI / Canvas (Next.js) (pull_request) Failing after 9m48s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Failing after 10m3s
When the merge queue encounters a PR whose required status checks are not
green, it now applies the merge-queue-hold label and posts a comment
explaining the blocker. Previously it would return "wait" silently and the
queue would re-check the same PR on the next tick (every 5 min), burning
a full cron invocation with no forward progress.

Also distinguishes the "status check gate" 405 (merge API blocked by
required-status-check gate) from genuine permission errors, applying hold
only to the former. The 405 auto-hold completes the fix started in
PR #1447 where the error was surfaced but not acted upon.

Fixes: internal#287 (queue cycling on qa/sec-failing PRs)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 04:35:47 +00:00
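The effect described in the commit message above can be illustrated with a small sketch. It assumes a hypothetical in-memory queue in which each PR is a dict with `number` and `labels` keys, and in which a tick always picks the first PR that carries `merge-queue` but not `merge-queue-hold`. Neither the data shape nor the function name `pick_next` comes from the actual script.

```python
from typing import Optional

QUEUE_LABEL = "merge-queue"
HOLD_LABEL = "merge-queue-hold"


def pick_next(prs: list) -> Optional[dict]:
    """Return the first queued PR that is not on hold, or None (hypothetical helper)."""
    for pr in prs:
        labels = set(pr["labels"])
        if QUEUE_LABEL in labels and HOLD_LABEL not in labels:
            return pr
    return None


prs = [
    {"number": 101, "labels": ["merge-queue"]},  # required checks failing
    {"number": 102, "labels": ["merge-queue"]},  # ready to merge
]

# Without auto-hold, every tick selects #101 again and #102 never advances.
first_pick = pick_next(prs)["number"]

# With auto-hold, the tick that finds #101 blocked labels it, so the
# following tick moves on to #102.
prs[0]["labels"].append(HOLD_LABEL)
second_pick = pick_next(prs)["number"]
```

Under these assumptions `first_pick` is 101 and `second_pick` is 102, which is the forward progress the auto-hold is meant to restore.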
infra-runtime-be 0bc41713d4 fix(ci): add secrets:read to qa-review and security-review workflows
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 21s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 7m10s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m31s
CI / Canvas (Next.js) (pull_request) Successful in 7m49s
CI / Python Lint & Test (pull_request) Successful in 7m4s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 3s
CI / all-required (pull_request) Successful in 6m57s
qa-review / approved (pull_request) Failing after 4s
security-review / approved (pull_request) Failing after 5s
sop-tier-check / tier-check (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m12s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m24s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
The SOP_TIER_CHECK_TOKEN team-membership probe (GET
/api/v1/teams/{id}/members/{u}) requires the workflow token to
carry secrets:read scope. Without it the API returns 403 and the
approval gate reports failure even when a valid team APPROVE exists.

Adds secrets: read to both qa-review.yml and security-review.yml
permissions blocks, consistent with sop-checklist/sop-tier-check
fix in PR #1414.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 22:58:43 +00:00
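The failure mode in the commit message above (a 403 from the probe being reported as a missing approval) can be made explicit by treating "probe could not run" as its own outcome. The sketch below is not taken from the workflows. It assumes that a 2xx response means the user is a team member and a 404 means they are not, which should be checked against the API documentation for the deployed Gitea version. `Membership` and `interpret_probe` are hypothetical names.

```python
from enum import Enum


class Membership(Enum):
    MEMBER = "member"
    NOT_MEMBER = "not-member"
    PROBE_FAILED = "probe-failed"


def interpret_probe(status_code: int) -> Membership:
    """Map the HTTP status of the team-membership probe to an outcome.

    Assumption: 2xx = member, 404 = not a member. Anything else (401, 403,
    5xx) means the probe itself failed and says nothing about membership.
    """
    if 200 <= status_code < 300:
        return Membership.MEMBER
    if status_code == 404:
        return Membership.NOT_MEMBER
    return Membership.PROBE_FAILED
```

With this split, a gate could report a 403 as a token or scope problem rather than as a missing approval.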
4 changed files with 154 additions and 14 deletions
+73 -14
@@ -348,6 +348,30 @@ def post_comment(pr_number: int, body: str, *, dry_run: bool) -> None:
     api("POST", f"/repos/{OWNER}/{NAME}/issues/{pr_number}/comments", body={"body": body})


+def add_hold_label(pr_number: int, dry_run: bool) -> bool:
+    """Apply the merge-queue-hold label to a PR. Returns True if the label
+    was added (or was already present)."""
+    if not HOLD_LABEL:
+        return False
+    print(f"::notice::adding `{HOLD_LABEL}` to PR #{pr_number}")
+    if dry_run:
+        return True
+    try:
+        api(
+            "POST",
+            f"/repos/{OWNER}/{NAME}/issues/{pr_number}/labels",
+            body={"labels": [HOLD_LABEL]},
+        )
+        return True
+    except ApiError as exc:
+        # 404 = PR already closed/deleted; 422 = label already present (Gitea
+        # returns 422 for duplicate label assignment — not a real error).
+        if "404" in str(exc) or "422" in str(exc):
+            return True
+        sys.stderr.write(f"::warning::could not add hold label to PR #{pr_number}: {exc}\n")
+        return False
+
+
 def update_pull(pr_number: int, *, dry_run: bool) -> None:
     print(f"::notice::updating PR #{pr_number} with base branch via style={UPDATE_STYLE}")
     if dry_run:
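One caveat on the `"404" in str(exc)` test in `add_hold_label` above: if the error message includes the request path, a PR number such as 1404 or 4220 would also match. The sketch below shows that risk and one way around it, comparing a numeric status instead. It assumes an exception type that carries the HTTP status as an attribute; the `ApiError` class here is a hypothetical stand-in, not the script's own class, and the message format is modeled on the `"POST /merge -> HTTP 405: ..."` string used in the tests.

```python
class ApiError(Exception):
    """Hypothetical stand-in that carries the HTTP status as an attribute."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def is_benign_label_error(exc: ApiError) -> bool:
    """True for 404 (PR gone) and 422 (label already present)."""
    return exc.status in (404, 422)


# A server error while labeling PR #1404. The path contains "404", so a
# substring test on the message would wrongly treat this as benign.
exc = ApiError(500, "POST /repos/o/n/issues/1404/labels -> HTTP 500: internal error")
substring_says_benign = "404" in str(exc)
status_says_benign = is_benign_label_error(exc)
```

Here `substring_says_benign` is `True` while `status_says_benign` is `False`. Whether the real `ApiError` exposes a status code is not visible in this diff.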
@@ -444,6 +468,24 @@ def process_once(*, dry_run: bool = False) -> int:
             dry_run=dry_run,
         )
         return 0
+    if decision.action == "wait":
+        # Required contexts are not green. Auto-hold so the queue stops cycling
+        # on this PR and processes the next. Holds are removed manually once the
+        # blocker (e.g. qa/sec gate, missing SOP_TIER_CHECK_TOKEN) is resolved.
+        # Distinguish "not all required status checks successful" 405 (merge
+        # attempted → add hold + comment) from permanent permission errors.
+        add_hold_label(pr_number, dry_run=dry_run)
+        post_comment(
+            pr_number,
+            (
+                f"merge-queue: auto-held — required contexts not green: "
+                f"{decision.reason}. "
+                "Remove the `merge-queue-hold` label and re-label `merge-queue` "
+                "to restart queue processing once the blocker is resolved."
+            ),
+            dry_run=dry_run,
+        )
+        return 0
     if decision.ready:
         latest_main_sha = get_branch_head(WATCH_BRANCH)
         if latest_main_sha != main_sha:
@@ -455,21 +497,38 @@ def process_once(*, dry_run: bool = False) -> int:
         try:
             merge_pull(pr_number, dry_run=dry_run)
         except MergePermissionError as exc:
-            # Permanent merge failure (HTTP 403/404/405). Post a comment so
-            # maintainers know why, then return 0 so this tick is done.
-            # The PR stays in the queue; future ticks can retry after the
-            # permission issue is resolved.
+            # Permanent merge failure (HTTP 403/404/405). Distinguish the
+            # Gitea-internal "status check gate" 405 (merge attempted, gate
+            # blocked) from a genuine permission error.
+            msg_lower = str(exc).lower()
+            is_status_check_failure = "not all required status checks successful" in msg_lower
             sys.stderr.write(f"::error::merge permission error for PR #{pr_number}: {exc}\n")
-            post_comment(
-                pr_number,
-                (
-                    "merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. "
-                    "No available token has Can-merge permission on this repo. "
-                    "Fix: grant Can-merge to a token, or add a maintain/admin collaborator. "
-                    "Skipping to next queued PR on next tick."
-                ),
-                dry_run=dry_run,
-            )
+            if is_status_check_failure:
+                # Merge API returned 405 because a required status check (e.g.
+                # qa-review, security-review) was still failing at merge time.
+                # Auto-hold so the queue stops cycling and processes the next PR.
+                add_hold_label(pr_number, dry_run=dry_run)
+                post_comment(
+                    pr_number,
+                    (
+                        "merge-queue: merge attempt blocked by Gitea's required-status-check "
+                        "gate (HTTP 405 'not all required status checks successful'). "
+                        "Auto-held — remove `merge-queue-hold` and re-label `merge-queue` "
+                        "once the blocking checks pass."
+                    ),
+                    dry_run=dry_run,
+                )
+            else:
+                post_comment(
+                    pr_number,
+                    (
+                        "merge-queue: merge failed with HTTP 405 'User not allowed to merge PR'. "
+                        "No available token has Can-merge permission on this repo. "
+                        "Fix: grant Can-merge to a token, or add a maintain/admin collaborator. "
+                        "Skipping to next queued PR on next tick."
+                    ),
+                    dry_run=dry_run,
+                )
             return 0
         return 0
     return 0
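The branch on `is_status_check_failure` in the hunk above can be pulled out into a small pure function, which makes the classification testable without calling the merge API. The sketch below is illustrative only. It assumes the exception message has the form `"POST /merge -> HTTP 405: ..."` (the shape used in the existing test for `MergePermissionError`); `classify_merge_error` and the two kind strings are hypothetical names.

```python
import re
from typing import Optional, Tuple

STATUS_GATE_MARKER = "not all required status checks successful"


def classify_merge_error(message: str) -> Tuple[str, Optional[int]]:
    """Return (kind, http_status) for a merge failure message.

    kind is "status-check-gate" when the message contains the gate marker
    (case-insensitive), otherwise "permission". http_status is the first
    "HTTP NNN" code found in the message, or None if there is none.
    """
    match = re.search(r"HTTP (\d{3})", message)
    status = int(match.group(1)) if match else None
    if STATUS_GATE_MARKER in message.lower():
        return "status-check-gate", status
    return "permission", status


gate = classify_merge_error(
    "POST /merge -> HTTP 405: Not all required status checks successful"
)
denied = classify_merge_error("POST /merge -> HTTP 405: User not allowed")
```

Returning the parsed status alongside the kind would also let the fallback comment report the actual code (403, 404 or 405) instead of a fixed "HTTP 405" string, if that distinction matters to maintainers.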
@@ -128,3 +128,82 @@ def test_MergePermissionError_message_preserved():
     exc = mq.MergePermissionError("POST /merge -> HTTP 405: User not allowed")
     assert "405" in str(exc)
     assert "User not allowed" in str(exc)
+
+
+def test_merge_decision_waits_when_required_contexts_not_green():
+    """When a required context (e.g. CI / all-required) is not success, the
+    decision is 'wait' — the queue can then auto-hold on this."""
+    required = [
+        "CI / all-required (pull_request)",
+        "sop-checklist / all-items-acked (pull_request)",
+    ]
+    decision = mq.evaluate_merge_readiness(
+        main_status={
+            "state": "success",
+            "statuses": [{"context": "CI / all-required (push)", "status": "success"}],
+        },
+        pr_status={
+            "state": "failure",
+            "statuses": [
+                {"context": "CI / all-required (pull_request)", "status": "failure"},
+                {"context": "sop-checklist / all-items-acked (pull_request)", "status": "success"},
+            ],
+        },
+        required_contexts=required,
+        pr_has_current_base=True,
+        pr_labels=None,
+    )
+    assert decision.ready is False
+    assert decision.action == "wait"
+    assert "CI / all-required" in decision.reason
+
+
+def test_tier_low_sop_checklist_pending_is_accepted():
+    """tier:low PRs get soft-fail on sop-checklist: pending is OK."""
+    required = ["sop-checklist / all-items-acked (pull_request)"]
+    statuses = {
+        "sop-checklist / all-items-acked (pull_request)": {
+            "status": "pending",
+        }
+    }
+    ok, missing = mq.required_contexts_green(
+        statuses, required, pr_labels={"tier:low"}
+    )
+    assert ok is True
+    assert missing == []
+
+
+def test_tier_low_sop_checklist_failure_is_not_accepted():
+    """tier:low soft-fail only covers pending, not actual failure."""
+    required = ["sop-checklist / all-items-acked (pull_request)"]
+    statuses = {
+        "sop-checklist / all-items-acked (pull_request)": {
+            "status": "failure",
+        }
+    }
+    ok, missing = mq.required_contexts_green(
+        statuses, required, pr_labels={"tier:low"}
+    )
+    assert ok is False
+
+
+def test_is_tier_low_pending_ok_true():
+    statuses = {
+        "sop-checklist / all-items-acked (pull_request)": {"status": "pending"}
+    }
+    assert mq._is_tier_low_pending_ok(
+        statuses,
+        "sop-checklist / all-items-acked (pull_request)",
+        {"tier:low"},
+    ) is True
+
+
+def test_is_tier_low_pending_ok_not_tier_low():
+    statuses = {
+        "sop-checklist / all-items-acked (pull_request)": {"status": "pending"}
+    }
+    assert mq._is_tier_low_pending_ok(
+        statuses,
+        "sop-checklist / all-items-acked (pull_request)",
+        set(),
+    ) is False
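The functions exercised by the tests above are not part of this diff. For readers who want a concrete picture of the contract the tests pin down, the following is one possible implementation that satisfies them. It is a sketch, not the code in the repository: the restriction of the soft-fail to contexts starting with `sop-checklist /` is an assumption drawn from the test docstrings, and the function names here are stand-ins that mirror the tested ones.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

SOFT_FAIL_PREFIX = "sop-checklist /"  # assumption: only this gate soft-fails
TIER_LOW_LABEL = "tier:low"


def is_tier_low_pending_ok(
    statuses: Dict[str, dict], context: str, pr_labels: Optional[Set[str]]
) -> bool:
    """True when a tier:low PR has the sop-checklist context still pending."""
    if not pr_labels or TIER_LOW_LABEL not in pr_labels:
        return False
    if not context.startswith(SOFT_FAIL_PREFIX):
        return False
    return statuses.get(context, {}).get("status") == "pending"


def required_contexts_green(
    statuses: Dict[str, dict],
    required: Iterable[str],
    pr_labels: Optional[Set[str]] = None,
) -> Tuple[bool, List[str]]:
    """Return (all_green, missing) for the required contexts."""
    missing = []
    for context in required:
        if statuses.get(context, {}).get("status") == "success":
            continue
        if is_tier_low_pending_ok(statuses, context, pr_labels):
            continue
        missing.append(context)
    return (not missing, missing)


CTX = "sop-checklist / all-items-acked (pull_request)"
pending_result = required_contexts_green(
    {CTX: {"status": "pending"}}, [CTX], pr_labels={"tier:low"}
)
failure_result = required_contexts_green(
    {CTX: {"status": "failure"}}, [CTX], pr_labels={"tier:low"}
)
```

In this sketch a pending `sop-checklist` context passes only for `tier:low` PRs, while an actual failure is always reported as missing, matching the two behaviours the tests assert.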
+1
@@ -89,6 +89,7 @@ on:
 permissions:
   contents: read
   pull-requests: read
+  secrets: read # required for SOP_TIER_CHECK_TOKEN team-membership probe

 jobs:
   # bp-exempt: PR review bot signal; required merge state is enforced by CI / all-required.
+1
@@ -16,6 +16,7 @@ on:
 permissions:
   contents: read
   pull-requests: read
+  secrets: read # required for SOP_TIER_CHECK_TOKEN team-membership probe

 jobs:
   # bp-exempt: PR security review bot signal; required merge state is enforced by CI / all-required.