ci(gate-check-v3): add timeout-minutes:10 to gate-check job (hung-run bound) #2561
Summary

The `gate-check` job in `.gitea/workflows/gate-check-v3.yml` was unbounded. Combined with the intentional `cancel-in-progress: false` (Gitea 1.22.6 has no REST rerun and risks `dismiss_stale_approvals`, per the workflow comment), a run that hangs while running leaves the concurrency group unconcluded.

The `gate-check-v3 / gate-check (pull_request_target)` context is advisory (`continue-on-error: true` plus bp-exempt), so a concluded failure is ignored. Only an unconcluded hung run 405-blocks the Gitea merge-check.

Fix

Add `timeout-minutes: 10` to the `gate-check` job. Normal conclude time is ~12-16s, so 10m leaves a safety margin of at least ~30x over normal runtime (600s / 16s ≈ 37x) before a stuck Python tick is cut off. A hung run hits the timeout, concludes as a failure, and is ignored as advisory.

Diff
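
For orientation, the change has roughly the following shape. This is an illustrative excerpt, not the actual file contents: the runner label, step name, and script path are assumptions, and the concurrency block is omitted because its placement is not shown in this PR.

```yaml
# .gitea/workflows/gate-check-v3.yml (illustrative excerpt only)
jobs:
  gate-check:
    runs-on: ubuntu-latest          # assumption: the runner label is not stated in the PR
    continue-on-error: true         # unchanged: keeps the context advisory
    # Bound hung-while-running runs. Normal conclude time is ~12-16s, so a run
    # that reaches 10 minutes is treated as stuck and concludes as a failure.
    timeout-minutes: 10             # the functional line this PR adds
    steps:
      - name: Run gate check        # hypothetical step; the real steps are not shown
        run: python3 scripts/gate_check.py   # hypothetical path
```

The bound sits at job level, so it covers every step of the job together rather than each step individually.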

Single-line scope: ONLY add `timeout-minutes`. NO change to `cancel-in-progress` (kept `false`), `continue-on-error`, `if:`, `needs:`, or the bp-exempt config. The workflow remains advisory-only.

Caveat (for follow-up, NOT in this PR)

`timeout-minutes` only counts once the run is RUNNING. It does NOT fix a `QUEUED-never-picked-up` orphan (operator-side run-clearing is the fix for that).

Files changed

`.gitea/workflows/gate-check-v3.yml` (1 file, +7 / -0)

No production code. `ci.yml`/workflows-only.

APPROVE: gate-sensitive 5-axis (1st lane). Clean, correct fix; it directly implements the fleet-YAML hardening from the gate-check-v3 wedge RCA.

`timeout-minutes: 10` on the `gate-check` job. With `continue-on-error: true`, a hung-while-running tick now hits the bound and the job CONCLUDES (timed out, then ignored as advisory/bp-exempt) instead of hanging unconcluded, which is exactly what made the unconcluded context 405-block Gitea's merge-check (the wedge I RCA'd). Correctly placed and correctly reasoned (the comment cites the precise mechanism plus the 12-16s normal conclude time, so 10m = ~30x margin).

Minor (non-blocking, FYI): `timeout-minutes` applies to the whole job, and in `schedule`/cron mode the job iterates ALL open PRs sequentially (~15s each). With a large open-PR count (roughly 40 or more at ~15s each), a cron sweep could approach or exceed 10m and be cut off mid-sweep, leaving a partial refresh. That's acceptable here (cron is advisory, `continue-on-error` makes it non-fatal, there is an hourly retry, and the merge-critical per-PR `pull_request_target` runs are ~15s, far under the bound), but if the fleet's open-PR count grows a lot, consider a separate/larger bound for the cron leg. Not blocking; approving.

CI/all-required green; the non-success contexts are the known IGNORE-set (Local-Provision advisory/stub, sop-checklist pull_request author-ceremony) plus the pull_request_target review-bot gates this review helps satisfy. 1st distinct lane.

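
On the cron-leg note above: if a separate bound is ever wanted, one possible shape is to split the legs into two jobs selected by event name. This is a hypothetical follow-up sketch, not part of this PR; the second job name, the 30-minute value, the runner label, and the placeholder steps are all assumptions.

```yaml
# Hypothetical follow-up only: a separate, larger bound for the cron leg.
jobs:
  gate-check:                       # per-PR leg: existing job name and bound kept
    if: github.event_name == 'pull_request_target'
    runs-on: ubuntu-latest          # assumption
    continue-on-error: true
    timeout-minutes: 10
    steps:
      - run: echo "per-PR gate check"        # placeholder for the real steps

  gate-check-sweep:                 # hypothetical job name for the schedule leg
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest          # assumption
    continue-on-error: true
    timeout-minutes: 30             # illustrative value; size it to the open-PR count
    steps:
      - run: echo "sweep all open PRs"       # placeholder for the real steps
```

A new job name would also produce a new status context, so it would need the same advisory/bp-exempt treatment as the existing one.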
APPROVED: 5-axis review on head `1d10f4db14`.

Correctness: adds only `timeout-minutes: 10` to the advisory `gate-check` job, bounding hung-while-running executions without changing `continue-on-error`, concurrency, triggers, branch-protection requirements, or detector logic. Robustness: normal runs are seconds long, so 10 minutes is a generous bound; the documented queued-never-picked-up caveat is accurate. Security: no secrets/auth/input surface changes. Performance: prevents indefinite runner/merge-check occupation. Readability: the comment states the exact operational intent and limitation.

CI is green on this head and Researcher approval 10666 is current, so this supplies the second distinct lane.