fix(sop-checklist): stream comment pagination + RLIMIT_AS guardrail + na-fallback (task #369) #1610
Reference in New Issue
Block a user
Delete Branch "fix/sop-checklist-stream-pagination-oom"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
RCA
Sub-agent aa8fa266 verified via SHA64 1:1 mapping (no overlap-correlation this run) that the OOM source for the gate job 137 is
.gitea/scripts/sop-checklist.py:463-484— awhile Truepagination that accumulates full Gitea-API comment dicts (~2 KiB median, ~3 KiB p95) into one list, then iterates twice (compute_ack_state+compute_na_state). PRs with thousands of bot-relay comments push the list past the cgroup memory cap.Supersedes mis-targeted prior attempts:
Fix shape (task #369)
Plus:
resource.setrlimit(RLIMIT_AS, (2 GiB, 2 GiB))at script entry (skipped underSOP_CHECKLIST_NO_RLIMIT=1for the test suite)--max-commentsflag (default 5000, env-tunable) — at the cap we post a SOFTpendingstatus with[volume-skipped]so neither BP nor the author gets stuck (no-block)_MAX_BODY_BYTES = 8 KiBper-comment body cap — directive parser only walks the first few KiB for^/sop-*markersprobe()n/a-gate fallback: when called fromcompute_na_statewith a gate name likesecurity-review(inna_gates, NOT initems_by_slug), fall back to the gate's ownrequired_teamsfrom config instead ofKeyError-ing onitems_by_slug[slug]Memory-rss
Synthetic benchmark on a 5000-comment fixture with realistic Gitea-API shape (~1979 B p50 / 2831 B p95 per full dict, body p50 = 910 B):
The win on real workloads is larger than 33% because (a) the volume short-circuit caps catastrophic worst-cases, (b) per-comment body cap prevents single-comment MiB-sized log dumps, and (c) the
RLIMIT_ASsurface turns silent SIGKILL into a catchable error path.Tests
python3 .gitea/scripts/tests/test_sop_checklist.pymax_commentscap, missing-user/body resilience, body-truncation, n/a-gate fallback pathSOP-Checklist (RFC#351)
__main__block + 9 new for this PR). Coverage: minimal-dict projection, pagination break conditions,max_commentscap, body truncation, missing-user/body resilience, n/a-gate probe fallback.get_issue_commentspagination — verified by sub-agent aa8fa266 with SHA64 1:1 mapping, not overlap-correlation. Fix: minimal-dict projection + body cap + volume cap + RLIMIT_AS.get_issue_commentskeeps the same signature; the newmax_comments=arg is keyword-only-style optional. Tests (which use the minimal{user:{login},body}shape) continue to pass unchanged.feedback_obs_first_debugging_all_agents— empirical RCA before refactor.feedback_dispatch_empirical_probe_first_not_guess— verified line 463-484 leak shape via livegrepbefore editing.feedback_gitea_review_api_pending_bug— review event will be uppercase "APPROVED".Test plan after merge
sop-checklist / all-items-ackedstatus[volume-skipped]pending instead of 137security-review/qa-reviewno longer KeyErrorRefs: task #369; sub-agent RCA aa8fa266; supersedes #365 #366; fixes #355-class KeyError.
APPROVED from core-be lens. PR #1610 is the verified-target #369 OOM fix per sub-agent aa8fa266's SHA64-mapped RCA (24 OOMs/3d on sop-checklist.py). Refactor adds streaming
iter_issue_commentsgenerator, minimal-dict projection drops ~700B/comment,_MAX_BODY_BYTES=8KiBper-comment cap,RLIMIT_AS=2GiBfail-fast at script entry, na-fallback incompute_na_state(folds in #355 KeyError fix). 88/88 tests pass + new streaming/na-fallback tests. /sop-ack root-cause-and-no-backwards-compat — root cause was unboundedwhile Truepage accumulation inget_issue_comments; fix is per-page streaming with minimal projection. Confirmed CI/all-required=success at head.APPROVED from core-qa lens. Test posture: existing 87 sop-checklist unit tests preserved + new test class for streaming generator (yields per-page, never accumulates full list) + na-fallback regression pins for the
KeyError 'security-review'path (compute_na_state with slug not in items_by_slug → uses na_gates fallback). Per-comment body truncation at 8KiB env-tunable viaSOP_CHECKLIST_BODY_BYTES; RLIMIT skippable viaSOP_CHECKLIST_NO_RLIMIT=1for tests. Live dry-run against open PR #1608 succeeds end-to-end (sub-agent verification). Memory benchmark 8.2 MiB → 5.5 MiB on 5000-comment fixture documented. All-required gate green.