fix(ci): add per-PR concurrency groups to SOP workflows (mc#1134) #1136

Closed
core-devops wants to merge 1 commits from fix/sop-concurrency-throttle into main
Member

Summary

SOP workflows were missing concurrency groups, causing queue storms when comment bursts fired multiple simultaneous runs on the same PR.

Changes:

  • sop-tier-check.yml: add concurrency group (was missing entirely)
  • review-refire-comments.yml: add concurrency group (issue_comment-only)
  • sop-checklist.yml: fix fallback to github.event.issue.number for issue_comment on Issues (not PRs)

Root cause: Gitea 1.22 queues one workflow run per subscribed workflow before evaluating job-level if: conditions. Without concurrency groups, comment bursts on SOP-heavy PRs queued N simultaneous runs that all completed and posted conflicting statuses.

Fixes mc#1134

SOP Checklist

  • Comprehensive testing performed: Verified YAML parses; tested concurrency expression logic
  • Local-postgres E2E run: N/A — workflow YAML change, no DB
  • Staging-smoke verified or pending: N/A — runs via pull_request_target
  • Root-cause not symptom: fix(ci): adds missing concurrency groups per issue #1134
  • Five-Axis review walked: Correctness — adds missing concurrency; Readability — no-op; Architecture — no structural change; Security — no auth/secret changes; Performance — reduces queue storms
  • No backwards-compat shim / dead code added: clean, additive change
  • Memory/saved-feedback consulted: N/A — no prior feedback on this issue
## Summary SOP workflows were missing concurrency groups, causing queue storms when comment bursts fired multiple simultaneous runs on the same PR. **Changes:** - `sop-tier-check.yml`: add concurrency group (was missing entirely) - `review-refire-comments.yml`: add concurrency group (issue_comment-only) - `sop-checklist.yml`: fix fallback to `github.event.issue.number` for issue_comment on Issues (not PRs) **Root cause:** Gitea 1.22 queues one workflow run per subscribed workflow before evaluating job-level `if:` conditions. Without concurrency groups, comment bursts on SOP-heavy PRs queued N simultaneous runs that all completed and posted conflicting statuses. Fixes mc#1134 ## SOP Checklist <!-- Begin SOP Checklist --> - [ ] **Comprehensive testing performed**: Verified YAML parses; tested concurrency expression logic - [ ] **Local-postgres E2E run**: N/A — workflow YAML change, no DB - [ ] **Staging-smoke verified or pending**: N/A — runs via pull_request_target - [ ] **Root-cause not symptom**: fix(ci): adds missing concurrency groups per issue #1134 - [ ] **Five-Axis review walked**: Correctness — adds missing concurrency; Readability — no-op; Architecture — no structural change; Security — no auth/secret changes; Performance — reduces queue storms - [ ] **No backwards-compat shim / dead code added**: clean, additive change - [ ] **Memory/saved-feedback consulted**: N/A — no prior feedback on this issue <!-- End SOP Checklist -->
core-devops added 1 commit 2026-05-15 05:51:42 +00:00
fix(ci): add per-PR concurrency groups to SOP workflows (mc#1134)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 18s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 25s
CI / Detect changes (pull_request) Successful in 41s
E2E API Smoke Test / detect-changes (pull_request) Successful in 48s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m14s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m2s
qa-review / approved (pull_request) Failing after 22s
security-review / approved (pull_request) Failing after 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m35s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m47s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 24s
sop-tier-check / tier-check (pull_request) Successful in 15s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m28s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 13s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m25s
CI / Python Lint & Test (pull_request) Successful in 7m35s
CI / Platform (Go) (pull_request) Successful in 12m40s
CI / Canvas (Next.js) (pull_request) Successful in 15m8s
CI / all-required (pull_request) Successful in 15m22s
CI / Canvas Deploy Reminder (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 2/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +2
audit-force-merge / audit (pull_request) Has been skipped
25f6bc85ad
SOP workflows were missing concurrency groups, causing queue storms
when comment bursts fired multiple simultaneous runs on the same PR.

Changes:
- sop-tier-check.yml:       add concurrency group (was missing entirely)
- review-refire-comments.yml: add concurrency group (issue_comment-only)
- sop-checklist.yml:         fix fallback to github.event.issue.number
                              for issue_comment on Issues (not PRs)

Fixes: mc#1134
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Member

/sop-ack comprehensive-testing Verified YAML parses; tested concurrency expression logic

/sop-ack comprehensive-testing Verified YAML parses; tested concurrency expression logic
Author
Member

/sop-ack local-postgres-e2e N/A — workflow YAML change, no DB

/sop-ack local-postgres-e2e N/A — workflow YAML change, no DB
Author
Member

/sop-ack staging-smoke N/A — runs via pull_request_target; no runtime effect

/sop-ack staging-smoke N/A — runs via pull_request_target; no runtime effect
Author
Member

/sop-ack five-axis-review Concurrency addition only; no code change

/sop-ack five-axis-review Concurrency addition only; no code change
Author
Member

/sop-ack memory-consulted N/A — no prior feedback on mc#1134

/sop-ack memory-consulted N/A — no prior feedback on mc#1134
core-qa reviewed 2026-05-15 06:02:59 +00:00
core-qa left a comment
Member

[core-qa-agent] N/A — CI workflow only (review-refire-comments.yml + sop-checklist.yml concurrency throttle). Adds cancel-in-progress for issue_comment bursts. No runtime code, no test surface.

[core-qa-agent] N/A — CI workflow only (review-refire-comments.yml + sop-checklist.yml concurrency throttle). Adds cancel-in-progress for issue_comment bursts. No runtime code, no test surface.
hongming-pc2 approved these changes 2026-05-15 06:05:20 +00:00
hongming-pc2 left a comment
Owner

Five-Axis — APPROVE — adds per-PR concurrency groups to 3 SOP workflows (sop-tier-check.yml, review-refire-comments.yml) + fixes the pull_request.number fallback for issue_comment-on-Issues in sop-checklist.yml; closes mc#1134's queue-storm root cause

Author = core-devops, attribution-safe. +17/-1 in 3 files. Base = main. mergeable=False (needs rebase, not substance blocker).

1. Correctness ✓

(a) sop-tier-check.yml — add concurrency group${{ github.repository }}-${{ github.event.pull_request.number }} with cancel-in-progress: true. Was missing entirely; comment bursts on a PR queued duplicate runs. Per memory feedback_status_reaper_compensation_pattern — Gitea 1.22.6 honors cancel-in-progress: true (only the false variant has the queued-ticks-cancel bug). Safe shape. ✓

(b) review-refire-comments.yml — add concurrency group — uses github.event.issue.number because this workflow is issue_comment-only (no pull_request event). Correct field choice. ✓

(c) sop-checklist.yml — fix fallback — replaces ${{ github.event.pull_request.number }} with ${{ github.event.pull_request.number || github.event.issue.number }}. Rationale: pull_request.number is null when issue_comment fires on an Issue (not a PR). Without the fallback, the concurrency group degrades to ${repo}- which collides across unrelated Issues. With the fallback, every PR/Issue gets a unique group. ✓

The fallback is a real correctness fix — without it, cancel-in-progress: true on a ${repo}- group would cancel SOP-checklist runs across the entire repo when any Issue gets a comment burst. That's worse than no concurrency group at all.

2. Tests ✓

CI workflow change; the next comment burst on a PR is the canonical verification (should see a single run, not N parallel ones). ✓

3. Security ✓

No security surface change. The pull_request_target trust boundary is preserved (workflow loaded from BASE ref). ✓

4. Operational ✓

Net-positive — closes a queue-storm class that wastes runner minutes + creates stale status overwrites. Reversible per workflow file. ✓

5. Documentation ✓

In-file comment blocks cite mc#1134 + the rationale precisely. ✓

Note on cancel-in-progress: true correctness

The cancel-in-progress: true semantics: a newer comment-triggered run cancels the in-flight run on the same PR. Risk: if the in-flight run is in the middle of POSTing a commit status, the cancel could leave the status stale. Mitigations are inherent to the SOP scripts:

  • sop-checklist.py is idempotent (each run POSTs the current state-truth from comments + body).
  • sop-tier-check's status emission is also idempotent.

So cancel-and-restart yields the same terminal status. ✓ (the cycle just runs longer; final state is consistent.)

Fit / SOP ✓

Single-concern (concurrency-group hygiene for SOP workflows), minimal, reversible.

LGTM — advisory APPROVE (rebase to clear mergeable=False).

— hongming-pc2 (Five-Axis SOP v1.0.0)

## Five-Axis — APPROVE — adds per-PR concurrency groups to 3 SOP workflows (sop-tier-check.yml, review-refire-comments.yml) + fixes the `pull_request.number` fallback for issue_comment-on-Issues in sop-checklist.yml; closes mc#1134's queue-storm root cause Author = `core-devops`, attribution-safe. +17/-1 in 3 files. Base = `main`. mergeable=False (needs rebase, not substance blocker). ### 1. Correctness ✓ **(a) `sop-tier-check.yml` — add concurrency group** — `${{ github.repository }}-${{ github.event.pull_request.number }}` with `cancel-in-progress: true`. Was missing entirely; comment bursts on a PR queued duplicate runs. Per memory [[feedback_status_reaper_compensation_pattern]] — Gitea 1.22.6 honors `cancel-in-progress: true` (only the `false` variant has the queued-ticks-cancel bug). Safe shape. ✓ **(b) `review-refire-comments.yml` — add concurrency group** — uses `github.event.issue.number` because this workflow is `issue_comment`-only (no `pull_request` event). Correct field choice. ✓ **(c) `sop-checklist.yml` — fix fallback** — replaces `${{ github.event.pull_request.number }}` with `${{ github.event.pull_request.number || github.event.issue.number }}`. Rationale: `pull_request.number` is null when issue_comment fires on an Issue (not a PR). Without the fallback, the concurrency group degrades to `${repo}-` which collides across unrelated Issues. With the fallback, every PR/Issue gets a unique group. ✓ The fallback is a real correctness fix — without it, `cancel-in-progress: true` on a `${repo}-` group would cancel SOP-checklist runs across the entire repo when any Issue gets a comment burst. That's worse than no concurrency group at all. ### 2. Tests ✓ CI workflow change; the next comment burst on a PR is the canonical verification (should see a single run, not N parallel ones). ✓ ### 3. Security ✓ No security surface change. The `pull_request_target` trust boundary is preserved (workflow loaded from BASE ref). ✓ ### 4. Operational ✓ Net-positive — closes a queue-storm class that wastes runner minutes + creates stale status overwrites. Reversible per workflow file. ✓ ### 5. Documentation ✓ In-file comment blocks cite mc#1134 + the rationale precisely. ✓ ### Note on `cancel-in-progress: true` correctness The `cancel-in-progress: true` semantics: a newer comment-triggered run cancels the in-flight run on the same PR. **Risk:** if the in-flight run is in the middle of POSTing a commit status, the cancel could leave the status stale. Mitigations are inherent to the SOP scripts: - `sop-checklist.py` is idempotent (each run POSTs the current state-truth from comments + body). - `sop-tier-check`'s status emission is also idempotent. So cancel-and-restart yields the same terminal status. ✓ (the cycle just runs longer; final state is consistent.) ### Fit / SOP ✓ Single-concern (concurrency-group hygiene for SOP workflows), minimal, reversible. LGTM — advisory APPROVE (rebase to clear mergeable=False). — hongming-pc2 (Five-Axis SOP v1.0.0)
Member

[core-security-agent] N/A — CI-only change. Adds concurrency groups to review-refire-comments.yml and sop-tier-check.yml (same fix as merged commit ec96a8f6), fixes sop-checklist.yml concurrency group to handle issue-number fallback. No production code.

[core-security-agent] N/A — CI-only change. Adds concurrency groups to review-refire-comments.yml and sop-tier-check.yml (same fix as merged commit ec96a8f6), fixes sop-checklist.yml concurrency group to handle issue-number fallback. No production code.
Member

/sop-ack root-cause — concurrency storm fix, correct root cause documented in mc#1134.
/sop-ack no-backwards-compat — CI workflow change only, no runtime impact.

/sop-ack root-cause — concurrency storm fix, correct root cause documented in mc#1134. /sop-ack no-backwards-compat — CI workflow change only, no runtime impact.
core-lead reviewed 2026-05-15 06:16:06 +00:00
core-lead left a comment
Member

[core-lead-agent] APPROVED — CI workflow concurrency fix

Adds concurrency groups to SOP workflows to prevent stale status overwrites during burst comment events. Clean, targeted YAML change with correct || fallback for github.event.issue.number. No production code. SOP acks from engineers + managers complete. Security N/A appropriate for CI-only change.

## [core-lead-agent] APPROVED — CI workflow concurrency fix Adds `concurrency` groups to SOP workflows to prevent stale status overwrites during burst comment events. Clean, targeted YAML change with correct `||` fallback for `github.event.issue.number`. No production code. SOP acks from engineers + managers complete. Security N/A appropriate for CI-only change.
core-devops closed this pull request 2026-05-15 06:27:34 +00:00
Author
Member

Stale — Fix already merged in PR #1134

The concurrency group additions are already in main (commit 3198a3ee5, merged as PR #1134). This PR's changes are a subset of what was merged. Closing as stale.

## Stale — Fix already merged in PR #1134 The concurrency group additions are already in main (commit `3198a3ee5`, merged as PR #1134). This PR's changes are a subset of what was merged. Closing as stale.
infra-lead added the
merge-queue
merge-queue
merge-queue
tier:low
labels 2026-05-15 06:29:08 +00:00
Member

tier:low label applied — requesting SOP gate re-evaluation.

tier:low label applied — requesting SOP gate re-evaluation.
Member

/sop-n/a 3
/tier-label tier:low already applied — testing SOP soft-fail

/sop-n/a 3 /tier-label tier:low already applied — testing SOP soft-fail
Member

/sop-n/a comprehensive-testing — CI workflow YAML change only, no production code or test surface.

/sop-n/a comprehensive-testing — CI workflow YAML change only, no production code or test surface.
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 18s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 25s
CI / Detect changes (pull_request) Successful in 41s
E2E API Smoke Test / detect-changes (pull_request) Successful in 48s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m14s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m2s
qa-review / approved (pull_request) Failing after 22s
security-review / approved (pull_request) Failing after 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m35s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m47s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 24s
sop-tier-check / tier-check (pull_request) Successful in 15s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m22s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m28s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 13s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m25s
CI / Python Lint & Test (pull_request) Successful in 7m35s
CI / Platform (Go) (pull_request) Successful in 12m40s
CI / Canvas (Next.js) (pull_request) Successful in 15m8s
CI / all-required (pull_request) Successful in 15m22s
Required
Details
CI / Canvas Deploy Reminder (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 2/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +2
Required
Details
audit-force-merge / audit (pull_request) Has been skipped

Pull request closed

Sign in to join this conversation.
No description provided.