fix(ci): main-red-watchdog skips cancel-cascade entries (mc#1564) (#1571)
CI / Canvas Deploy Reminder (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
CI / Detect changes (push) Successful in 13s
publish-workspace-server-image / build-and-push (push) Successful in 5m51s
CI / Shellcheck (E2E scripts) (push) Successful in 16s
E2E API Smoke Test / detect-changes (push) Successful in 20s
publish-workspace-server-image / Production auto-deploy (push) Failing after 33s
E2E Chat / detect-changes (push) Successful in 11s
CI / Canvas (Next.js) (push) Has been cancelled
CI / all-required (push) Has been cancelled
CI / Python Lint & Test (push) Has been cancelled
Handlers Postgres Integration / detect-changes (push) Has been cancelled
E2E Staging Canvas (Playwright) / detect-changes (push) Has been cancelled
CI / Platform (Go) (push) Has been cancelled
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (push) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 26s
gitea-merge-queue / queue (push) Successful in 14s
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
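The check list above for this very push shows the cancel-cascade pattern on real data. It has one `Failing after 33s` entry and four `Has been cancelled` entries. The sketch below applies the same exact-match rule as `_is_cancel_cascade` to a few of those lines. It is a rough illustration under stated assumptions:

- Each rendered line is assumed to have the shape `<context> (push) <description>`. This is a guess about this page's layout, not a Gitea API format.
- `split_line` and `classify` are hypothetical helpers, not part of the watchdog.

```python
# Hypothetical sketch: apply the mc#1564 exact-match description filter to
# check lines copied from this commit page. The "<context> (push) <description>"
# line shape is an assumption about the rendered page, not a Gitea API format.
CANCEL_DESCRIPTION = "Has been cancelled"

PAGE_LINES = [
    "publish-workspace-server-image / Production auto-deploy (push) Failing after 33s",
    "CI / Canvas (Next.js) (push) Has been cancelled",
    "CI / all-required (push) Has been cancelled",
    "CI / Python Lint & Test (push) Has been cancelled",
    "CI / Platform (Go) (push) Has been cancelled",
    "CI / Detect changes (push) Successful in 13s",
]


def split_line(line: str) -> tuple[str, str]:
    """Split a rendered check line into (context, description)."""
    context, _, description = line.partition(" (push) ")
    return context, description.strip()


def classify(lines: list[str]) -> dict[str, list[str]]:
    """Bucket red-looking lines into real failures vs cancel-cascade noise.

    Lines that are neither (e.g. "Successful in 13s") are ignored.
    """
    buckets: dict[str, list[str]] = {"real_failure": [], "cancel_cascade": []}
    for line in lines:
        context, description = split_line(line)
        if description == CANCEL_DESCRIPTION:  # exact match, as in _is_cancel_cascade
            buckets["cancel_cascade"].append(context)
        elif description.startswith("Failing after"):
            buckets["real_failure"].append(context)
    return buckets


if __name__ == "__main__":
    print(classify(PAGE_LINES))
```

Under this rule, only the production auto-deploy counts as a real failure on this push. The four cancelled CI jobs would be dropped as cancel-cascade entries.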

This commit was merged in pull request #1571.
2026-05-19 20:23:42 +00:00
2 changed files with 159 additions and 2 deletions
+46 -2
@@ -218,6 +218,31 @@ def is_red(status: dict) -> tuple[bool, list[dict]]:
    `failed_statuses` is the list of per-context entries whose own
    `state` is in the red set; useful for the issue body.

    Cancel-cascade filter (mc#1564, 2026-05-19):

    Gitea maps BOTH `action_run.status=2 (Failure)` AND
    `action_run.status=3 (Cancelled)` to the commit-status string
    `"failure"`. On a busy main with
    `concurrency: cancel-in-progress: true`, every merge burst
    cancels prior in-flight runs (status=3). Those bubble up to the
    combined-status `failure`, inflate the watchdog's red%, and
    generate phantom `[main-red]` issues (mc#1562/#1552/#1540/...).

    Canonical Gitea 1.22.6 enum per `models/actions/status.go` +
    `reference_gitea_action_status_enum_corrected_2026_05_19`:

        1=Success, 2=Failure, 3=Cancelled, 4=Skipped,
        5=Waiting, 6=Running, 7=Blocked

    We only want status=2 (real defects) to file. At the
    commit-status layer we don't have the integer enum directly
    (only the `failure` rollup string), so we match the description
    string Gitea writes when a run is cancelled: empirically
    `"Has been cancelled"` (verified 2026-05-19 via the #1562 body).
    Real failures show `"Failing after Ns"` and are unaffected.

    This is option B from mc#1564 (description-string filter, no
    extra API call). Description-string stability is a soft contract
    with Gitea. If a future release renames the string, cancel-cascade
    entries will simply leak back through (visible, not silent), and
    we'll either re-pin the string or upgrade to option A (resolve
    the underlying action_run.status integer via target_url).
    """
    combined = status.get("state")
    statuses = status.get("statuses") or []
@@ -233,11 +258,30 @@ def is_red(status: dict) -> tuple[bool, list[dict]]:
    def _entry_state(s: dict) -> str:
        return s.get("status") or s.get("state") or ""

    def _is_cancel_cascade(s: dict) -> bool:
        """status=3 entry per Gitea 1.22.6 description-string contract.

        Match exactly (after strip). A substring match would catch
        legitimate test names like "Has been cancelled by the user
        unexpectedly" in failure logs."""
        desc = (s.get("description") or "").strip()
        return desc == "Has been cancelled"

    failed = [
        s for s in statuses
        if isinstance(s, dict)
        and _entry_state(s) in red_states
        and not _is_cancel_cascade(s)
    ]
    # Combined state alone is no longer sufficient: combined=failure
    # may be 100% cancel-cascade. Drive `red` off the FILTERED list.
    # If every red-shaped per-entry was cancel-cascade, `failed` is
    # empty and we report green. Combined-failure with no per-entry
    # detail (empty `statuses[]`) still trips red. That is the
    # "CI emitter set combined-status directly" edge case from
    # render_body's fallback path, and we keep filing on it so the
    # operator sees the breadcrumb.
    combined_red_no_detail = combined in red_states and not statuses
    return (bool(failed) or combined_red_no_detail, failed)
# --------------------------------------------------------------------------
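The diff above shows `is_red` only in two fragments. The lines between the hunks, including the definition of `red_states`, are not shown. For experimenting with the new rule, here is a self-contained sketch of the post-change decision.

It assumes `RED_STATES = {"failure", "error"}` as a hypothetical stand-in for the real `red_states`. The rest mirrors the two hunks.

```python
# Minimal sketch of the post-mc#1564 is_red() decision, assembled from the
# two hunks above. RED_STATES is a hypothetical stand-in: the real watchdog
# defines red_states in lines this diff does not show.
RED_STATES = frozenset({"failure", "error"})
CANCEL_DESCRIPTION = "Has been cancelled"


def _entry_state(s: dict) -> str:
    # Real Gitea puts the per-entry state under "status"; "state" is a fallback.
    return s.get("status") or s.get("state") or ""


def _is_cancel_cascade(s: dict) -> bool:
    # Exact match after strip, so longer messages that merely start with the
    # phrase are still treated as real failures.
    return (s.get("description") or "").strip() == CANCEL_DESCRIPTION


def is_red(status: dict) -> tuple[bool, list[dict]]:
    combined = status.get("state")
    statuses = status.get("statuses") or []
    failed = [
        s for s in statuses
        if isinstance(s, dict)
        and _entry_state(s) in RED_STATES
        and not _is_cancel_cascade(s)
    ]
    # Red if any real failure survives the filter, or if the combined state
    # is red with no per-entry detail to inspect at all.
    combined_red_no_detail = combined in RED_STATES and not statuses
    return (bool(failed) or combined_red_no_detail, failed)
```

The sketch encodes the same cases the new tests lock down:

- Pure cancel-cascade input is green.
- Mixed input keeps only the real failures in `failed`.
- Combined `failure` with an empty `statuses[]` still trips red.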
+113
@@ -244,6 +244,119 @@ def test_is_red_state_only_fallback_still_works(wd_module):
    assert len(failed) == 1


# --------------------------------------------------------------------------
# Cancel-cascade filter (mc#1564): Gitea maps action_run.status=2 (Failure)
# AND status=3 (Cancelled) BOTH to commit-status `"failure"`. We only want
# real failures (status=2) to file. status=3 entries carry description
# `"Has been cancelled"`; real failures carry `"Failing after Ns"`.
# Canonical Gitea 1.22.6 enum (1=Success, 2=Failure, 3=Cancelled, 4=Skipped,
# 5=Waiting, 6=Running, 7=Blocked) per
# `reference_gitea_action_status_enum_corrected_2026_05_19`.
# --------------------------------------------------------------------------
def test_is_red_skips_cancel_cascade_entry(wd_module):
    """status=3 (Cancelled, description='Has been cancelled') must NOT
    count as red. Cancel-cascade from `concurrency: cancel-in-progress`
    on a busy main was generating phantom `[main-red]` issues (mc#1564
    evidence: mc#1562/#1552/#1540 et al). The filter is the durable fix."""
    red, failed = wd_module.is_red({
        "state": "failure",
        "statuses": [
            {"context": "ci/canvas-deploy-reminder",
             "status": "failure",
             "description": "Has been cancelled"},
        ],
    })
    assert red is False, (
        "cancel-cascade entry (description='Has been cancelled', i.e. "
        "Gitea action_run.status=3) must not trip the watchdog"
    )
    assert failed == []


def test_is_red_keeps_real_failure_entry(wd_module):
    """status=2 (Failure, description='Failing after Ns') IS red.
    Companion to the cancel-cascade filter: we must not over-filter."""
    red, failed = wd_module.is_red({
        "state": "failure",
        "statuses": [
            {"context": "ci/test",
             "status": "failure",
             "description": "Failing after 12s"},
        ],
    })
    assert red is True
    assert len(failed) == 1
    assert failed[0]["context"] == "ci/test"


def test_is_red_mixed_cancel_and_real_failure(wd_module):
    """Real-world shape (mc#1562 body, verified 2026-05-19): combined
    `failure` with a mix of 'Failing after Ns' and 'Has been cancelled'
    entries. The watchdog must file (real failures present) AND the
    failed[] list must contain ONLY the real failures. Cancel-cascade
    noise is filtered out of the issue body."""
    red, failed = wd_module.is_red({
        "state": "failure",
        "statuses": [
            {"context": "ci/test", "status": "failure",
             "description": "Failing after 1m49s"},
            {"context": "ci/canvas-deploy-reminder", "status": "failure",
             "description": "Has been cancelled"},
            {"context": "ci/lint", "status": "failure",
             "description": "Failing after 8s"},
        ],
    })
    assert red is True
    assert [s["context"] for s in failed] == ["ci/test", "ci/lint"], (
        "cancel-cascade entry should be filtered out of failed[] body"
    )


def test_is_red_all_entries_cancelled_is_green(wd_module):
    """Pure cancel-cascade (every red-shaped entry is status=3) = green.
    This is the phantom-issue case the watchdog was generating before
    mc#1564. With the filter, no issue files."""
    red, failed = wd_module.is_red({
        "state": "failure",
        "statuses": [
            {"context": "ci/a", "status": "failure",
             "description": "Has been cancelled"},
            {"context": "ci/b", "status": "failure",
             "description": "Has been cancelled"},
        ],
    })
    assert red is False
    assert failed == []


def test_is_red_combined_failure_no_per_entry_still_red(wd_module):
    """Edge case: combined=failure with empty statuses[], preserved
    from rev4 behaviour. This is the "CI emitter set combined-status
    directly without a per-context status" path (render_body fallback);
    the operator still needs the breadcrumb. The cancel-cascade filter
    only fires on per-entry detail, so this case is unaffected."""
    red, failed = wd_module.is_red({"state": "failure", "statuses": []})
    assert red is True
    assert failed == []


def test_is_red_cancel_cascade_filter_exact_match_only(wd_module):
    """The cancel-cascade filter matches description EXACTLY (after
    strip). A substring match would over-match: for example, a
    hypothetical test output `"Has been cancelled by the user
    unexpectedly"` should remain a real failure. Locks down the contract."""
    red, failed = wd_module.is_red({
        "state": "failure",
        "statuses": [
            {"context": "ci/edge",
             "status": "failure",
             "description": "Has been cancelled by the user unexpectedly"},
        ],
    })
    assert red is True
    assert len(failed) == 1


def test_render_body_uses_status_key_for_per_entry_state(wd_module):
    """render_body must surface the per-entry `status` value in the
    issue body. Pre-rev4 it read `state` (always None on real Gitea) →