fix(ci): main-red-watchdog skips cancel-cascade entries (mc#1564) (#1571)
CI / Canvas Deploy Reminder (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Chat / E2E Chat (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Waiting to run
Runtime PR-Built Compatibility / detect-changes (push) Waiting to run
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
CI / Detect changes (push) Successful in 13s
publish-workspace-server-image / build-and-push (push) Successful in 5m51s
CI / Shellcheck (E2E scripts) (push) Successful in 16s
E2E API Smoke Test / detect-changes (push) Successful in 20s
publish-workspace-server-image / Production auto-deploy (push) Failing after 33s
E2E Chat / detect-changes (push) Successful in 11s
CI / Canvas (Next.js) (push) Has been cancelled
CI / all-required (push) Has been cancelled
CI / Python Lint & Test (push) Has been cancelled
Handlers Postgres Integration / detect-changes (push) Has been cancelled
E2E Staging Canvas (Playwright) / detect-changes (push) Has been cancelled
CI / Platform (Go) (push) Has been cancelled
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (push) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 26s
gitea-merge-queue / queue (push) Successful in 14s
status-reaper / reap (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
This commit was merged in pull request #1571.
@@ -218,6 +218,31 @@ def is_red(status: dict) -> tuple[bool, list[dict]]:
 
     `failed_statuses` is the list of per-context entries whose own
     `state` is in the red set; useful for the issue body.
+
+    Cancel-cascade filter (mc#1564, 2026-05-19):
+    Gitea maps BOTH `action_run.status=2 (Failure)` AND
+    `action_run.status=3 (Cancelled)` to commit-status string
+    `"failure"`. On a busy main with
+    `concurrency: cancel-in-progress: true`, every merge burst
+    cancels prior in-flight runs (status=3) — those bubble to the
+    combined-status `failure` and inflate the watchdog's red%,
+    generating phantom `[main-red]` issues (mc#1562/#1552/#1540/...).
+    Canonical Gitea 1.22.6 enum per `models/actions/status.go` +
+    `reference_gitea_action_status_enum_corrected_2026_05_19`:
+        1=Success, 2=Failure, 3=Cancelled, 4=Skipped,
+        5=Waiting, 6=Running, 7=Blocked
+    We only want status=2 (real defects) to file. At the
+    commit-status layer we don't have the integer enum directly
+    (only the `failure` rollup string), so we use the description
+    string Gitea writes when a run is cancelled — empirically
+    `"Has been cancelled"` (verified 2026-05-19 via #1562 body).
+    Real failures show `"Failing after Ns"` and are unaffected.
+    This is option B from mc#1564 (description-string filter, no
+    extra API call). Description-string stability is a soft contract
+    with Gitea; if a future release renames it, the cancel-cascade
+    entries will simply leak back through (visible-not-silent), and
+    we'll either re-pin the string or upgrade to option A (resolve
+    the underlying action_run.status integer via target_url).
     """
     combined = status.get("state")
     statuses = status.get("statuses") or []
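For reference, the enum quoted in the docstring can be written down as a small Python sketch. The names `ActionRunStatus`, `COLLAPSES_TO_FAILURE`, and `should_file` are hypothetical and do not appear in the watchdog. Only the integer values, and the fact that statuses 2 and 3 both surface as commit-status `"failure"`, come from the docstring.

```python
from enum import IntEnum


class ActionRunStatus(IntEnum):
    """Gitea 1.22.6 action_run.status values, as listed in the docstring."""
    SUCCESS = 1
    FAILURE = 2
    CANCELLED = 3
    SKIPPED = 4
    WAITING = 5
    RUNNING = 6
    BLOCKED = 7


# Per the docstring, both of these are reported as commit-status "failure".
# How the other values map is not stated in the diff, so it is left out here.
COLLAPSES_TO_FAILURE = frozenset(
    {ActionRunStatus.FAILURE, ActionRunStatus.CANCELLED}
)


def should_file(run_status: ActionRunStatus) -> bool:
    """Hypothetical helper: only a real failure (status=2) warrants an issue."""
    return run_status is ActionRunStatus.FAILURE


for value in (2, 3):
    run_status = ActionRunStatus(value)
    print(run_status.name, run_status in COLLAPSES_TO_FAILURE, should_file(run_status))
```

The watchdog cannot make this check directly, because the integer is not available at the commit-status layer. That is why the commit falls back to matching the description string.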
@@ -233,11 +258,30 @@ def is_red(status: dict) -> tuple[bool, list[dict]]:
     def _entry_state(s: dict) -> str:
         return s.get("status") or s.get("state") or ""
 
+    def _is_cancel_cascade(s: dict) -> bool:
+        """status=3 entry per Gitea 1.22.6 description-string contract.
+        Match exactly (after strip) — substring match would catch
+        legitimate test names like "Has been cancelled by the user
+        unexpectedly" in failure logs."""
+        desc = (s.get("description") or "").strip()
+        return desc == "Has been cancelled"
+
     failed = [
         s for s in statuses
-        if isinstance(s, dict) and _entry_state(s) in red_states
+        if isinstance(s, dict)
+        and _entry_state(s) in red_states
+        and not _is_cancel_cascade(s)
     ]
-    return (combined in red_states or bool(failed), failed)
+    # Combined state alone is no longer sufficient — combined=failure
+    # may be 100% cancel-cascade. Drive `red` off the FILTERED list:
+    # if every red-shaped per-entry was cancel-cascade, `failed` is
+    # empty and we report green. Combined-failure with no per-entry
+    # detail (empty `statuses[]`) still trips red — that's the
+    # "CI emitter set combined-status directly" edge case from
+    # render_body's fallback path; we keep filing on it so the
+    # operator sees the breadcrumb.
+    combined_red_no_detail = combined in red_states and not statuses
+    return (bool(failed) or combined_red_no_detail, failed)
 
 
 # --------------------------------------------------------------------------
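Read as a whole, the post-change logic of `is_red` can be sketched as the standalone snippet below. It is an illustration assembled from the hunks, not the file itself. `RED_STATES` is an assumption, since the real `red_states` set is defined outside the lines shown. The sample payload borrows two entries from this commit's own status list, and their exact shape in the API response is also assumed.

```python
# Assumption: the watchdog's real `red_states` is defined outside the hunk;
# {"failure", "error"} is a guess used only for this illustration.
RED_STATES = frozenset({"failure", "error"})
CANCELLED_DESCRIPTION = "Has been cancelled"


def entry_state(entry: dict) -> str:
    # Per-entry state lives under `status`, with `state` as a fallback.
    return entry.get("status") or entry.get("state") or ""


def is_cancel_cascade(entry: dict) -> bool:
    # Exact match after strip; a substring match would over-filter.
    return (entry.get("description") or "").strip() == CANCELLED_DESCRIPTION


def is_red(status: dict) -> tuple[bool, list[dict]]:
    combined = status.get("state")
    statuses = status.get("statuses") or []
    failed = [
        s for s in statuses
        if isinstance(s, dict)
        and entry_state(s) in RED_STATES
        and not is_cancel_cascade(s)
    ]
    # Combined failure with no per-entry detail still counts as red.
    combined_red_no_detail = combined in RED_STATES and not statuses
    return (bool(failed) or combined_red_no_detail, failed)


sample = {
    "state": "failure",
    "statuses": [
        {"context": "CI / Canvas (Next.js) (push)",
         "status": "failure", "description": "Has been cancelled"},
        {"context": "publish-workspace-server-image / Production auto-deploy (push)",
         "status": "failure", "description": "Failing after 33s"},
    ],
}
red, failed = is_red(sample)
print(red, [s["context"] for s in failed])
```

With this payload only the `Production auto-deploy` entry survives the filter, so the watchdog would still file, but the cancelled `Canvas (Next.js)` run would stay out of the issue body. If a future Gitea release changed the description text, the cancelled entry would fail the exact match and reappear in `failed`, which is the visible-not-silent behaviour the docstring describes.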
@@ -244,6 +244,119 @@ def test_is_red_state_only_fallback_still_works(wd_module):
     assert len(failed) == 1
 
 
+# --------------------------------------------------------------------------
+# Cancel-cascade filter (mc#1564) — Gitea maps action_run.status=2 (Failure)
+# AND status=3 (Cancelled) BOTH to commit-status `"failure"`. We only want
+# real failures (status=2) to file. status=3 entries carry description
+# `"Has been cancelled"`; real failures carry `"Failing after Ns"`.
+# Canonical Gitea 1.22.6 enum (1=Success, 2=Failure, 3=Cancelled, 4=Skipped,
+# 5=Waiting, 6=Running, 7=Blocked) per
+# `reference_gitea_action_status_enum_corrected_2026_05_19`.
+# --------------------------------------------------------------------------
+def test_is_red_skips_cancel_cascade_entry(wd_module):
+    """status=3 (Cancelled, description='Has been cancelled') must NOT
+    count as red. Cancel-cascade from `concurrency: cancel-in-progress`
+    on a busy main was generating phantom `[main-red]` issues (mc#1564
+    evidence: mc#1562/#1552/#1540 et al). The filter is the durable fix."""
+    red, failed = wd_module.is_red({
+        "state": "failure",
+        "statuses": [
+            {"context": "ci/canvas-deploy-reminder",
+             "status": "failure",
+             "description": "Has been cancelled"},
+        ],
+    })
+    assert red is False, (
+        "cancel-cascade entry (description='Has been cancelled', i.e. "
+        "Gitea action_run.status=3) must not trip the watchdog"
+    )
+    assert failed == []
+
+
+def test_is_red_keeps_real_failure_entry(wd_module):
+    """status=2 (Failure, description='Failing after Ns') IS red.
+    Companion to the cancel-cascade filter — we must not over-filter."""
+    red, failed = wd_module.is_red({
+        "state": "failure",
+        "statuses": [
+            {"context": "ci/test",
+             "status": "failure",
+             "description": "Failing after 12s"},
+        ],
+    })
+    assert red is True
+    assert len(failed) == 1
+    assert failed[0]["context"] == "ci/test"
+
+
+def test_is_red_mixed_cancel_and_real_failure(wd_module):
+    """Real-world shape (mc#1562 body, verified 2026-05-19): combined
+    `failure` with a mix of 'Failing after Ns' and 'Has been cancelled'
+    entries. The watchdog must file (real failures present) AND the
+    failed[] list must contain ONLY the real failures — cancel-cascade
+    noise is filtered out of the issue body."""
+    red, failed = wd_module.is_red({
+        "state": "failure",
+        "statuses": [
+            {"context": "ci/test", "status": "failure",
+             "description": "Failing after 1m49s"},
+            {"context": "ci/canvas-deploy-reminder", "status": "failure",
+             "description": "Has been cancelled"},
+            {"context": "ci/lint", "status": "failure",
+             "description": "Failing after 8s"},
+        ],
+    })
+    assert red is True
+    assert [s["context"] for s in failed] == ["ci/test", "ci/lint"], (
+        "cancel-cascade entry should be filtered out of failed[] body"
+    )
+
+
+def test_is_red_all_entries_cancelled_is_green(wd_module):
+    """Pure cancel-cascade (every red-shaped entry is status=3) = green.
+    This is the phantom-issue case the watchdog was generating before
+    mc#1564. With the filter, no issue files."""
+    red, failed = wd_module.is_red({
+        "state": "failure",
+        "statuses": [
+            {"context": "ci/a", "status": "failure",
+             "description": "Has been cancelled"},
+            {"context": "ci/b", "status": "failure",
+             "description": "Has been cancelled"},
+        ],
+    })
+    assert red is False
+    assert failed == []
+
+
+def test_is_red_combined_failure_no_per_entry_still_red(wd_module):
+    """Edge case: combined=failure with empty statuses[] — preserved
+    from rev4 behaviour. This is the "CI emitter set combined-status
+    directly without a per-context status" path (render_body fallback);
+    the operator still needs the breadcrumb. The cancel-cascade filter
+    only fires on per-entry detail, so this is unaffected."""
+    red, failed = wd_module.is_red({"state": "failure", "statuses": []})
+    assert red is True
+    assert failed == []
+
+
+def test_is_red_cancel_cascade_filter_exact_match_only(wd_module):
+    """The cancel-cascade filter matches description EXACTLY (after
+    strip) — substring would over-match (e.g. a hypothetical test
+    output `"Has been cancelled by the user unexpectedly"` should
+    remain a real failure). Locks down the contract."""
+    red, failed = wd_module.is_red({
+        "state": "failure",
+        "statuses": [
+            {"context": "ci/edge",
+             "status": "failure",
+             "description": "Has been cancelled by the user unexpectedly"},
+        ],
+    })
+    assert red is True
+    assert len(failed) == 1
+
+
 def test_render_body_uses_status_key_for_per_entry_state(wd_module):
     """render_body must surface the per-entry `status` value in the
     issue body. Pre-rev4 it read `state` (always None on real Gitea) →