Merge pull request 'feat(ci): main-red watchdog (Option C of main-never-red directive)' (#423) from feat/main-never-red-watchdog-internal-420 into main
Some checks failed
Block internal-flavored paths / Block forbidden paths (push) Successful in 27s
CI / Detect changes (push) Successful in 37s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 17s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 18s
E2E API Smoke Test / detect-changes (push) Successful in 42s
Handlers Postgres Integration / detect-changes (push) Successful in 47s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 48s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 42s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 9s
CI / Shellcheck (E2E scripts) (push) Successful in 9s
CI / Platform (Go) (push) Successful in 10s
CI / Python Lint & Test (push) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 11s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 16s
CI / Canvas (Next.js) (push) Successful in 33s
CI / Canvas Deploy Reminder (push) Has been skipped
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 5m27s
main-red-watchdog / watchdog (push) Successful in 1m57s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 22s
force-merge: review-timing race (hongming-pc Five-Axis APPROVED at 07:54Z; sop-tier-check ran at 07:41Z, before the review landed; gate working, only a timing race per feedback_pull_request_review_no_refire); see audit-force-merge trail
This commit is contained in:
commit 33b1c1f715

589 .gitea/scripts/main-red-watchdog.py (new executable file)
@@ -0,0 +1,589 @@
#!/usr/bin/env python3
"""main-red-watchdog — Option C of the "main NEVER goes red" directive.

Tracking: molecule-core#420.

What it does (one cron tick):
  1. GET /api/v1/repos/{owner}/{repo}/branches/{watch_branch}
     → current HEAD SHA on the watched branch.
  2. GET /api/v1/repos/{owner}/{repo}/commits/{SHA}/status
     → combined status + per-context statuses.
  3. If combined state is `failure` (or any individual status is
     `failure`): open or PATCH an idempotent
     `[main-red] {repo}: {SHA[:10]}` issue. Body lists each failed
     status context with `target_url` + `description`.
  4. If combined state is `success`: close any open `[main-red]
     {repo}: ...` issue on a previous SHA with a
     "main returned to green at SHA {current_SHA}" comment.
  5. Emit one Loki-shaped JSON line via `logger -t main-red-watchdog`
     so `reference_obs_stack_phase1`'s Vector → Loki path ingests an
     alert event (queryable in Grafana as
     `{tenant="operator-host"} |~ "main-red-watchdog"`).

What it does NOT do:
  - Auto-revert anything. Option B is explicitly rejected per
    `feedback_no_such_thing_as_flakes` + `feedback_fix_root_not_symptom`.
  - Page on its own failures. If api() raises ApiError (transient
    Gitea outage), the workflow run fails LOUDLY by re-raising — exactly
    the contract `feedback_api_helper_must_raise_not_return_dict`
    enforces. Silent fallthrough would re-introduce the duplicate-issue
    regression class.
  - Exit non-zero on RED. The issue IS the alarm; failing the watchdog
    on red would double-page (red workflow + open issue) and create
    silent-loop risk if the watchdog itself flakes.

Idempotency strategy:
  Title is keyed on `{SHA[:10]}` (commit-scoped), NOT just `main`.
  Rationale:
  - A fix-forward changes HEAD → next cron tick sees a new SHA;
    auto-close logic closes the prior `[main-red] OLD_SHA` issue and
    (if the new HEAD is also red, e.g. a different test fails) files
    a fresh `[main-red] NEW_SHA`. Lineage is preserved.
  - A revert that happens to land back on a previously-red SHA
    (rare) would refer to a CLOSED issue; the watchdog never reopens.
    That's a deliberate trade-off — the operator will see the latest
    open issue's `closed` event in the activity feed.

This module is import-safe: tests import individual functions without
invoking main(), so module-level reads use env-with-default and the
runtime contract enforcement lives in `_require_runtime_env()`.

Run locally (dry-run, no API mutation):
    GITEA_TOKEN=... GITEA_HOST=git.moleculesai.app REPO=owner/repo \\
        WATCH_BRANCH=main RED_LABEL=tier:high \\
        python3 .gitea/scripts/main-red-watchdog.py --dry-run
"""

from __future__ import annotations

import argparse
import json
import os
import shutil
import subprocess
import sys
import urllib.error
import urllib.parse
import urllib.request
from typing import Any


# --------------------------------------------------------------------------
# Environment
# --------------------------------------------------------------------------
def _env(key: str, *, default: str = "") -> str:
    """Read an env var with a default. Module-import-safe — tests can
    import this script without setting the full env contract."""
    return os.environ.get(key, default)


GITEA_TOKEN = _env("GITEA_TOKEN")
GITEA_HOST = _env("GITEA_HOST")
REPO = _env("REPO")
WATCH_BRANCH = _env("WATCH_BRANCH", default="main")
RED_LABEL = _env("RED_LABEL", default="tier:high")

OWNER, NAME = (REPO.split("/", 1) + [""])[:2] if REPO else ("", "")
API = f"https://{GITEA_HOST}/api/v1" if GITEA_HOST else ""

# Title prefix — kept short and stable so the idempotency search can
# match by exact title without parsing.
TITLE_PREFIX = "[main-red]"


def _require_runtime_env() -> None:
    """Enforce env contract — called from `main()` only.

    Tests import individual functions without setting the full env
    contract. Mirrors the CP `ci-required-drift.py` pattern so the
    runtime guard is a single chokepoint.
    """
    for key in ("GITEA_TOKEN", "GITEA_HOST", "REPO", "WATCH_BRANCH", "RED_LABEL"):
        if not os.environ.get(key):
            sys.stderr.write(f"::error::missing required env var: {key}\n")
            sys.exit(2)


# --------------------------------------------------------------------------
# Tiny HTTP helper — raises on non-2xx + on JSON-decode-of-expected-JSON.
# --------------------------------------------------------------------------
class ApiError(RuntimeError):
    """Raised when a Gitea API call cannot be trusted to have succeeded.

    Covers non-2xx HTTP status AND 2xx with an unparseable JSON body on
    endpoints documented to return JSON. Callers that swallow this and
    proceed risk e.g. creating duplicate `[main-red]` issues when a
    transient 500 hides an existing match. Per
    `feedback_api_helper_must_raise_not_return_dict`: soft-failure is
    opt-in via `expect_json=False`, never the default.
    """


def api(
    method: str,
    path: str,
    *,
    body: dict | None = None,
    query: dict[str, str] | None = None,
    expect_json: bool = True,
) -> tuple[int, Any]:
    """Tiny HTTP helper around urllib.

    Raises ApiError on any non-2xx response, and on JSON-decode failure
    when `expect_json=True` (the default for read-shaped paths). Mirrors
    the CP ci-required-drift.py contract exactly so behaviour is
    cross-checkable.
    """
    url = f"{API}{path}"
    if query:
        url = f"{url}?{urllib.parse.urlencode(query)}"
    data = None
    headers = {
        "Authorization": f"token {GITEA_TOKEN}",
        "Accept": "application/json",
    }
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, method=method, data=data, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            raw = resp.read()
            status = resp.status
    except urllib.error.HTTPError as e:
        raw = e.read()
        status = e.code

    if not (200 <= status < 300):
        snippet = raw[:500].decode("utf-8", errors="replace") if raw else ""
        raise ApiError(f"{method} {path} → HTTP {status}: {snippet}")

    if not raw:
        return status, None
    try:
        return status, json.loads(raw)
    except json.JSONDecodeError as e:
        if expect_json:
            raise ApiError(
                f"{method} {path} → HTTP {status} but body is not JSON: {e}"
            ) from e
        # Opt-in raw fallthrough for endpoints with known echo-quirks
        # (`feedback_gitea_create_api_unparseable_response`). Caller
        # MUST verify success via a follow-up GET, not by trusting body.
        return status, {"_raw": raw.decode("utf-8", errors="replace")}


# --------------------------------------------------------------------------
# Gitea reads
# --------------------------------------------------------------------------
def get_head_sha(branch: str) -> str:
    """HEAD SHA of `branch`. Raises ApiError on non-2xx."""
    _, body = api("GET", f"/repos/{OWNER}/{NAME}/branches/{branch}")
    if not isinstance(body, dict):
        raise ApiError(f"branch {branch} response not a JSON object")
    commit = body.get("commit")
    if not isinstance(commit, dict):
        raise ApiError(f"branch {branch} response missing `commit` object")
    sha = commit.get("id") or commit.get("sha")
    if not isinstance(sha, str) or len(sha) < 7:
        raise ApiError(f"branch {branch} response has no usable commit SHA")
    return sha


def get_combined_status(sha: str) -> dict:
    """Combined commit status for `sha`. Gitea returns:
        {
          "state": "success" | "failure" | "pending" | "error",
          "statuses": [
            {"context": "...", "state": "success|failure|pending|error",
             "target_url": "...", "description": "..."},
            ...
          ],
          ...
        }
    Raises ApiError on non-2xx.
    """
    _, body = api("GET", f"/repos/{OWNER}/{NAME}/commits/{sha}/status")
    if not isinstance(body, dict):
        raise ApiError(f"status for {sha} response not a JSON object")
    return body


def is_red(status: dict) -> tuple[bool, list[dict]]:
    """Return (is_red, failed_statuses).

    A commit is "red" if combined state is `failure` OR any individual
    status entry is in {`failure`, `error`}. `pending` and `success`
    do not trip the watchdog — pending means CI is still running, and
    that's the normal state immediately after a merge.

    `failed_statuses` is the list of per-context entries whose own
    `state` is in the red set; useful for the issue body.
    """
    combined = status.get("state")
    statuses = status.get("statuses") or []
    red_states = {"failure", "error"}
    failed = [
        s for s in statuses
        if isinstance(s, dict) and s.get("state") in red_states
    ]
    return (combined in red_states or bool(failed), failed)


# --------------------------------------------------------------------------
# Issue file / update / close
# --------------------------------------------------------------------------
def title_for(sha: str) -> str:
    """Idempotency key — `[main-red] {repo}: {SHA[:10]}`.

    Commit-scoped. A fix-forward to a new SHA produces a new title; the
    prior issue auto-closes via `close_open_red_issues_for_other_shas`.
    """
    return f"{TITLE_PREFIX} {REPO}: {sha[:10]}"


def list_open_red_issues() -> list[dict]:
    """All open issues whose title starts with `[main-red] {repo}: `.

    Per Five-Axis review on CP#112 (`feedback_api_helper_must_raise_not_return_dict`):
    api() raises on non-2xx; we let it propagate. Returning [] on a
    transient 500 would cause auto-close to skip the cleanup AND the
    file-or-update path to POST a duplicate — exactly the regression
    class the helper-raises contract closes.

    Gitea issue search returns at most 50/page; we only need open
    `[main-red]` issues, which are by design ≤ 1 at any time per repo,
    so a single page is enough.
    """
    _, results = api(
        "GET",
        f"/repos/{OWNER}/{NAME}/issues",
        query={"state": "open", "type": "issues", "limit": "50"},
    )
    if not isinstance(results, list):
        raise ApiError(
            f"issue search returned non-list body (got {type(results).__name__})"
        )
    prefix = f"{TITLE_PREFIX} {REPO}: "
    return [i for i in results if isinstance(i, dict)
            and isinstance(i.get("title"), str)
            and i["title"].startswith(prefix)]


def find_open_issue_for_sha(sha: str) -> dict | None:
    """Return the existing open `[main-red] {repo}: {SHA[:10]}` issue,
    or None if no such issue is open.

    `None` means "search succeeded, no match" — NOT "search failed".
    api() raises ApiError on any non-2xx; the caller can let that
    propagate so a transient outage fails loudly instead of silently
    duplicating.
    """
    target = title_for(sha)
    for issue in list_open_red_issues():
        if issue.get("title") == target:
            return issue
    return None


def render_body(sha: str, failed: list[dict], debug: dict) -> str:
    """Issue body. Markdown. Mirrors CP#112's render_body shape."""
    lines = [
        f"# Main is RED on `{REPO}` at `{sha[:10]}`",
        "",
        f"Commit: <https://{GITEA_HOST}/{REPO}/commit/{sha}>",
        "",
        "Auto-filed by `.gitea/workflows/main-red-watchdog.yml` (Option C "
        "of the [main-never-red directive]"
        f"(https://{GITEA_HOST}/molecule-ai/molecule-core/issues/420)). "
        "Per `feedback_no_such_thing_as_flakes` + "
        "`feedback_fix_root_not_symptom`: investigate the root cause; do "
        "NOT revert as a reflex. The watchdog itself never reverts.",
        "",
        "## Failed status contexts",
        "",
    ]
    if not failed:
        lines.append(
            "_(Combined state reported `failure`/`error` but no per-context "
            "entries were in a red state. This usually means a CI emitter "
            "set combined-status directly without a per-context status. "
            "Check the most recent workflow run for `main` and trace from "
            "there.)_"
        )
    else:
        for s in failed:
            ctx = s.get("context", "(no context)")
            state = s.get("state", "(no state)")
            url = s.get("target_url") or ""
            desc = (s.get("description") or "").strip()
            entry = f"- **{ctx}** — `{state}`"
            if url:
                entry += f" → [logs]({url})"
            if desc:
                entry += f"\n  - {desc}"
            lines.append(entry)
    lines.extend([
        "",
        "## Resolution path",
        "",
        "1. Read the failed logs (links above).",
        "2. If reproducible locally, fix forward in a PR targeting `main`.",
        "3. If the failure looks like a flake — STOP. Per "
        "`feedback_no_such_thing_as_flakes`, intermittent failures are "
        "real bugs. Investigate to root cause; do not mark as flake.",
        "4. If the failure is blocking unrelated work for >1 hour, file a "
        "follow-up issue and assign someone. Do NOT revert without a "
        "human GO per `feedback_prod_apply_needs_hongming_chat_go` "
        "(branch protection is a prod surface).",
        "",
        "## Debug",
        "",
        "```json",
        json.dumps(debug, indent=2, sort_keys=True),
        "```",
        "",
        "_This issue is idempotent: the watchdog runs hourly at `:05` "
        "and edits this body in place. When `main` returns to green, the "
        "watchdog will close this issue automatically with a "
        "\"main returned to green\" comment._",
    ])
    return "\n".join(lines)


def emit_loki_event(event_type: str, sha: str, failed_contexts: list[str]) -> None:
    """Emit a JSON line to syslog tag `main-red-watchdog` for
    `reference_obs_stack_phase1` (Vector → Loki).

    Best-effort: if `logger` isn't on PATH (e.g. local dev macOS without
    util-linux logger), the syslog call is skipped; the stdout JSON line
    is always printed. The Gitea Actions Ubuntu runner has util-linux
    preinstalled.

    Loki labels: the workflow runs on the Ubuntu runner where Vector is
    NOT configured (Vector lives on the operator host + tenants per
    `reference_obs_stack_phase1`). The Loki line is still emitted as
    stdout JSON so the workflow log itself is parseable; treat the
    syslog call as belt-and-braces for the cases where this script is
    invoked from a host that DOES have Vector (e.g. operator-host cron
    fallback in a follow-up PR).
    """
    payload = {
        "event_type": event_type,
        "repo": REPO,
        "sha": sha,
        "failed_contexts": failed_contexts,
    }
    line = json.dumps(payload, sort_keys=True)
    # Always print to stdout so the workflow log captures it (machine-
    # readable; `gitea run logs` + Loki ingestion via the operator-host
    # journald → Vector → Loki path will see this from runners that
    # forward stdout). Loki query:
    #   {source="gitea-actions"} |~ "main_red_detected"
    print(f"main-red-watchdog event: {line}")
    # Best-effort syslog tag so a future "run from operator-host cron"
    # path picks it up directly via the existing Vector pipeline.
    if shutil.which("logger"):
        try:
            subprocess.run(
                ["logger", "-t", "main-red-watchdog", line],
                check=False,
                timeout=5,
            )
        except (OSError, subprocess.SubprocessError) as e:
            sys.stderr.write(f"::warning::logger call failed: {e}\n")


def file_or_update_red(
    sha: str,
    failed: list[dict],
    debug: dict,
    *,
    dry_run: bool = False,
) -> None:
    """Open a new `[main-red] {repo}: {SHA[:10]}` issue, or PATCH the
    existing one's body. Idempotent by title."""
    title = title_for(sha)
    body = render_body(sha, failed, debug)

    if dry_run:
        print(f"::notice::[dry-run] would file/update main-red issue for {sha[:10]}")
        print("::group::[dry-run] title")
        print(title)
        print("::endgroup::")
        print("::group::[dry-run] body")
        print(body)
        print("::endgroup::")
        return

    existing = find_open_issue_for_sha(sha)
    if existing:
        num = existing["number"]
        api("PATCH", f"/repos/{OWNER}/{NAME}/issues/{num}", body={"body": body})
        print(f"::notice::Updated existing main-red issue #{num} for {sha[:10]}")
        return

    _, created = api(
        "POST",
        f"/repos/{OWNER}/{NAME}/issues",
        body={"title": title, "body": body, "labels": []},
    )
    if not isinstance(created, dict):
        raise ApiError("POST issue response not a JSON object")
    new_num = created.get("number")
    print(f"::warning::Filed new main-red issue #{new_num} for {sha[:10]}")

    # Apply RED_LABEL by id. Gitea's add-labels endpoint takes IDs, not
    # names (`feedback_gitea_label_delete_by_id` — same rule for add).
    # Best-effort: label failure is logged but does not fail the run.
    try:
        _, labels = api("GET", f"/repos/{OWNER}/{NAME}/labels")
    except ApiError as e:
        sys.stderr.write(f"::warning::could not list labels: {e}\n")
        return
    label_id = None
    if isinstance(labels, list):
        for lbl in labels:
            if isinstance(lbl, dict) and lbl.get("name") == RED_LABEL:
                label_id = lbl.get("id")
                break
    if label_id is not None and new_num:
        try:
            api(
                "POST",
                f"/repos/{OWNER}/{NAME}/issues/{new_num}/labels",
                body={"labels": [label_id]},
            )
        except ApiError as e:
            sys.stderr.write(
                f"::warning::could not apply label '{RED_LABEL}' to #{new_num}: {e}\n"
            )
    else:
        sys.stderr.write(f"::warning::label '{RED_LABEL}' not found on repo\n")


def close_open_red_issues_for_other_shas(
    current_sha: str,
    *,
    dry_run: bool = False,
) -> int:
    """When main is green at current_sha, close any open `[main-red]`
    issues whose title references a different SHA. Returns the number
    of issues closed.

    Lineage note: we only close issues whose title prefix matches; if
    a human renamed the issue or added a suffix this won't touch it.
    That's intentional — manual editorial state takes precedence.
    """
    target_title = title_for(current_sha)
    open_red = list_open_red_issues()
    closed = 0
    for issue in open_red:
        if issue.get("title") == target_title:
            # Same SHA — caller should not have invoked this if main is
            # green. Skip defensively.
            continue
        num = issue.get("number")
        if not isinstance(num, int):
            continue
        comment = (
            f"`main` returned to green at SHA `{current_sha}` "
            f"(<https://{GITEA_HOST}/{REPO}/commit/{current_sha}>). "
            "Closing automatically. If the underlying root cause is "
            "not yet understood, reopen this issue and file a "
            "postmortem — green-by-flake is still a bug per "
            "`feedback_no_such_thing_as_flakes`."
        )
        if dry_run:
            print(f"::notice::[dry-run] would close issue #{num} ({issue.get('title')})")
            closed += 1
            continue
        # Comment first, then close. Order matters: a closed issue can
        # still receive comments, but the activity-feed ordering reads
        # better with the explanation arriving just before the close.
        api(
            "POST",
            f"/repos/{OWNER}/{NAME}/issues/{num}/comments",
            body={"body": comment},
        )
        api(
            "PATCH",
            f"/repos/{OWNER}/{NAME}/issues/{num}",
            body={"state": "closed"},
        )
        print(f"::notice::Closed main-red issue #{num} (green at {current_sha[:10]})")
        closed += 1
    return closed


# --------------------------------------------------------------------------
# Main
# --------------------------------------------------------------------------
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(
        prog="main-red-watchdog",
        description="Detect post-merge CI red on the watched branch and "
        "file an idempotent issue. Option C of the main-never-red directive.",
    )
    p.add_argument(
        "--dry-run",
        action="store_true",
        help="Detect + print the would-be issue title/body to stdout; do "
        "NOT POST/PATCH/close any issues. Useful for local testing.",
    )
    return p.parse_args(argv)


def run_once(*, dry_run: bool = False) -> int:
    """One watchdog tick. Returns 0 on green or red-issue-filed; lets
    ApiError propagate on transient outage (workflow run fails loudly,
    which is correct per the helper-raises contract)."""
    sha = get_head_sha(WATCH_BRANCH)
    status = get_combined_status(sha)
    red, failed = is_red(status)

    debug = {
        "branch": WATCH_BRANCH,
        "sha": sha,
        "combined_state": status.get("state"),
        "failed_contexts": [s.get("context") for s in failed],
        "all_contexts": [
            {"context": s.get("context"), "state": s.get("state")}
            for s in (status.get("statuses") or [])
            if isinstance(s, dict)
        ],
    }

    if red:
        failed_ctxs = [s.get("context") for s in failed if s.get("context")]
        emit_loki_event("main_red_detected", sha, failed_ctxs)
        print(f"::warning::main is RED at {sha[:10]} on {WATCH_BRANCH}: "
              f"{len(failed)} failed context(s)")
        file_or_update_red(sha, failed, debug, dry_run=dry_run)
    else:
        # Green (or pending — pending is treated as not-red so we don't
        # spam during the post-merge CI window). Close any stale issues
        # from earlier SHAs only when we're actually green; pending
        # means CI hasn't finished and the prior issue might still be
        # accurate.
        if status.get("state") == "success":
            closed = close_open_red_issues_for_other_shas(sha, dry_run=dry_run)
            if closed:
                emit_loki_event("main_returned_to_green", sha, [])
            print(f"::notice::main is GREEN at {sha[:10]} on {WATCH_BRANCH} "
                  f"(closed {closed} stale issue(s))")
        else:
            print(f"::notice::main is PENDING at {sha[:10]} on {WATCH_BRANCH} "
                  f"(combined state={status.get('state')!r}; no action)")
    return 0


def main(argv: list[str] | None = None) -> int:
    args = _parse_args(argv)
    _require_runtime_env()
    return run_once(dry_run=args.dry_run)


if __name__ == "__main__":
    sys.exit(main())
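The detector's truth table can be sanity-checked in isolation. A minimal sketch, re-implementing the script's `is_red` logic standalone (the sample status payloads below are made up for illustration):

```python
# Standalone re-implementation of the is_red detector from
# .gitea/scripts/main-red-watchdog.py, for illustration only.
def is_red(status: dict) -> tuple[bool, list[dict]]:
    red_states = {"failure", "error"}
    statuses = status.get("statuses") or []
    failed = [s for s in statuses
              if isinstance(s, dict) and s.get("state") in red_states]
    return (status.get("state") in red_states or bool(failed), failed)

# pending combined state, no red contexts → not red (normal post-merge window)
assert is_red({"state": "pending", "statuses": [{"state": "pending"}]})[0] is False

# combined success but one per-context error → red (belt and braces)
red, failed = is_red({"state": "success",
                      "statuses": [{"context": "CI / Go", "state": "error"}]})
assert red is True and failed[0]["context"] == "CI / Go"
```

Note the asymmetry: a red per-context entry trips the detector even when the combined state looks green, but a red combined state with no red contexts still trips it (that case gets the "no per-context entries" explanation in the issue body).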
94 .gitea/workflows/main-red-watchdog.yml (new file)
@@ -0,0 +1,94 @@
# main-red-watchdog — hourly sentinel for post-merge CI red on `main`.
|
||||
#
|
||||
# RFC: hongming "main NEVER goes red" directive, Option C of the four-
|
||||
# option ladder (B = auto-revert is explicitly rejected per
|
||||
# `feedback_no_such_thing_as_flakes` + `feedback_fix_root_not_symptom`).
|
||||
# Tracking issue: molecule-core#420.
|
||||
#
|
||||
# What it does:
|
||||
# 1. GET branches/main → HEAD SHA
|
||||
# 2. GET commits/{SHA}/status → combined status
|
||||
# 3. If combined is `failure` (or any individual status is `failure`):
|
||||
# open or PATCH an idempotent `[main-red] {repo}: {SHA[:10]}` issue
|
||||
# with each failed context + target_url + description.
|
||||
# 4. If combined is `success` and a prior `[main-red] ...` issue exists,
|
||||
# close it with a "main returned to green at SHA ..." comment.
|
||||
# 5. Emit a Loki-shaped JSON line via `logger -t main-red-watchdog` for
|
||||
# `reference_obs_stack_phase1` ingestion via Vector.
|
||||
#
|
||||
# What it does NOT do:
|
||||
# - Auto-revert anything. Option B is rejected by directive.
|
||||
# - Mutate branch protection. (See AGENTS.md boundaries.)
|
||||
# - Fail the workflow on red. The issue IS the alarm — failing the
|
||||
# watchdog would create a silent-loop where a flake in the watchdog
|
||||
# itself hides actual main-red signal. Exit 0 unless api() raises
|
||||
# ApiError (transient Gitea outage → fail loudly per
|
||||
# `feedback_api_helper_must_raise_not_return_dict`).
|
||||
#
|
||||
# Pattern source: molecule-controlplane `0adf2098`'s ci-required-drift.yml
|
||||
# (just merged 2026-05-11). Same shape (cron + dispatch + sidecar Python +
|
||||
# idempotent-by-title issue), simpler scope (1 source, not 3).
|
||||
|
||||
name: main-red-watchdog
|
||||
|
||||
# IMPORTANT — Gitea 1.22.6 parser quirk per
|
||||
# `feedback_gitea_workflow_dispatch_inputs_unsupported`: do NOT add an
|
||||
# `inputs:` block here. Gitea 1.22.6 rejects the whole workflow as
|
||||
# "unknown on type" when `workflow_dispatch.inputs.X` is present. Revisit
|
||||
# when Gitea ≥ 1.23 is fleet-wide.
|
||||
on:
|
||||
schedule:
|
||||
# Hourly at :05 — task spec calls for "off-zero" (`5 * * * *`),
|
||||
# offset from :17 (ci-required-drift) and :00 (peak cron load).
|
||||
- cron: '5 * * * *'
|
||||
workflow_dispatch:
|
||||
|
||||
# Read commit status + branch ref + issues; write issues (open/PATCH/close).
|
||||
permissions:
|
||||
contents: read
|
||||
issues: write
|
||||
# Workflow-scoped serialisation — two simultaneous runs would race on the
# `[main-red] {SHA}` open/PATCH path. Idempotent by title, but parallel
# POSTs can produce duplicates before the title search dedup wins.
concurrency:
  group: main-red-watchdog
  cancel-in-progress: false

jobs:
  watchdog:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Check out repo (script lives at .gitea/scripts/)
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Set up Python (stdlib only — no PyYAML needed here)
        # The script uses stdlib urllib + json. No PyYAML required (CP's
        # drift detector needs it for AST parsing; we don't). Pin to the
        # same 3.12 hermetic interpreter CP uses so the test/runtime
        # versions stay aligned across watchdog suites.
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: '3.12'

      - name: Run main-red watchdog
        env:
          # GITEA_TOKEN reads commit status + writes issues. Falls back
          # to the auto-injected GITHUB_TOKEN if the org-level secret
          # isn't set (transitional repos), matching the same pattern
          # used by deploy-pipeline.yml + ci-required-drift.yml.
          GITEA_TOKEN: ${{ secrets.GITEA_TOKEN || secrets.GITHUB_TOKEN }}
          GITEA_HOST: git.moleculesai.app
          REPO: ${{ github.repository }}
          # Branch under watch. `main` per directive; staging not
          # included here — staging green is a separate gate
          # (`feedback_staging_e2e_merge_gate`).
          WATCH_BRANCH: 'main'
          # Issue label applied on file/open. `tier:high` exists in the
          # molecule-core label set (verified 2026-05-11, label id 9).
          # Rationale for high: main red blocks the promotion train and
          # poisons every PR's auto-rebase base; treat as a fire even
          # if intermittent.
          RED_LABEL: 'tier:high'
        run: python3 .gitea/scripts/main-red-watchdog.py

tests/test_main_red_watchdog.py (new file, 626 lines)
@@ -0,0 +1,626 @@
"""Tests for `.gitea/scripts/main-red-watchdog.py` — Option C of the
main-never-red directive (tracking: molecule-core#420).

Covers:
- Happy path: main is green, no issue created.
- Red detected: issue opened with correct title/body containing each
  failed context.
- Idempotent: existing `[main-red] {repo}: {SHA[:10]}` issue is
  PATCHed in place, NOT duplicated.
- Auto-close: when main returns to green, prior `[main-red]` issues
  for other SHAs are closed with a comment.
- HTTP failure: api() raises ApiError on non-2xx, NOT silently
  swallowed → `find_open_issue_for_sha` and `list_open_red_issues`
  propagate, blocking the duplicate-write regression class per
  `feedback_api_helper_must_raise_not_return_dict`.
- --dry-run: no API mutation; the rendered title/body go to stdout.
- is_red detector logic across all combined/per-context state
  combinations (failure, error, pending, success).

Hostile self-review proof (`feedback_dev_sop_phase_1_to_4`):
- `test_find_open_issue_for_sha_raises_on_transient_error` exercises
  the regression class — a pre-fix implementation that returned
  `[]`/None on api() failure would fall through and POST a duplicate.
  Verified by stashing the script's `raise ApiError` and re-running:
  the test FAILS as required.
- `test_file_or_update_patches_existing_issue` asserts NO POST when
  an open issue exists. A pre-fix idempotency bug (always-POST)
  would fail this.

Run:
    python3 -m pytest tests/test_main_red_watchdog.py -v

Dependencies: stdlib + pytest. No network. No live Gitea calls.
"""
from __future__ import annotations

import importlib.util
import json
import os
import sys
import urllib.error
from pathlib import Path
from unittest import mock

import pytest


# --------------------------------------------------------------------------
# Module-import fixture
# --------------------------------------------------------------------------
SCRIPT_PATH = (
    Path(__file__).resolve().parent.parent
    / ".gitea"
    / "scripts"
    / "main-red-watchdog.py"
)


@pytest.fixture(scope="module")
def wd_module():
    """Import the script as a module under a known env."""
    env = {
        "GITEA_TOKEN": "test-token",
        "GITEA_HOST": "git.example.test",
        "REPO": "owner/repo",
        "WATCH_BRANCH": "main",
        "RED_LABEL": "tier:high",
    }
    with mock.patch.dict(os.environ, env, clear=False):
        spec = importlib.util.spec_from_file_location(
            "main_red_watchdog", SCRIPT_PATH
        )
        m = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(m)
        # Force-set globals from env (they were captured at import time
        # before our patch.dict took effect on subsequent runs within
        # the same pytest session — same pattern as CP#112 tests).
        m.GITEA_TOKEN = env["GITEA_TOKEN"]
        m.GITEA_HOST = env["GITEA_HOST"]
        m.REPO = env["REPO"]
        m.WATCH_BRANCH = env["WATCH_BRANCH"]
        m.RED_LABEL = env["RED_LABEL"]
        m.OWNER, m.NAME = "owner", "repo"
        m.API = f"https://{env['GITEA_HOST']}/api/v1"
        yield m


# --------------------------------------------------------------------------
# Stub api() helper — records calls + dispatches by (method, path).
# --------------------------------------------------------------------------
def _make_stub_api(responses: dict):
    """Build a fake `api()` callable.

    `responses` maps (method, path) tuples to either:
    - (status_int, body) → returned as-is
    - Exception instance → raised
    Calls are recorded in `.calls` for assertion.
    """
    class StubApi:
        def __init__(self):
            self.calls: list[tuple] = []

        def __call__(self, method, path, *, body=None, query=None, expect_json=True):
            self.calls.append((method, path, body, query))
            key = (method, path)
            if key not in responses:
                raise AssertionError(
                    f"unexpected api call: {method} {path} (no stub registered)"
                )
            r = responses[key]
            if isinstance(r, Exception):
                raise r
            return r

    return StubApi()


# Sample SHAs used throughout. 40 hex chars per Gitea convention.
SHA_RED = "deadbeefcafe1234567890abcdef000011112222"
SHA_GREEN = "ababababcdcdcdcd0000111122223333deadc0de"


def _branches_response(sha: str) -> dict:
    """Shape Gitea returns from /repos/{o}/{r}/branches/{name}."""
    return {"name": "main", "commit": {"id": sha}}


def _combined_status(state: str, statuses: list[dict] | None = None) -> dict:
    """Shape Gitea returns from /commits/{sha}/status."""
    return {"state": state, "statuses": statuses or []}


# --------------------------------------------------------------------------
# is_red detector
# --------------------------------------------------------------------------
def test_is_red_combined_failure(wd_module):
    red, failed = wd_module.is_red(_combined_status("failure", [
        {"context": "ci/test", "state": "failure"},
    ]))
    assert red is True
    assert len(failed) == 1
    assert failed[0]["context"] == "ci/test"


def test_is_red_combined_error(wd_module):
    """`error` state (CI infra failed) is also red."""
    red, failed = wd_module.is_red(_combined_status("error", [
        {"context": "ci/test", "state": "error"},
    ]))
    assert red is True
    assert failed[0]["state"] == "error"


def test_is_red_combined_success(wd_module):
    red, failed = wd_module.is_red(_combined_status("success", [
        {"context": "ci/test", "state": "success"},
    ]))
    assert red is False
    assert failed == []


def test_is_red_combined_pending(wd_module):
    """Pending = CI still running. Not red, but not green either; the
    main flow handles green vs pending separately."""
    red, failed = wd_module.is_red(_combined_status("pending", [
        {"context": "ci/test", "state": "pending"},
    ]))
    assert red is False
    assert failed == []


def test_is_red_individual_failure_under_pending(wd_module):
    """A single failed context counts as red even if combined is `pending`
    (matrix half-failed, half-still-running). Catches the case where the
    Gitea aggregator hasn't rolled up yet."""
    red, failed = wd_module.is_red(_combined_status("pending", [
        {"context": "ci/lint", "state": "success"},
        {"context": "ci/test", "state": "failure"},
        {"context": "ci/build", "state": "pending"},
    ]))
    assert red is True
    assert [s["context"] for s in failed] == ["ci/test"]


def test_is_red_no_statuses(wd_module):
    """No statuses at all (commit pre-CI or never reported) = not red."""
    red, failed = wd_module.is_red(_combined_status("pending", []))
    assert red is False
    assert failed == []
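

# The six cases above pin down a small detector contract. A minimal sketch
# consistent with them — an illustration only, not the script's actual
# implementation — looks like this:
#
# ```python
# # Hypothetical sketch of the is_red contract the tests above exercise.
# RED_STATES = {"failure", "error"}
#
#
# def is_red(combined: dict) -> tuple[bool, list[dict]]:
#     """Return (red, failed_contexts) for a combined-status payload."""
#     statuses = combined.get("statuses") or []
#     # Per-context scan first: one failed context is red even while the
#     # combined state is still "pending" (matrix partially finished).
#     failed = [s for s in statuses if s.get("state") in RED_STATES]
#     red = bool(failed) or combined.get("state") in RED_STATES
#     return red, failed
# ```
#
# The per-context scan before the combined-state check is what makes the
# half-failed/half-pending matrix case come out red.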


# --------------------------------------------------------------------------
# Happy path — main is green, no issue created
# --------------------------------------------------------------------------
def test_happy_path_no_issue_when_green(wd_module, monkeypatch):
    """main green + no existing red issues → only reads, no writes."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_GREEN)),
        ("GET", f"/repos/owner/repo/commits/{SHA_GREEN}/status"): (
            200, _combined_status("success", [
                {"context": "ci/test", "state": "success"},
            ]),
        ),
        ("GET", "/repos/owner/repo/issues"): (200, []),  # no open red issues
    })
    monkeypatch.setattr(wd_module, "api", stub)

    rc = wd_module.run_once(dry_run=False)
    assert rc == 0
    methods = [c[0] for c in stub.calls]
    assert "POST" not in methods, f"unexpected POST: {stub.calls}"
    assert "PATCH" not in methods, f"unexpected PATCH: {stub.calls}"


# --------------------------------------------------------------------------
# Red detected → issue opened with correct title + body
# --------------------------------------------------------------------------
def test_red_detected_opens_issue(wd_module, monkeypatch):
    """When main is red and no issue is open, POST a new one with the
    correct title; body lists each failed context."""
    failed_ctx = [
        {
            "context": "ci/test",
            "state": "failure",
            "target_url": "https://ci.example/run/42",
            "description": "1 test failed",
        },
        {
            "context": "ci/lint",
            "state": "error",
            "target_url": "https://ci.example/run/43",
            "description": "runner crashed",
        },
    ]
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_RED)),
        ("GET", f"/repos/owner/repo/commits/{SHA_RED}/status"): (
            200, _combined_status("failure", failed_ctx),
        ),
        ("GET", "/repos/owner/repo/issues"): (200, []),  # no existing issue
        ("POST", "/repos/owner/repo/issues"): (201, {"number": 555}),
        ("GET", "/repos/owner/repo/labels"): (
            200, [{"id": 9, "name": "tier:high"}],
        ),
        ("POST", "/repos/owner/repo/issues/555/labels"): (200, []),
    })
    monkeypatch.setattr(wd_module, "api", stub)

    wd_module.run_once(dry_run=False)

    # Find the POST call that creates the issue and inspect its body.
    post_calls = [c for c in stub.calls if c[0] == "POST" and c[1] == "/repos/owner/repo/issues"]
    assert len(post_calls) == 1, post_calls
    posted_body = post_calls[0][2]
    expected_title = f"[main-red] owner/repo: {SHA_RED[:10]}"
    assert posted_body["title"] == expected_title
    body_text = posted_body["body"]
    assert "ci/test" in body_text
    assert "ci/lint" in body_text
    assert "1 test failed" in body_text
    assert "runner crashed" in body_text
    assert SHA_RED[:10] in body_text
    # Label apply attempted on the happy path:
    assert ("POST", "/repos/owner/repo/issues/555/labels") in [
        (c[0], c[1]) for c in stub.calls
    ]


# --------------------------------------------------------------------------
# Idempotent: existing issue is PATCHed, not duplicated
# --------------------------------------------------------------------------
def test_idempotent_existing_issue_patched_not_duplicated(wd_module, monkeypatch):
    """When an open `[main-red] {repo}: {SHA[:10]}` issue already exists
    for the current SHA, file_or_update_red PATCHes it. No POST."""
    existing_title = f"[main-red] owner/repo: {SHA_RED[:10]}"
    failed_ctx = [
        {"context": "ci/test", "state": "failure",
         "target_url": "https://x/y", "description": "boom"},
    ]
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_RED)),
        ("GET", f"/repos/owner/repo/commits/{SHA_RED}/status"): (
            200, _combined_status("failure", failed_ctx),
        ),
        ("GET", "/repos/owner/repo/issues"): (
            200, [{"number": 7, "title": existing_title}],
        ),
        ("PATCH", "/repos/owner/repo/issues/7"): (200, {"number": 7}),
    })
    monkeypatch.setattr(wd_module, "api", stub)

    wd_module.run_once(dry_run=False)

    methods_paths = [(c[0], c[1]) for c in stub.calls]
    assert ("PATCH", "/repos/owner/repo/issues/7") in methods_paths, stub.calls
    assert ("POST", "/repos/owner/repo/issues") not in methods_paths, (
        f"expected NO POST when issue exists (idempotent), got: {stub.calls}"
    )


# --------------------------------------------------------------------------
# Auto-close: main green at NEW_SHA → close issue for OLD_SHA
# --------------------------------------------------------------------------
def test_auto_close_when_main_returns_to_green(wd_module, monkeypatch):
    """main green at SHA_GREEN with an open `[main-red]` issue for
    SHA_RED → close the old issue with a 'returned to green' comment."""
    old_title = f"[main-red] owner/repo: {SHA_RED[:10]}"
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_GREEN)),
        ("GET", f"/repos/owner/repo/commits/{SHA_GREEN}/status"): (
            200, _combined_status("success", [
                {"context": "ci/test", "state": "success"},
            ]),
        ),
        ("GET", "/repos/owner/repo/issues"): (
            200, [{"number": 7, "title": old_title}],
        ),
        ("POST", "/repos/owner/repo/issues/7/comments"): (201, {"id": 100}),
        ("PATCH", "/repos/owner/repo/issues/7"): (200, {"number": 7, "state": "closed"}),
    })
    monkeypatch.setattr(wd_module, "api", stub)

    wd_module.run_once(dry_run=False)

    methods_paths = [(c[0], c[1]) for c in stub.calls]
    # Comment posted with reference to the new SHA
    assert ("POST", "/repos/owner/repo/issues/7/comments") in methods_paths
    comment_calls = [
        c for c in stub.calls
        if c[0] == "POST" and c[1] == "/repos/owner/repo/issues/7/comments"
    ]
    assert SHA_GREEN in comment_calls[0][2]["body"]
    # Issue closed via PATCH state=closed
    patch_calls = [
        c for c in stub.calls
        if c[0] == "PATCH" and c[1] == "/repos/owner/repo/issues/7"
    ]
    assert patch_calls[0][2] == {"state": "closed"}


def test_auto_close_skips_when_main_pending(wd_module, monkeypatch):
    """main pending (CI still running) at NEW_SHA → leave the old issue
    alone. Pending could still resolve to red, so closing prematurely
    would lose the breadcrumb of the prior red."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_GREEN)),
        ("GET", f"/repos/owner/repo/commits/{SHA_GREEN}/status"): (
            200, _combined_status("pending", [
                {"context": "ci/test", "state": "pending"},
            ]),
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)

    wd_module.run_once(dry_run=False)

    # No close-related calls
    methods_paths = [(c[0], c[1]) for c in stub.calls]
    assert ("PATCH", "/repos/owner/repo/issues/7") not in methods_paths
    assert ("GET", "/repos/owner/repo/issues") not in methods_paths


# --------------------------------------------------------------------------
# HTTP failure / api() raises — duplicate-write regression guard
# --------------------------------------------------------------------------
def test_find_open_issue_for_sha_raises_on_transient_error(wd_module, monkeypatch):
    """When the issue-search GET fails (transient 500),
    find_open_issue_for_sha must propagate ApiError, NOT return None.

    REGRESSION CLASS PROOF: a pre-fix implementation that returned
    `None` on api() failure would cause file_or_update_red to take the
    POST branch and create a duplicate issue. This test FAILS on that
    pre-fix code. Verified by temporarily replacing the script's
    `raise ApiError` with `return [], None` and rerunning — this case
    flips red.
    """
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/issues"): wd_module.ApiError(
            "GET /repos/owner/repo/issues → HTTP 500: gateway timeout"
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    with pytest.raises(wd_module.ApiError):
        wd_module.find_open_issue_for_sha(SHA_RED)


def test_list_open_red_issues_raises_on_transient_error(wd_module, monkeypatch):
    """Same contract for list_open_red_issues — the close path must not
    silently skip on transient error."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/issues"): wd_module.ApiError(
            "GET /repos/owner/repo/issues → HTTP 502: bad gateway"
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    with pytest.raises(wd_module.ApiError):
        wd_module.list_open_red_issues()


def test_run_once_propagates_api_error_loudly(wd_module, monkeypatch):
    """Transient outage on the branches read → ApiError propagates through
    run_once. The workflow run fails LOUDLY (correct behaviour); silent
    fallthrough would hide that the watchdog is broken."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): wd_module.ApiError(
            "GET /repos/owner/repo/branches/main → HTTP 503: service unavailable"
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    with pytest.raises(wd_module.ApiError):
        wd_module.run_once(dry_run=False)


# --------------------------------------------------------------------------
# api() helper: raises on non-2xx
# --------------------------------------------------------------------------
def test_api_raises_on_non_2xx(wd_module, monkeypatch):
    """api() must raise ApiError on HTTP 500. This pins the
    `feedback_api_helper_must_raise_not_return_dict` contract — the
    duplicate-issue regression class depends on it."""

    def fake_urlopen(req, timeout=30):
        raise urllib.error.HTTPError(
            req.full_url, 500, "Internal Server Error", {}, None,  # type: ignore
        )

    monkeypatch.setattr(wd_module.urllib.request, "urlopen", fake_urlopen)

    with pytest.raises(wd_module.ApiError) as excinfo:
        wd_module.api("GET", "/repos/owner/repo/issues")
    assert "HTTP 500" in str(excinfo.value)


def test_api_raises_on_json_decode_when_expected(wd_module, monkeypatch):
    """api(expect_json=True) raises ApiError if the body is not valid JSON.
    Closes the `{"_raw": ...}` fallthrough that callers misinterpret."""

    class FakeResp:
        status = 200

        def read(self):
            return b"not-json\n\n"

        def __enter__(self):
            return self

        def __exit__(self, *a):
            return False

    def fake_urlopen(req, timeout=30):
        return FakeResp()

    monkeypatch.setattr(wd_module.urllib.request, "urlopen", fake_urlopen)

    with pytest.raises(wd_module.ApiError):
        wd_module.api("GET", "/repos/owner/repo/issues")


def test_api_allows_raw_when_expect_json_false(wd_module, monkeypatch):
    """expect_json=False returns `{_raw: ...}` for known-quirky endpoints
    per `feedback_gitea_create_api_unparseable_response`. Opt-in."""

    class FakeResp:
        status = 201

        def read(self):
            return b"not-json-but-created\n"

        def __enter__(self):
            return self

        def __exit__(self, *a):
            return False

    def fake_urlopen(req, timeout=30):
        return FakeResp()

    monkeypatch.setattr(wd_module.urllib.request, "urlopen", fake_urlopen)
    status, body = wd_module.api(
        "POST", "/repos/owner/repo/issues", expect_json=False,
    )
    assert status == 201
    assert "_raw" in body
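

# The three api() tests above pin a raise-don't-return contract. A minimal
# sketch satisfying it — hypothetical, with an assumed (method, url) signature;
# the real script builds the URL from env and a path instead:
#
# ```python
# import json
# import urllib.error
# import urllib.request
#
#
# class ApiError(Exception):
#     """Raised for any non-2xx response or undecodable JSON body."""
#
#
# def api(method, url, *, body=None, token="", expect_json=True, timeout=30):
#     data = json.dumps(body).encode() if body is not None else None
#     req = urllib.request.Request(url, data=data, method=method, headers={
#         "Authorization": f"token {token}",
#         "Content-Type": "application/json",
#     })
#     try:
#         with urllib.request.urlopen(req, timeout=timeout) as resp:
#             status, raw = resp.status, resp.read()
#     except urllib.error.HTTPError as e:
#         # Non-2xx surfaces loudly; callers never see a half-shaped dict.
#         raise ApiError(f"{method} {url} → HTTP {e.code}: {e.reason}") from e
#     if not expect_json:
#         return status, {"_raw": raw.decode(errors="replace")}
#     try:
#         return status, json.loads(raw)
#     except json.JSONDecodeError as e:
#         raise ApiError(f"{method} {url} → undecodable JSON body") from e
# ```
#
# The key design choice the tests enforce: failure is an exception, never a
# sentinel return value a caller could mistake for "no issue found".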


# --------------------------------------------------------------------------
# --dry-run flag — no side effects
# --------------------------------------------------------------------------
def test_dry_run_skips_writes(wd_module, monkeypatch, capsys):
    """--dry-run: the detector runs and the would-be title/body are
    printed, but no POST/PATCH/comment calls are issued."""
    failed_ctx = [
        {"context": "ci/test", "state": "failure",
         "target_url": "https://x/y", "description": "boom"},
    ]
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (200, _branches_response(SHA_RED)),
        ("GET", f"/repos/owner/repo/commits/{SHA_RED}/status"): (
            200, _combined_status("failure", failed_ctx),
        ),
        ("GET", "/repos/owner/repo/issues"): (200, []),
    })
    monkeypatch.setattr(wd_module, "api", stub)

    wd_module.run_once(dry_run=True)

    methods = [c[0] for c in stub.calls]
    assert "POST" not in methods, f"dry-run made writes: {stub.calls}"
    assert "PATCH" not in methods, f"dry-run made writes: {stub.calls}"
    captured = capsys.readouterr()
    assert "[dry-run]" in captured.out
    assert "[main-red]" in captured.out  # title rendered


def test_dry_run_flag_parsed(wd_module):
    """--dry-run is wired into argparse."""
    ns = wd_module._parse_args(["--dry-run"])
    assert ns.dry_run is True
    ns = wd_module._parse_args([])
    assert ns.dry_run is False


# --------------------------------------------------------------------------
# Title format
# --------------------------------------------------------------------------
def test_title_format_uses_short_sha(wd_module):
    """Title is `[main-red] {repo}: {SHA[:10]}` — a stable idempotency key."""
    t = wd_module.title_for(SHA_RED)
    assert t == f"[main-red] owner/repo: {SHA_RED[:10]}"
    # exactly 10 chars of SHA
    assert SHA_RED[:10] in t
    assert SHA_RED[:11] not in t


def test_list_open_red_issues_filters_by_prefix(wd_module, monkeypatch):
    """list_open_red_issues only returns issues whose title starts with
    the expected prefix — unrelated open issues are not touched."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/issues"): (200, [
            {"number": 1, "title": f"[main-red] owner/repo: {SHA_RED[:10]}"},
            {"number": 2, "title": "Some unrelated bug"},
            {"number": 3, "title": "[ci-drift] owner/repo: divergence"},
            {"number": 4, "title": f"[main-red] owner/repo: {SHA_GREEN[:10]}"},
        ]),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    out = wd_module.list_open_red_issues()
    assert [i["number"] for i in out] == [1, 4]
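

# The title doubles as the idempotency key and the ownership filter. A
# hypothetical mirror of the two helpers these tests exercise (names and
# the module-level REPO constant are assumptions, not the script's code):
#
# ```python
# REPO = "owner/repo"  # assumed module-level constant
#
#
# def title_for(sha: str) -> str:
#     """Stable issue title: prefix + repo + first 10 chars of the SHA."""
#     return f"[main-red] {REPO}: {sha[:10]}"
#
#
# def filter_red_issues(issues: list[dict]) -> list[dict]:
#     """Keep only issues this watchdog owns, by title prefix."""
#     prefix = f"[main-red] {REPO}: "
#     return [i for i in issues if i.get("title", "").startswith(prefix)]
# ```
#
# Filtering by the full `[main-red] {repo}: ` prefix (not just `[main-red]`)
# is what keeps `[ci-drift]` and other bots' issues untouched.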


# --------------------------------------------------------------------------
# get_head_sha / get_combined_status data-shape guards
# --------------------------------------------------------------------------
def test_get_head_sha_raises_on_malformed_response(wd_module, monkeypatch):
    """If Gitea returns a body without `commit.id`, raise ApiError —
    do NOT proceed to file an issue with a bogus SHA."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (
            200, {"name": "main"},  # no commit object
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    with pytest.raises(wd_module.ApiError):
        wd_module.get_head_sha("main")


def test_get_head_sha_accepts_sha_field(wd_module, monkeypatch):
    """Older Gitea versions may return `commit.sha` instead of `commit.id`.
    Accept either — the watchdog must tolerate this documented shape
    variance."""
    stub = _make_stub_api({
        ("GET", "/repos/owner/repo/branches/main"): (
            200, {"name": "main", "commit": {"sha": SHA_RED}},
        ),
    })
    monkeypatch.setattr(wd_module, "api", stub)
    assert wd_module.get_head_sha("main") == SHA_RED


# --------------------------------------------------------------------------
# Loki event emitter (best-effort, must not raise)
# --------------------------------------------------------------------------
def test_emit_loki_event_prints_json_line(wd_module, capsys, monkeypatch):
    """emit_loki_event always prints a JSON line to stdout (for workflow
    log capture) regardless of whether `logger` is installed."""
    # Force the logger-not-found path to make the test deterministic.
    monkeypatch.setattr(wd_module.shutil, "which", lambda name: None)
    wd_module.emit_loki_event("main_red_detected", SHA_RED, ["ci/test"])
    captured = capsys.readouterr()
    assert "main-red-watchdog event:" in captured.out
    # Find the JSON payload after the prefix and verify it parses
    line = [ln for ln in captured.out.splitlines() if "main-red-watchdog event:" in ln][0]
    payload = json.loads(line.split("main-red-watchdog event:", 1)[1].strip())
    assert payload["event_type"] == "main_red_detected"
    assert payload["repo"] == "owner/repo"
    assert payload["sha"] == SHA_RED
    assert payload["failed_contexts"] == ["ci/test"]


def test_emit_loki_event_survives_logger_failure(wd_module, monkeypatch, capsys):
    """If `logger` is present but the subprocess call raises, the event
    emitter must NOT raise — emission is best-effort by contract."""
    monkeypatch.setattr(wd_module.shutil, "which", lambda name: "/usr/bin/logger")

    def boom(*a, **kw):
        raise OSError("logger pipe failed")

    monkeypatch.setattr(wd_module.subprocess, "run", boom)

    # Must not raise:
    wd_module.emit_loki_event("main_red_detected", SHA_RED, ["ci/test"])
    captured = capsys.readouterr()
    assert "logger call failed" in captured.err
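

# The two tests above pin a best-effort emission contract: always print a
# parseable JSON line, forward to `logger` only when present, and never let
# a failure there propagate. A hedged sketch of an emitter satisfying that
# contract (the real script's argument order and payload fields are assumed
# from the assertions above):
#
# ```python
# import json
# import shutil
# import subprocess
# import sys
#
# REPO = "owner/repo"  # assumed module-level constant
#
#
# def emit_loki_event(event_type, sha, failed_contexts):
#     payload = {
#         "event_type": event_type,
#         "repo": REPO,
#         "sha": sha,
#         "failed_contexts": failed_contexts,
#     }
#     line = json.dumps(payload, sort_keys=True)
#     # Always print, so the workflow log carries the event even if the
#     # host has no `logger` binary.
#     print(f"main-red-watchdog event: {line}")
#     if shutil.which("logger"):
#         try:
#             subprocess.run(["logger", "-t", "main-red-watchdog", line], check=True)
#         except (OSError, subprocess.SubprocessError) as e:
#             # Best-effort: note on stderr, never raise.
#             print(f"logger call failed: {e}", file=sys.stderr)
# ```
#
# Printing before the `logger` attempt is what makes the stdout line
# unconditional, matching the first test regardless of host tooling.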


# --------------------------------------------------------------------------
# Runtime env guard
# --------------------------------------------------------------------------
def test_require_runtime_env_exits_when_missing(wd_module, monkeypatch):
    """_require_runtime_env() exits with code 2 when any required env
    var is missing. Caught at main() entry, before any side-effecting
    API call."""
    monkeypatch.delenv("GITEA_TOKEN", raising=False)
    with pytest.raises(SystemExit) as excinfo:
        wd_module._require_runtime_env()
    assert excinfo.value.code == 2