feat(ci): main-red watchdog (Option C of main-never-red directive) #423

Merged
claude-ceo-assistant merged 1 commits from feat/main-never-red-watchdog-internal-420 into main 2026-05-11 07:57:47 +00:00

1 Commits

Author SHA1 Message Date
2588b4ecbc feat(ci): main-red watchdog (Option C of main-never-red directive) — closes #420
All checks were successful
audit-force-merge / audit (pull_request) Successful in 18s
Adds a sentinel that detects post-merge CI red on `main` and files an
idempotent `[main-red] {repo}: {SHA[:10]}` issue. Auto-closes the issue
when main returns to green. Emits a Loki-shaped JSON event for the
operator-host observability pipeline.

Pattern source: CP `0adf2098` (ci-required-drift). Simpler scope here —
one source surface (combined commit status of main HEAD) versus three
in CP. Same `ApiError`-raises-on-non-2xx contract per
`feedback_api_helper_must_raise_not_return_dict` so the duplicate-issue
regression class stays closed.

Does NOT auto-revert. Option B is explicitly rejected per
`feedback_no_such_thing_as_flakes` + `feedback_fix_root_not_symptom`.
The watchdog files an alarm; humans fix forward.

Files:
  - .gitea/workflows/main-red-watchdog.yml — hourly `5 * * * *` cron +
    workflow_dispatch (no inputs, per
    `feedback_gitea_workflow_dispatch_inputs_unsupported`).
  - .gitea/scripts/main-red-watchdog.py — sidecar with `--dry-run`.
  - tests/test_main_red_watchdog.py — 26 pytest cases.

Tests (26 / 26 passing):
  - is_red detector across failure/error/pending/success state combos
  - happy path: green main → no writes
  - red detected: POST issue with correct title + body listing each
    failed context + label apply
  - idempotent: existing issue PATCHed, NOT duplicated
  - auto-close: green at new SHA → close prior `[main-red]` w/ comment
  - auto-close skipped when main pending (don't lose the breadcrumb)
  - HTTP-failure: `api()` raises ApiError; `list_open_red_issues` and
    `find_open_issue_for_sha` and `run_once` ALL propagate (regression
    guards for `feedback_api_helper_must_raise_not_return_dict`)
  - JSON-decode failure raises when expect_json=True; opt-in raw OK
  - --dry-run skips all writes
  - title format `[main-red] {repo}: {SHA[:10]}`
  - Gitea branch response shape tolerance (`commit.id` OR `commit.sha`)
  - Loki emitter survives `logger` not installed / subprocess failure
  - runtime env guard exits when required vars missing

Hostile self-review proven: 2 transient-error tests FAIL on a pre-fix
implementation (verified by injecting `try: ... except ApiError:
return []` into `list_open_red_issues` and running pytest — both
transient-error guards flipped red with `DID NOT RAISE`).

Live dry-run against molecule-ai/molecule-core main confirms the script
parses the real Gitea combined-status response correctly (current main
is in fact red at cb716f96).

Replication to other repos (operator-config, internal,
molecule-controlplane, hermes-agent, etc.) is out of scope for this
PR — molecule-core pilot only, per task brief.

Tracking: #420.
2026-05-11 00:36:20 -07:00