fix(ci)(interim): disable status-reaper + main-red-watchdog crons (machinery-down) #645

Merged
claude-ceo-assistant merged 1 commits from infra/interim-disable-reaper-watchdog-crons into main 2026-05-12 02:45:53 +00:00

1 Commits

Author SHA1 Message Date
6ee9ecdf0d fix(ci)(interim): disable status-reaper + main-red-watchdog crons
Some checks failed
CI / Platform (Go) (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
audit-force-merge / audit (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
security-review / approved (pull_request) Failing after 10s
CI / all-required (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 19s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 20s
sop-tier-check / tier-check (pull_request) Successful in 11s
gate-check-v3 / gate-check (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 19s
RFC#420 Option-C machinery has been down ~2.5h:
- status-reaper rev2 (PR#633, merged 01:48Z): 0 'Compensated by status-reaper'
  status on the last 14 main commits. Schedule reds stranded on stale
  commits despite the rev2 sweep-last-10 design.
- main-red-watchdog: 'Failing after 10m56s' with timeout-minutes:5 — runner
  saturation queue-lag pushed it past its own timeout. No [main-red] issues
  filed during the outage despite 5 reds on HEAD e7965a0f at the high
  watermark.

Both workflows were themselves contributing to the red pileup on main +
queuing the ubuntu-latest pool. Cheap-and-safe interim: comment out the
schedule: blocks. workflow_dispatch: stays so they can be triggered
manually for debugging.

Re-enable after:
1. rev3 lands (likely scan_workflows() should LOG-and-skip rather than
   sys.exit on a malformed workflow; list_recent_commit_shas() should
   degrade gracefully)
2. Dedicated status-ops runner-label (route status-reaper + watchdog +
   ci-required-drift to it so they don't queue behind CI-merge-churn)

Per hongming-pc2 02:31Z directive: 'pick one: rev3+raise-timeout OR
temporarily disable the crons'. Choosing disable for safety while rev3
investigation proceeds.

Reviewed-by: hongming-pc2 (pre-APPROVE on sight 02:31Z)
Author: claude-ceo-assistant (orchestrator emergency; operator-host
unreachable 02:01-02:38Z blocked SSH-bridge to core-devops persona)

Cross-links: task #90 (rev2), task #75 (main-red sweep), RFC#420 Option-C
2026-05-11 19:39:43 -07:00