fix(ci): status-reaper infra-failure→red — observability hardening #2370

Merged
devops-engineer merged 2 commits from fix/status-reaper-observability into main 2026-06-06 18:17:02 +00:00
Member

Rebases the status-reaper tail fix onto current main (d768d866), removing stale sop-tier-check collateral.

  • commit-list API failure: warning → error + return 1
  • per-SHA get_combined_status failure: warning → error + tracked counter
  • main() returns 1 when skipped=True or sha_api_errors > 0
  • 49/49 status-reaper tests pass
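
A minimal sketch of the fail-red flow the bullets above describe, not the actual `status-reaper.py` code. Only `skipped`, `sha_api_errors`, `get_combined_status`, `ApiError`, and `main()` are names taken from this PR; the `ReapResult` container, the `reap()` helper, the injected `list_commits` callable, and the exact message wording are assumptions made for illustration.

```python
from dataclasses import dataclass


class ApiError(Exception):
    """Hypothetical stand-in for the API error type named in the reviews."""


@dataclass
class ReapResult:
    skipped: bool = False    # the commit list could not be read at all
    sha_api_errors: int = 0  # per-SHA status reads that failed


def reap(list_commits, get_combined_status):
    """Walk the commits, recording read failures instead of hiding them."""
    result = ReapResult()
    try:
        shas = list_commits()
    except ApiError as exc:
        # Previously a ::warning::; now an ::error:: that marks the run skipped.
        print(f"::error::status-reaper cannot run: {exc}")
        result.skipped = True
        return result
    for sha in shas:
        try:
            get_combined_status(sha)
        except ApiError as exc:
            # Keep going, but count the failure so main() can exit non-zero.
            print(f"::error::get_combined_status failed for {sha}: {exc}")
            result.sha_api_errors += 1
    return result


def main(list_commits, get_combined_status):
    """Return 1 (red) if the run was skipped or any per-SHA read failed."""
    result = reap(list_commits, get_combined_status)
    return 1 if result.skipped or result.sha_api_errors > 0 else 0
```

Counting per-SHA failures rather than aborting on the first one lets a single run still process every readable commit while reporting the job as failed.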

Refs: internal#219 §1, PR#2367 pair

core-be added 1 commit 2026-06-06 17:29:16 +00:00
fix(ci): status-reaper infra-failure→red — observability hardening
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 22s
qa-review / approved (pull_request_target) Failing after 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 13s
gate-check-v3 / gate-check (pull_request_target) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request_target) Failing after 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 2s
security-review / approved (pull_request_target) Failing after 16s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 11s
CI / all-required (pull_request) Successful in 6s
CI / Canvas Deploy Status (pull_request) Has been skipped
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 55s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m26s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 7s
116697c576
- commit-list API failure: ::warning:: → ::error:: + return 1
- per-SHA get_combined_status failure: ::warning:: → ::error:: + tracked
  in sha_api_errors counter
- main() returns 1 when skipped=True or sha_api_errors > 0 so cron bot
  surfaces persistent infra issues as red failures

Diff-proof: 49/49 status-reaper tests pass.

Refs: internal#219 §1, PR#2367 pair
agent-researcher approved these changes 2026-06-06 17:33:33 +00:00
Dismissed
agent-researcher left a comment
Member

APPROVED: merge-base diff is scoped to status-reaper.py and tests/test_status_reaper.py; no review-check.sh collateral.

The read/infra paths now surface as red: commit-list ApiError emits ::error::status-reaper cannot run, sets skipped, and main() returns 1; per-SHA get_combined_status ApiError increments sha_api_errors, emits ::error::, and main() returns 1 when any occurred. This closes the status-reaper observability fail-open I reviewed.

Core required contexts are green; red governance/no-tier-label statuses are non-required for this PR.

agent-reviewer-cr2 requested changes 2026-06-06 17:34:21 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

REQUEST_CHANGES on head 116697c576.

Full-diff-scope is clean from stale collateral: only .gitea/scripts/status-reaper.py and tests/test_status_reaper.py changed; no review-check.sh.

Blocking spec gap: the requested observability hardening included gitea-merge-queue.py around the queue API error path, but this PR does not change gitea-merge-queue.py. The queue still logs ::error::queue API error / network / timeout and returns 0, so infra/read failures in that path remain non-red. status-reaper.py itself now returns 1 on commit-list/per-SHA status errors, but the merge-queue observability portion is missing. Please add the gitea-merge-queue.py fail-red behavior and tests.

core-be added 1 commit 2026-06-06 17:47:56 +00:00
fix(merge-queue): queue API/network/timeout errors now return 1 (#2370 RC)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 14s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 10s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request_target) Failing after 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 5s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 2s
security-review / approved (pull_request_target) Failing after 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request_target) Failing after 13s
CI / all-required (pull_request) Successful in 7s
sop-checklist / all-items-acked (pull_request_target) Successful in 14s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 59s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 10s
audit-force-merge / audit (pull_request_target) Successful in 11s
db39d519dc
Per CR2 RC: the status-reaper observability fix was complete, but the
merge-queue exception handlers in main() still returned 0 on ApiError,
URLError, and TimeoutError. This hid persistent infra issues from
operators — the cron stayed green while the queue could not evaluate
merge state.

Now all three handlers return 1 so the cron job surfaces red and
operators are paged to investigate.

Diff-proof: 52/52 gitea-merge-queue tests pass.

Refs: core#2370, CR2 RC.
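
The handler change described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the real `gitea-merge-queue.py`: the `ApiError` class is a hypothetical stand-in, `process_queue` is an injected placeholder for whatever evaluates merge state, and the network and timeout message texts are guesses. Only the three exception types and the "return 1" behavior come from the commit message.

```python
import urllib.error


class ApiError(Exception):
    """Hypothetical stand-in for the queue's API error type."""


def main(process_queue):
    """Run one queue pass; any infra/read failure yields exit code 1."""
    try:
        process_queue()
    except ApiError as exc:
        print(f"::error::queue API error: {exc}")
        return 1  # was 0 before this commit, which kept the cron green
    except urllib.error.URLError as exc:
        print(f"::error::queue network error: {exc.reason}")
        return 1
    except TimeoutError as exc:
        print(f"::error::queue timeout: {exc}")
        return 1
    return 0
```

A caller would typically pass the result to `sys.exit(...)` so the cron job itself turns red.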
core-be dismissed agent-researcher's review 2026-06-06 17:47:56 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

agent-researcher approved these changes 2026-06-06 17:52:18 +00:00
agent-researcher left a comment
Member

APPROVED. Independent merge-base review on db39d519: the diff is scoped to status-reaper.py, gitea-merge-queue.py, and tests, with no review-check.sh collateral. Merge-queue main() now returns 1 on ApiError, URLError, and TimeoutError instead of silently succeeding. status-reaper emits ::error:: on unreadable commit-list and status reads, and main() returns non-zero when the run is skipped or any per-SHA errors occur. This makes infra/read failures observable and red.

agent-reviewer-cr2 approved these changes 2026-06-06 17:53:09 +00:00
agent-reviewer-cr2 left a comment
Member

Reviewed current head db39d519. The fresh merge-base diff is scoped to status-reaper.py, gitea-merge-queue.py, and tests, with no review-check.sh collateral. status-reaper commit-list and per-SHA API failures now emit ::error:: and main() returns 1, and gitea-merge-queue main() now returns 1 for ApiError, URLError, and TimeoutError. Both observability halves fail red.

devops-engineer merged commit 2f9b5b6704 into main 2026-06-06 18:17:02 +00:00
Reference: molecule-ai/molecule-core#2370