Closes the silent-block failure mode that left 25 commits — including
the Memory v2 redesign and the reno-stars data-loss fix — wedged on
staging for 12+ hours behind a single missing review. The auto-promote
workflow opened the PR + armed auto-merge, but main's branch protection
required a human review and nobody noticed until a user reported
"still seeing old memory tab".
## Detection logic — `scripts/check-stale-promote-pr.sh`
Reads open PRs `base=main head=staging` and alarms on:
- `mergeStateStatus == BLOCKED`
- `reviewDecision == REVIEW_REQUIRED`
- createdAt older than `STALE_HOURS` (default 4h)
Other BLOCKED reasons (DIRTY, BEHIND, failed checks) are NOT alarmed —
those are the author's signal-to-fix. This script targets the specific
"no human reviewed yet" wedge.
Output:
- `::warning` per stale PR (visible in workflow summary + Actions UI)
- PR comment (idempotent via marker-string detection; one alarm
per PR, never re-spammed)
- Exit code = count of stale PRs (capped at 125)
Logic in a script (not inline workflow YAML) so it's:
- **Unit-testable** — tests/test-check-stale-promote-pr.sh exercises
every branch with stubbed fixture JSON + frozen clock. 23 tests
covering: empty list, single stale, just-under-threshold, wrong
reviewDecision, wrong mergeStateStatus, mixed list (only matching
PRs alarm), custom threshold via --stale-hours, exit-code-counts-
matching-PRs, --help, unknown arg → 64, missing repo → 2.
- **Operator-runnable ad-hoc** — `scripts/check-stale-promote-pr.sh`
works from any shell with `gh` + `jq`.
- **SSOT** — one detector, the workflow YAML is just schedule +
invocation surface. Future sibling workflows that need the same
check call the same script.
## Workflow — `.github/workflows/auto-promote-stale-alarm.yml`
Triggers:
- cron `27 * * * *` (hourly, off-the-hour to dodge cron herd)
- workflow_dispatch with `stale_hours` + `post_comment` overrides
Concurrency: `auto-promote-stale-alarm` group, cancel-in-progress=false
(idempotent script; no benefit to cancelling a running scan).
Permissions: `contents: read` + `pull-requests: write` (post comments).
Sparse checkout — only fetches `scripts/check-stale-promote-pr.sh`.
No node_modules, no go modules, no slow setup steps. Workflow runs
in <30s on a clean repo.
## Why "alarm + comment" not "auto-approve"
Considered options in issue #2975:
1. Slack/email alert — picked.
2. Bot-account auto-approve via molecule-ops — circumvents the
human-review gate that branch protection encodes.
3. Trusted-promote bypass via CODEOWNERS — needs Org Admin config
change; out of scope for a workflow PR.
The comment-on-PR pattern picks (1) without external dependencies
(no Slack token, no email config). Subscribers get notified via
GitHub's existing PR notification delivery; the warning shows up in
the Actions feed.
## Why this won't false-positive on legitimate slow reviews
Threshold is 4h. Most legitimate gates clear in <1h, so 4× headroom
is plenty for slow CI. The comment is idempotent (one alarm per PR,
never re-posted) — adding noise stops at 1 comment regardless of
how long the PR sits.
## Test plan
- [x] `bash scripts/test-check-stale-promote-pr.sh` — 23/23 pass
- [x] `python3 -c 'yaml.safe_load(...)'` clean
- [x] `bash -n` clean on both scripts
- [ ] Live verification: dispatch the workflow once main has caught up,
confirm it correctly reports zero stale PRs