molecule-core/.github/workflows
Hongming Wang caf19e8980 feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975)
Closes the silent-block failure mode that left 25 commits — including
the Memory v2 redesign and the reno-stars data-loss fix — wedged on
staging for 12+ hours behind a single missing review. The auto-promote
workflow opened the PR + armed auto-merge, but main's branch protection
required a human review and nobody noticed until a user reported
"still seeing old memory tab".

## Detection logic — `scripts/check-stale-promote-pr.sh`

Reads open PRs `base=main head=staging` and alarms on:
  - `mergeStateStatus == BLOCKED`
  - `reviewDecision == REVIEW_REQUIRED`
  - createdAt older than `STALE_HOURS` (default 4h)

Other BLOCKED reasons (DIRTY, BEHIND, failed checks) are NOT alarmed —
those are the author's signal-to-fix. This script targets the specific
"no human reviewed yet" wedge.

Output:
  - `::warning` per stale PR (visible in workflow summary + Actions UI)
  - PR comment (idempotent via marker-string detection; one alarm
    per PR, never re-spammed)
  - Exit code = count of stale PRs (capped at 125)

Logic in a script (not inline workflow YAML) so it's:
  - **Unit-testable** — tests/test-check-stale-promote-pr.sh exercises
    every branch with stubbed fixture JSON + frozen clock. 23 tests
    covering: empty list, single stale, just-under-threshold, wrong
    reviewDecision, wrong mergeStateStatus, mixed list (only matching
    PRs alarm), custom threshold via --stale-hours, exit-code-counts-
    matching-PRs, --help, unknown arg → 64, missing repo → 2.
  - **Operator-runnable ad-hoc** — `scripts/check-stale-promote-pr.sh`
    works from any shell with `gh` + `jq`.
  - **SSOT** — one detector, the workflow YAML is just schedule +
    invocation surface. Future sibling workflows that need the same
    check call the same script.

## Workflow — `.github/workflows/auto-promote-stale-alarm.yml`

Triggers:
  - cron `27 * * * *` (hourly, off-the-hour to dodge cron herd)
  - workflow_dispatch with `stale_hours` + `post_comment` overrides

Concurrency: `auto-promote-stale-alarm` group, cancel-in-progress=false
(idempotent script; no benefit to cancelling a running scan).

Permissions: `contents: read` + `pull-requests: write` (post comments).

Sparse checkout — only fetches `scripts/check-stale-promote-pr.sh`.
No node_modules, no go modules, no slow setup steps. Workflow runs
in <30s on a clean repo.

## Why "alarm + comment" not "auto-approve"

Considered options in issue #2975:
  1. Slack/email alert — picked.
  2. Bot-account auto-approve via molecule-ops — circumvents the
     human-review gate that branch protection encodes.
  3. Trusted-promote bypass via CODEOWNERS — needs Org Admin config
     change; out of scope for a workflow PR.

The comment-on-PR pattern picks (1) without external dependencies
(no Slack token, no email config). Subscribers get notified via
GitHub's existing PR notification delivery; the warning shows up in
the Actions feed.

## Why this won't false-positive on legitimate slow reviews

Threshold is 4h. Most legitimate gates clear in <1h, so 4× headroom
is plenty for slow CI. The comment is idempotent (one alarm per PR,
never re-posted) — adding noise stops at 1 comment regardless of
how long the PR sits.

## Test plan

- [x] `bash scripts/test-check-stale-promote-pr.sh` — 23/23 pass
- [x] `python3 -c 'yaml.safe_load(...)'` clean
- [x] `bash -n` clean on both scripts
- [ ] Live verification: dispatch the workflow once main has caught up,
      confirm it correctly reports zero stale PRs
2026-05-05 17:55:27 -07:00
..
auto-promote-on-e2e.yml fix(auto-promote): treat E2E completed/cancelled as defer, not failure 2026-05-04 19:26:29 -07:00
auto-promote-staging.yml fix(auto-promote): skip empty-tree promotes to break perpetual cycle 2026-05-03 08:56:44 -07:00
auto-promote-stale-alarm.yml feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975) 2026-05-05 17:55:27 -07:00
auto-sync-main-to-staging.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
auto-tag-runtime.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
block-internal-paths.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
branch-protection-drift.yml fix(branch-protection-drift): hard-fail on schedule only, soft-skip + warn on PR 2026-05-04 21:20:30 -07:00
canary-staging.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
canary-verify.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
cascade-list-drift-gate.yml feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3) 2026-05-03 03:52:39 -07:00
check-merge-group-trigger.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
check-migration-collisions.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
ci.yml refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e) 2026-05-05 14:28:58 -07:00
codeql.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
continuous-synth-e2e.yml ci(canary): bump timeout-minutes 12 → 20 to absorb apt tail latency 2026-05-04 07:02:12 -07:00
e2e-api.yml test(e2e): add poll-mode chat upload E2E and wire into e2e-api.yml 2026-05-05 13:08:55 -07:00
e2e-staging-canvas.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
e2e-staging-external.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
e2e-staging-saas.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
e2e-staging-sanity.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
handlers-postgres-integration.yml ci(handlers-pg): apply all migrations with skip-on-error + sanity check (#320) 2026-05-05 03:48:43 -07:00
harness-replays.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
lint-curl-status-capture.yml fix(workflows): rewrite curl status-capture to prevent exit-code pollution 2026-05-04 18:29:38 -07:00
pr-guards.yml ci: add pr-guards caller that disables auto-merge on push 2026-04-27 06:39:31 -07:00
promote-latest.yml chore(deps)(deps): bump imjasonh/setup-crane from 0.4 to 0.5 2026-05-02 19:23:13 +00:00
publish-canvas-image.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
publish-runtime.yml fix(publish-runtime): re-add 5 templates wrongly removed from cascade (#2566) 2026-05-03 05:41:53 -07:00
publish-workspace-server-image.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
railway-pin-audit.yml Merge pull request #2523 from Molecule-AI/dependabot/github_actions/actions/github-script-9.0.0 2026-05-03 01:37:00 +00:00
redeploy-tenants-on-main.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
redeploy-tenants-on-staging.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
retarget-main-to-staging.yml fix(retarget): skip PRs whose head is staging (auto-promote PRs) 2026-05-03 07:34:24 -07:00
runtime-pin-compat.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
runtime-prbuild-compat.yml fix(ci): include event_name in runtime-prbuild-compat concurrency group 2026-05-05 04:01:20 -07:00
secret-pattern-drift.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
secret-scan.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-aws-secrets.yml feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets 2026-05-03 02:38:08 -07:00
sweep-cf-orphans.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-cf-tunnels.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-stale-e2e-orgs.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
test-ops-scripts.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00