molecule-core


Hongming Wang caf19e8980 feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975)

Closes the silent-block failure mode that left 25 commits, including the Memory v2 redesign and the reno-stars data-loss fix, wedged on staging for 12+ hours behind a single missing review. The auto-promote workflow opened the PR and armed auto-merge, but main's branch protection required a human review, and nobody noticed until a user reported "still seeing old memory tab".

## Detection logic: `scripts/check-stale-promote-pr.sh`

Reads open PRs with `base=main head=staging` and alarms only when all of the following hold:

- `mergeStateStatus == BLOCKED`
- `reviewDecision == REVIEW_REQUIRED`
- `createdAt` is older than `STALE_HOURS` (default 4h)

Other blocking reasons (DIRTY, BEHIND, failed checks) are NOT alarmed; those are the author's signal to fix. This script targets the specific "no human has reviewed yet" wedge.

Output:

- `::warning` per stale PR (visible in the workflow summary and the Actions UI)
- PR comment (idempotent via marker-string detection: one alarm per PR, never re-posted)
- Exit code = count of stale PRs (capped at 125)

The logic lives in a script (not inline workflow YAML) so it is:

- Unit-testable: tests/test-check-stale-promote-pr.sh exercises every branch with stubbed fixture JSON and a frozen clock. 23 tests covering: empty list, single stale, just-under-threshold, wrong reviewDecision, wrong mergeStateStatus, mixed list (only matching PRs alarm), custom threshold via --stale-hours, exit-code-counts-matching-PRs, --help, unknown arg → 64, missing repo → 2.
- Operator-runnable ad hoc: `scripts/check-stale-promote-pr.sh` works from any shell with `gh` + `jq`.
- SSOT: one detector; the workflow YAML is just the schedule and invocation surface. Future sibling workflows that need the same check call the same script.
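The detection and output rules above can be sketched in POSIX `sh` as follows. This is a minimal illustration, not the contents of `scripts/check-stale-promote-pr.sh`: it assumes the PR fields have already been extracted (the real script uses `gh` + `jq`) and that `createdAt` has already been converted to epoch seconds. The function names, the fixture rows, and the marker string `<!-- stale-promote-alarm -->` are hypothetical, and treating "older than" as strictly greater than the threshold is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the stale auto-promote PR filter; no gh/jq calls here.

# is_stale_promote_pr MERGE_STATE REVIEW_DECISION CREATED_EPOCH NOW_EPOCH STALE_HOURS
# Returns 0 (alarm) only when all three conditions hold.
is_stale_promote_pr() {
  # Other blocking reasons (DIRTY, BEHIND, failed checks) are deliberately ignored.
  [ "$1" = "BLOCKED" ] || return 1
  [ "$2" = "REVIEW_REQUIRED" ] || return 1
  # Assumption: "older than" means age strictly greater than STALE_HOURS.
  [ $(( $4 - $3 )) -gt $(( $5 * 3600 )) ]
}

# cap_exit_code COUNT: prints COUNT, capped at 125.
cap_exit_code() {
  if [ "$1" -gt 125 ]; then echo 125; else echo "$1"; fi
}

# has_alarm_marker MARKER: reads existing comment bodies on stdin and returns 0
# if MARKER is already present, so the caller skips posting a second comment.
has_alarm_marker() {
  grep -qF -- "$1"
}

# Example with a frozen clock and hypothetical fixture rows:
# "<pr-number> <mergeStateStatus> <reviewDecision> <createdAt epoch>"
NOW=1000000
count=0
while read -r num state decision created; do
  if is_stale_promote_pr "$state" "$decision" "$created" "$NOW" 4; then
    echo "::warning::PR #$num stuck on REVIEW_REQUIRED"
    count=$((count + 1))
  fi
done <<EOF
101 BLOCKED REVIEW_REQUIRED 900000
102 BLOCKED APPROVED 900000
103 DIRTY REVIEW_REQUIRED 900000
104 BLOCKED REVIEW_REQUIRED 990000
EOF
echo "stale=$count exit=$(cap_exit_code "$count")"
```

Only PR 101 alarms here: 102 has the wrong review decision, 103 the wrong merge state, and 104 is under the 4h threshold. Capping at 125 keeps the count clear of the exit statuses shells reserve (126 and 127 for command-invocation failures, 128+N for signals) and avoids the modulo-256 wraparound that would turn 256 stale PRs into a success exit.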
## Workflow: `.github/workflows/auto-promote-stale-alarm.yml`

Triggers:

- cron `27 * * * *` (hourly, off the hour to dodge the cron herd)
- `workflow_dispatch` with `stale_hours` + `post_comment` overrides

Concurrency: `auto-promote-stale-alarm` group, `cancel-in-progress: false` (the script is idempotent, so there is no benefit to cancelling a running scan).

Permissions: `contents: read` + `pull-requests: write` (to post comments).

Sparse checkout: only fetches `scripts/check-stale-promote-pr.sh`. No node_modules, no Go modules, no slow setup steps. The workflow runs in <30s on a clean repo.

## Why "alarm + comment" not "auto-approve"

Options considered in issue #2975:

1. Slack/email alert: picked.
2. Bot-account auto-approve via molecule-ops: circumvents the human-review gate that branch protection encodes.
3. Trusted-promote bypass via CODEOWNERS: needs an Org Admin config change; out of scope for a workflow PR.

The comment-on-PR pattern delivers (1) without external dependencies (no Slack token, no email config). Subscribers are notified through GitHub's existing PR notification delivery, and the warning shows up in the Actions feed.

## Why this won't false-positive on legitimate slow reviews

The threshold is 4h. Most legitimate gates clear in <1h, so 4× headroom is plenty for slow CI. The comment is idempotent (one alarm per PR, never re-posted), so the added noise stops at 1 comment regardless of how long the PR sits.

## Test plan

- [x] `bash scripts/test-check-stale-promote-pr.sh`: 23/23 pass
- [x] `python3 -c 'yaml.safe_load(...)'` clean
- [x] `bash -n` clean on both scripts
- [ ] Live verification: dispatch the workflow once main has caught up and confirm it correctly reports zero stale PRs

2026-05-05 17:55:27 -07:00
auto-promote-on-e2e.yml	fix(auto-promote): treat E2E completed/cancelled as defer, not failure	2026-05-04 19:26:29 -07:00
auto-promote-staging.yml	fix(auto-promote): skip empty-tree promotes to break perpetual cycle	2026-05-03 08:56:44 -07:00
auto-promote-stale-alarm.yml	feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975)	2026-05-05 17:55:27 -07:00
auto-sync-main-to-staging.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
auto-tag-runtime.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
block-internal-paths.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
branch-protection-drift.yml	fix(branch-protection-drift): hard-fail on schedule only, soft-skip + warn on PR	2026-05-04 21:20:30 -07:00
canary-staging.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
canary-verify.yml	Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6	2026-05-03 01:36:57 +00:00
cascade-list-drift-gate.yml	feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)	2026-05-03 03:52:39 -07:00
check-merge-group-trigger.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
check-migration-collisions.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
ci.yml	refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e)	2026-05-05 14:28:58 -07:00
codeql.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
continuous-synth-e2e.yml	ci(canary): bump timeout-minutes 12 → 20 to absorb apt tail latency	2026-05-04 07:02:12 -07:00
e2e-api.yml	test(e2e): add poll-mode chat upload E2E and wire into e2e-api.yml	2026-05-05 13:08:55 -07:00
e2e-staging-canvas.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
e2e-staging-external.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
e2e-staging-saas.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
e2e-staging-sanity.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
handlers-postgres-integration.yml	ci(handlers-pg): apply all migrations with skip-on-error + sanity check (#320)	2026-05-05 03:48:43 -07:00
harness-replays.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
lint-curl-status-capture.yml	fix(workflows): rewrite curl status-capture to prevent exit-code pollution	2026-05-04 18:29:38 -07:00
pr-guards.yml	ci: add pr-guards caller that disables auto-merge on push	2026-04-27 06:39:31 -07:00
promote-latest.yml	chore(deps)(deps): bump imjasonh/setup-crane from 0.4 to 0.5	2026-05-02 19:23:13 +00:00
publish-canvas-image.yml	Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6	2026-05-03 01:36:57 +00:00
publish-runtime.yml	fix(publish-runtime): re-add 5 templates wrongly removed from cascade (#2566)	2026-05-03 05:41:53 -07:00
publish-workspace-server-image.yml	Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6	2026-05-03 01:36:57 +00:00
railway-pin-audit.yml	Merge pull request #2523 from Molecule-AI/dependabot/github_actions/actions/github-script-9.0.0	2026-05-03 01:37:00 +00:00
redeploy-tenants-on-main.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
redeploy-tenants-on-staging.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
retarget-main-to-staging.yml	fix(retarget): skip PRs whose head is staging (auto-promote PRs)	2026-05-03 07:34:24 -07:00
runtime-pin-compat.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
runtime-prbuild-compat.yml	fix(ci): include event_name in runtime-prbuild-compat concurrency group	2026-05-05 04:01:20 -07:00
secret-pattern-drift.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
secret-scan.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
sweep-aws-secrets.yml	feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets	2026-05-03 02:38:08 -07:00
sweep-cf-orphans.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
sweep-cf-tunnels.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00
sweep-stale-e2e-orgs.yml	fix(workflows): preserve curl stderr in 8 status-capture sites	2026-05-04 18:54:50 -07:00
test-ops-scripts.yml	chore(deps)(deps): bump actions/checkout from 4 to 6	2026-05-02 19:23:01 +00:00