ci: bump continuous-synth-e2e cadence 3→6 fires/hour, all clean slots

Change cron from '10,30,50' (3 fires/hour) to '2,12,22,32,42,52'
(6 fires/hour). All new slots are 1-3 min away from any other
cron, avoiding both the cf-sweep collisions (:15, :45) and the
:30 heavy slot (canary-staging /30, sweep-aws-secrets,
sweep-stale-e2e-orgs every :15).

Why: empirically 2026-05-04 the canary fired only once per hour
on the 10,30,50 schedule (see #2726). Bumping fires-per-hour
gives more chances to land a survived fire under GH's load-
related drop ratio, and keeping all slots in clean lanes
minimizes the per-fire drop probability.

At empirically-observed ~67% drop ratio, 6 attempts/hour yields
~2 effective fires = ~30 min cadence; closer to the 20-min
target than the current shape and provides a real degradation
alarm if drops get worse.

Cost: ~$0.50/day → ~$1/day. Negligible.

Closes #2726.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-05-04 05:10:48 -07:00
parent c0997a5703
commit 032c011b37

View File

@ -32,20 +32,30 @@ name: Continuous synthetic E2E (staging)
on:
schedule:
# Every 20 minutes, on :10 :30 :50. Two constraints:
# Every 10 minutes, on :02 :12 :22 :32 :42 :52. Three constraints:
# 1. Stay off the top-of-hour. GitHub Actions scheduler drops
# :00 firings under high load (own docs:
# https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule).
# Empirical 2026-05-03: cron was '0,20,40 * * * *' but actual
# firings landed at :08, :03, :01, :03 with :20 + :40 silently
# dropped — only the :00-region run survived. Detection
# latency degraded from claimed 20 min to actual ~60 min.
# :10/:30/:50 sit far enough from :00 that GH-load skips
# stop dropping us.
# Prior history: cron was '0,20,40' (2026-05-02) — only :00
# ever survived. Bumped to '10,30,50' (2026-05-03) on the
# theory that further-from-:00 wins. Empirically 2026-05-04
# that ALSO dropped to ~60 min effective cadence (only ~1
# schedule fire per hour — see molecule-core#2726). Detection
# latency was claimed 20 min, actual 60 min.
# 2. Avoid colliding with the existing :15 sweep-cf-orphans
# and :45 sweep-cf-tunnels — both hit the CF API and we
# don't want to fight for rate-limit tokens.
- cron: '10,30,50 * * * *'
# 3. Avoid the :30 heavy slot (canary-staging /30, sweep-aws-
# secrets, sweep-stale-e2e-orgs every :15) — multiple
# overlapping cron registrations on the same minute is part
# of what GH drops under load.
# Solution: bump fires-per-hour 3 → 6 AND keep all slots in clean
# lanes (1-3 min away from any other cron). Even with empirically-
# observed ~67% GH drop ratio, 6 attempts/hour yields ~2 effective
# fires = ~30 min cadence; closer to the 20-min target than the
# current shape and provides a real degradation alarm if drops
# get worse.
- cron: '2,12,22,32,42,52 * * * *'
workflow_dispatch:
inputs:
runtime: