The sweep-cf-orphans workflow shipped in #2088 was noisier than
intended in two ways. This PR fixes both. The issue was filed under the
Optional finding I left on the original review and now matters because
the noise is observably hitting the merge queue.
1) `merge_group: types: [checks_requested]` was firing the entire
sweep job on every PR through the merge queue. The original intent
("future required-check support without a workflow edit") never
materialized, and meanwhile every recent merge-queue eval (#2091,
#2092, #2093, #2094, #2095, #2097) generated a red `Sweep CF
orphans (merge_group)` run.
Drop the trigger. Comment in the workflow explains the re-add path
if/when the workflow IS wired as a required check (re-add the
trigger AND gate the actual sweep step with
`if: github.event_name != 'merge_group'` so merge-queue evals are
no-op success).
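
A minimal sketch of that re-add path, for reference only. The job name, step name, and script path are hypothetical and not taken from the actual workflow; only the trigger and the `if:` expression come from the description above.

```yaml
on:
  # ...existing schedule and workflow_dispatch triggers stay unchanged...
  merge_group:
    types: [checks_requested]

jobs:
  sweep:                       # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Sweep CF orphans # hypothetical step name
        # Skipped on merge-queue evals; a job whose steps are skipped
        # still concludes as success, so the required check passes.
        if: github.event_name != 'merge_group'
        run: ./scripts/sweep-cf-orphans.sh   # hypothetical path
```
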
2) The `Verify required secrets present` step exits 2 when the 6
secrets aren't configured yet (the PR body's post-merge step,
still pending). That turns the hourly schedule into an hourly red
CI run for as long as the secrets stay unset.
Convert to a soft skip: emit a `::warning::` listing the missing
secrets and set a `skip=true` step output, then gate the sweep
step with `if: steps.verify.outputs.skip != 'true'`. Workflow
reports green and ops still sees the warning when they review
recent runs.
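
Roughly, the soft skip could look like the sketch below. The step id `verify`, the `skip` output, and the secret names come from this PR; the loop, the sweep step name, and the script path are assumptions for illustration. This sketch treats every missing secret the same way and does not show any separate hard-fail path.

```yaml
steps:
  - name: Verify required secrets present
    id: verify
    shell: bash
    env:
      CF_API_TOKEN: ${{ secrets.CF_API_TOKEN }}
      CF_ZONE_ID: ${{ secrets.CF_ZONE_ID }}
      CP_PROD_ADMIN_TOKEN: ${{ secrets.CP_PROD_ADMIN_TOKEN }}
      CP_STAGING_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_TOKEN }}
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    run: |
      missing=""
      for name in CF_API_TOKEN CF_ZONE_ID CP_PROD_ADMIN_TOKEN \
                  CP_STAGING_ADMIN_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
          missing="$missing $name"
        fi
      done
      if [ -n "$missing" ]; then
        # Annotation shows up on the run summary; the step itself still exits 0.
        echo "::warning::Skipping sweep; missing secrets:$missing"
        echo "skip=true" >> "$GITHUB_OUTPUT"
      fi

  - name: Sweep CF orphans               # hypothetical step name
    if: steps.verify.outputs.skip != 'true'
    run: ./scripts/sweep-cf-orphans.sh   # hypothetical path
```
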
Net effect:
- merge-queue evals stop generating spurious red runs
- the schedule reports green-with-warning until secrets land
- once secrets land, behavior is identical to today's (real sweep
runs, hard-fails if a secret is later removed)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop redundant 'aws --version' step. Script's own 'aws ec2
describe-instances' fails just as loud with a more actionable
error; the pre-check added ~1s with no signal value.
- timeout-minutes 10 → 3. Realistic worst case is ~2min (4 curls +
1 aws + N×CF-DELETE each individually capped at 10s by the
script's curl -m flag). 3 surfaces hangs within one cron tick
instead of burning the full interval.
- Document the schedule-vs-dispatch dry-run asymmetry inline so
the next reader doesn't need to trace input defaults.
- Add merge_group: types: [checks_requested] for queue parity with
runtime-pin-compat.yml — cheap insurance if this ever becomes a
required check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Molecule-AI/molecule-controlplane#239.
The CF zone hit the 200-record quota from 2026-04-23 onward: every E2E
and canary run left a record on moleculesai.app, and no scheduled job
pruned them. Provisions started failing with code 81045 ('Record quota
exceeded').
The sweep-cf-orphans.sh script (PR #1978, with decision-function
unit tests added in #2079) already exists but no workflow fires it.
Adding it here as a parallel janitor to sweep-stale-e2e-orgs.yml:
- hourly schedule at :15 (offset from the e2e-orgs sweep at :00 so
the two converge cleanly without racing the same CP admin endpoint)
- workflow_dispatch with dry_run input default true (ad-hoc verify
without committing to deletes)
- workflow_dispatch with max_delete_pct input for major cleanups
(the script's own MAX_DELETE_PCT defaults to 50% as a safety gate)
- concurrency group prevents schedule + manual-dispatch from racing
the same zone
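
The trigger and concurrency setup listed above might be expressed along these lines. The cron offset, input names, and `dry_run` default follow the list; the input descriptions, input types, and the concurrency group name are assumptions.

```yaml
on:
  schedule:
    - cron: '15 * * * *'   # hourly at :15, offset from the e2e-orgs sweep at :00
  workflow_dispatch:
    inputs:
      dry_run:
        description: 'Report what would be deleted without deleting'
        type: boolean
        default: true
      max_delete_pct:
        # Left without a default so the script's own MAX_DELETE_PCT applies.
        description: 'Override the script MAX_DELETE_PCT safety gate'
        type: string
        required: false

concurrency:
  group: sweep-cf-orphans      # hypothetical group name
  cancel-in-progress: false    # let a running sweep finish; queue the next one
```
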
Why a separate workflow vs sweep-stale-e2e-orgs.yml:
- That workflow drives DELETE /cp/admin/tenants/:slug, which assumes
CP has the org row. It doesn't catch records left when CP itself
never knew about the tenant (canary scratch, manual ops experiments)
or when the CP-side cascade's CF-delete branch failed.
- sweep-cf-orphans.sh enumerates the CF zone directly + matches
against live CP slugs + AWS EC2 names. Catches what the CP-driven
sweep can't.
Required secrets (will need to be set on the repo): CF_API_TOKEN,
CF_ZONE_ID, CP_PROD_ADMIN_TOKEN, CP_STAGING_ADMIN_TOKEN,
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY. Pre-flight verify-secrets
step fails loud if any are missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>