molecule-core/.github/workflows
rabbitblood 3c18b76aa7 ops(cf): hourly sweep workflow for orphan Cloudflare DNS records (#239)
Closes Molecule-AI/molecule-controlplane#239.

The CF zone hit its 200-record quota from 2026-04-23 onward: every E2E
and canary run left a record on moleculesai.app, and no scheduled job
pruned them. Provisions started failing with code 81045 ('Record quota
exceeded').

The sweep-cf-orphans.sh script (PR #1978, with decision-function
unit tests added in #2079) already exists, but no workflow runs it.
This commit adds that workflow as a parallel janitor to
sweep-stale-e2e-orgs.yml:

- hourly schedule at :15 (offset from the e2e-orgs sweep at :00 so
  the two converge cleanly without racing the same CP admin endpoint)
- workflow_dispatch with dry_run input default true (ad-hoc verify
  without committing to deletes)
- workflow_dispatch with max_delete_pct input for major cleanups
  (the script's own MAX_DELETE_PCT defaults to 50% as a safety gate)
- concurrency group prevents schedule + manual-dispatch from racing
  the same zone
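
The MAX_DELETE_PCT safety gate mentioned above could work roughly as
sketched below. This is an illustration only, not the logic in
sweep-cf-orphans.sh: the function name within_delete_budget and its
argument order are hypothetical. Only the MAX_DELETE_PCT variable and
its 50% default come from this commit message.

```bash
# Hypothetical sketch of a percentage-based delete gate.
# Usage: within_delete_budget TOTAL_RECORDS DELETE_CANDIDATES [MAX_PCT]
# Returns 0 when the candidates are at most MAX_PCT percent of the zone.
within_delete_budget() {
  local total="$1" candidates="$2"
  # Fall back to the MAX_DELETE_PCT env var, then to the 50% default.
  local max_pct="${3:-${MAX_DELETE_PCT:-50}}"

  # An empty zone is within budget only if nothing is slated for deletion.
  if [ "$total" -eq 0 ]; then
    [ "$candidates" -eq 0 ]
    return
  fi

  # Integer form of candidates/total <= max_pct/100, avoiding floats.
  [ $(( candidates * 100 )) -le $(( total * max_pct )) ]
}
```

With a gate like this, a sweep that wants to delete 150 of 200 records
would refuse to run at the default and would need an explicit override,
for example through the max_delete_pct dispatch input.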

Why a separate workflow vs sweep-stale-e2e-orgs.yml:
- That workflow drives DELETE /cp/admin/tenants/:slug, which assumes
  CP has the org row. It doesn't catch records left behind when CP
  itself never knew about the tenant (canary scratch, manual ops
  experiments) or when the CP-side cascade's CF-delete branch failed.
- sweep-cf-orphans.sh enumerates the CF zone directly + matches
  against live CP slugs + AWS EC2 names. Catches what the CP-driven
  sweep can't.
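
The matching step described above can be pictured as a small decision
function. The sketch below is not the tested decision function from
#2079. It assumes, hypothetically, that each record is named
<label>.moleculesai.app and that the live CP slugs and EC2 names have
been written one per line to a file. The name is_orphan and its
arguments are made up for illustration.

```bash
# Hypothetical sketch: decide whether a DNS record name is an orphan.
# Usage: is_orphan RECORD_NAME LIVE_NAMES_FILE [ZONE]
# Returns 0 (orphan) when the record's label is not in the live list.
is_orphan() {
  local record="$1" live_file="$2" zone="${3:-moleculesai.app}"

  # Strip the ".<zone>" suffix to get the label.
  local label="${record%."$zone"}"

  # Nothing was stripped: the record is the apex or outside the zone.
  # Never treat those as orphans.
  [ "$label" != "$record" ] || return 1

  # Fixed-string, whole-line match so "acm" never matches "acme".
  ! grep -Fxq -- "$label" "$live_file"
}
```

Failing closed on anything that doesn't look like a tenant record keeps
a sketch like this from ever proposing the apex or unrelated records
for deletion.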

Required secrets (must be set on the repo): CF_API_TOKEN,
CF_ZONE_ID, CP_PROD_ADMIN_TOKEN, CP_STAGING_ADMIN_TOKEN,
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY. A pre-flight verify-secrets
step fails loudly if any are missing.
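
A pre-flight check of this kind might look like the sketch below. It is
an assumption about the shape of such a step, not the workflow's actual
verify-secrets step. The function name verify_secrets is hypothetical,
and it expects the secrets to be exported as environment variables
under the names listed above.

```bash
# Hypothetical sketch of a pre-flight secrets check.
# Usage: verify_secrets VAR_NAME...
# Reports every unset or empty variable, then returns non-zero if any
# were missing, so one run shows the full list.
verify_secrets() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as the first command of the job, for example:
verify_secrets CF_API_TOKEN CF_ZONE_ID CP_PROD_ADMIN_TOKEN
CP_STAGING_ADMIN_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || exit 1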

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 04:16:43 -07:00
auto-promote-staging.yml ci: canary-verify graceful-skip + draft auto-promote staging→main 2026-04-22 22:39:23 +00:00
block-internal-paths.yml ci(block-paths): fetch PR base SHA to fix shallow-clone diff failure 2026-04-24 12:01:53 +00:00
canary-staging.yml ci(canary): inject E2E_OPENAI_API_KEY so A2A turn doesn't 500 2026-04-24 22:37:13 -07:00
canary-verify.yml ci: canary-verify graceful-skip + draft auto-promote staging→main 2026-04-22 22:39:23 +00:00
check-merge-group-trigger.yml ci: add linter that fails when required workflow lacks merge_group trigger 2026-04-24 00:33:05 -07:00
ci.yml ci: add merge_group trigger to ci + codeql 2026-04-23 21:24:53 -07:00
codeql.yml ci: add merge_group trigger to ci + codeql 2026-04-23 21:24:53 -07:00
e2e-api.yml feat(ci): run E2E API smoke test on staging branch 2026-04-23 17:47:47 -07:00
e2e-staging-canvas.yml feat(ci): run E2E Staging Canvas on staging branch pushes 2026-04-23 17:47:51 -07:00
e2e-staging-saas.yml fix(e2e): increase hermes workspace wait from 20 to 30 min 2026-04-24 17:11:37 +00:00
e2e-staging-sanity.yml fix(e2e): CP DELETE /cp/admin/tenants body uses 'confirm', not 'confirm_token' 2026-04-21 04:50:28 -07:00
promote-latest.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
publish-canvas-image.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
publish-workspace-server-image.yml ci(publish-image): also tag :staging-latest so CP auto-picks up new builds 2026-04-24 00:29:55 -07:00
redeploy-tenants-on-main.yml ci(redeploy): fire post-main tenant fleet redeploy via CP admin endpoint 2026-04-24 14:34:28 -07:00
retarget-main-to-staging.yml ci(retarget): handle 422 'duplicate PR' by closing redundant main-PR (closes #1884) 2026-04-26 00:53:55 -07:00
runtime-pin-compat.yml fix(ci): set WORKSPACE_ID for the runtime-pin smoke import 2026-04-26 01:59:56 -07:00
sweep-cf-orphans.yml ops(cf): hourly sweep workflow for orphan Cloudflare DNS records (#239) 2026-04-26 04:16:43 -07:00
sweep-stale-e2e-orgs.yml ci: hourly sweep of stale e2e-* orgs on staging 2026-04-24 23:07:57 -07:00
test-ops-scripts.yml refactor(ops): apply simplify findings on #2027 PR 2026-04-26 00:28:15 -07:00