molecule-core/scripts/ops/sweep_cf_decide.py
rabbitblood ba78a5c00d test(ops): unit tests for sweep-cf-orphans decide() (#2027)
Closes #2027.

The CF orphan sweep deletes DNS records, so a misclassification could nuke
a live workspace's tunnel. The decision function was guarded by the
MAX_DELETE_PCT percentage gate but had no automated test of its
category → action mapping.

Approach: extract the decide() function to scripts/ops/sweep_cf_decide.py
as a verbatim copy bracketed by `# CANONICAL DECIDE BEGIN/END` markers.
The shell script keeps its inline heredoc (so the operational path is
untouched), now bracketed by the same markers. A parity test
(TestParityWithBashScript) reads both files and asserts that the bracketed
blocks match line-for-line, so any drift fails CI loudly.
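
A minimal sketch of how such a marker-based parity check could work. The helper names `extract_canonical_block` and `blocks_match` are hypothetical and are not taken from test_sweep_cf_decide.py; the sketch also assumes each marker sits on its own line in both files.

```python
from __future__ import annotations

BEGIN = "# CANONICAL DECIDE BEGIN"
END = "# CANONICAL DECIDE END"


def extract_canonical_block(text: str) -> list[str]:
    """Return the lines strictly between the BEGIN and END markers.

    Hypothetical helper: raises ValueError if either marker is missing,
    so a deleted marker fails as loudly as drifted content.
    """
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    begin = stripped.index(BEGIN)
    end = stripped.index(END, begin + 1)
    return lines[begin + 1 : end]


def blocks_match(python_source: str, bash_source: str) -> bool:
    """Compare the two bracketed blocks line-for-line."""
    return extract_canonical_block(python_source) == extract_canonical_block(bash_source)
```

In a real test the two strings would come from reading sweep_cf_decide.py and sweep-cf-orphans.sh, for example with `pathlib.Path.read_text()`.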

Coverage (25 tests, 1 file, stdlib unittest only):
- Rule 1 platform-core: apex, _vercel, _domainkey, www/api/app/doc/send/status/staging-api
- Rule 3 ws-*: live (matches EC2 prefix) on prod + staging; orphan on prod + staging
- Rule 4 e2e-*: live + orphan on staging; orphan on prod
- Rule 2 generic tenant: live prod + staging; unknown subdomain kept-for-safety
- Rule 5 fallthrough: external domain + unrelated apex
- Rule priority: api.moleculesai.app stays platform-core (not tenant); _vercel stays verification
- Safety gate: under/at/over default 50% threshold; zero-total no-divide; custom threshold
- Empty live-sets: documents that decide() alone classifies as orphan, gate is the defense
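
As an illustration of the safety-gate cases listed above, here is a sketch using stdlib unittest. The gate body repeats the arithmetic from sweep_cf_decide.py so the block runs on its own; the class and test names are hypothetical and are not the ones in test_sweep_cf_decide.py.

```python
import unittest


def safety_gate(total: int, delete_count: int, max_delete_pct: int = 50) -> bool:
    # Same arithmetic as the module: integer percentage via floor division.
    if total <= 0:
        return True
    return delete_count * 100 // total <= max_delete_pct


class SafetyGateSketch(unittest.TestCase):
    def test_under_at_over_default_threshold(self):
        self.assertTrue(safety_gate(10, 4))   # 40% is under the 50% default
        self.assertTrue(safety_gate(10, 5))   # exactly 50% still passes
        self.assertFalse(safety_gate(10, 6))  # 60% is refused

    def test_zero_total_does_not_divide(self):
        self.assertTrue(safety_gate(0, 0))

    def test_custom_threshold(self):
        self.assertTrue(safety_gate(10, 7, max_delete_pct=75))
        self.assertFalse(safety_gate(10, 8, max_delete_pct=75))

    def test_floor_rounding_at_boundary(self):
        # 101 of 200 is 50.5%, which floors to 50 and therefore passes.
        self.assertTrue(safety_gate(200, 101))
```

The last case is worth noting: because the percentage is floored, a deletion share slightly above the threshold can still pass the gate.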

CI: new .github/workflows/test-ops-scripts.yml runs `python -m unittest
discover` against scripts/ops/ on every PR/push that touches the
directory. Lightweight — no requirements file, stdlib only.

Local: `cd scripts/ops && python -m unittest test_sweep_cf_decide -v` →
25 tests, all OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 00:22:30 -07:00


"""Decision logic extracted from sweep-cf-orphans.sh for unit testing (#2027).

The bash script embeds the same logic inline as a python heredoc; this
module is a verbatim copy used by test_sweep_cf_decide.py. The parity
test (TestParityWithBashScript) reads the bash script and asserts the
canonical block in this file is present byte-for-byte, so the two
cannot drift apart silently.

If you change the rules: edit BOTH this file AND the inline block in
``scripts/ops/sweep-cf-orphans.sh`` (the canonical block runs from
``# CANONICAL DECIDE BEGIN`` to ``# CANONICAL DECIDE END`` markers in
both files; the parity check compares those slices).

Inputs to ``decide(record, prod_slugs, staging_slugs, ec2_names)``:
  record         Cloudflare DNS record dict {name, id, type}
  prod_slugs     set of CP-prod org slugs (live tenants)
  staging_slugs  set of CP-staging org slugs
  ec2_names      set of live EC2 Name tags (e.g. ``ws-d3605ef2-f7d``)

Returns ``(action, reason, id, name, type)`` matching the bash heredoc.
"""
from __future__ import annotations

import re
from typing import Iterable


# CANONICAL DECIDE BEGIN
def decide(r, prod_slugs, staging_slugs, ec2_names):
    n = r["name"]
    rid = r["id"]
    typ = r["type"]
    all_slugs = prod_slugs | staging_slugs

    # Rule 1: platform core — leave alone
    if n == "moleculesai.app":
        return ("keep", "apex", rid, n, typ)
    if n.startswith("_") or n.endswith("._domainkey.moleculesai.app"):
        return ("keep", "verification/key", rid, n, typ)
    if n in {"api.moleculesai.app","app.moleculesai.app","doc.moleculesai.app",
             "send.moleculesai.app","status.moleculesai.app","www.moleculesai.app",
             "staging-api.moleculesai.app"}:
        return ("keep", "platform-core", rid, n, typ)

    # Rule 3: ws-<hex8>-<rest>.(staging.)moleculesai.app
    m = re.match(r"^(ws-[a-f0-9]{8}-[a-f0-9]+)(?:\.staging)?\.moleculesai\.app$", n)
    if m:
        prefix = m.group(1)
        # Live EC2 names are like "ws-d3605ef2-f7d" — same shape as DNS subdomain.
        for ename in ec2_names:
            if ename.startswith(prefix):
                return ("keep", "live-ec2", rid, n, typ)
        return ("delete", "orphan-ws", rid, n, typ)

    # Rule 4: e2e-* tenants (includes canary, canvas variants)
    m = re.match(r"^(e2e-[^.]+)(?:\.staging)?\.moleculesai\.app$", n)
    if m:
        slug = m.group(1)
        if slug in all_slugs:
            return ("keep", "live-e2e-tenant", rid, n, typ)
        return ("delete", "orphan-e2e-tenant", rid, n, typ)

    # Rule 2: any other tenant subdomain (slug.moleculesai.app or slug.staging.moleculesai.app)
    m = re.match(r"^([a-z0-9][a-z0-9-]*)(?:\.staging)?\.moleculesai\.app$", n)
    if m:
        slug = m.group(1)
        if slug in all_slugs:
            return ("keep", "live-tenant", rid, n, typ)
        # Only flag as orphan if name looks like a tenant (not a one-off like "hermes-final-*")
        # To avoid false-positive nukes on ad-hoc records, we KEEP anything that
        # does not match a known pattern. Orphan only for explicit tenant-shaped names.
        return ("keep", "unknown-subdomain-kept-for-safety", rid, n, typ)

    return ("keep", "not-a-pattern-we-sweep", rid, n, typ)
# CANONICAL DECIDE END


def safety_gate(total: int, delete_count: int, max_delete_pct: int = 50) -> bool:
    """Return True iff the sweep is safe to execute.

    Mirrors the shell-side gate in sweep-cf-orphans.sh: if the deletion
    percentage exceeds ``max_delete_pct`` the sweep refuses to run. The
    bash script computes the integer (floored) percentage as
    ``DELETE_COUNT*100/TOTAL``; the same arithmetic is kept here so that
    both sides agree at the threshold boundary and a future "raise to 75%"
    tweak means the same thing in both places.
    """
    if total <= 0:
        return True  # no records, so nothing to delete; also avoids dividing by zero
    pct = delete_count * 100 // total  # floor division, matching bash integer arithmetic
    return pct <= max_delete_pct