Two changes that close one of the leak classes from the
molecule-controlplane#420 vCPU audit:
1. sweep-stale-e2e-orgs.yml: cron */15 (was hourly), MAX_AGE_MINUTES
30 (was 120). E2E runs are 8-25 min wall clock; 30 min is safely
above the longest run while shrinking the worst-case leak window
from ~2h to ~45 min (15-min sweep cadence + 30-min threshold).
2. canary-staging.yml teardown: the per-slug DELETE used `>/dev/null
|| true`, which swallowed every failure. A 5xx or timeout from CP
looked identical to "successfully deleted" and the canary tenant
kept eating ~2 vCPU until the sweeper caught it. Now we capture
the response code and surface non-2xx as a workflow warning that
names the leaked slug.
The exit semantics stay unchanged — a single-canary cleanup miss
shouldn't fail-flag the canary itself when the actual smoke check
passed. The sweeper is the safety net for whatever slips past.
Caught during the molecule-controlplane#420 audit on 2026-05-03 —
3 e2e canary tenant orphans were running for 24-95 min, all under
the previous 120-min sweep threshold so they went unnoticed until
manual cleanup. Same `|| true` pattern exists in
e2e-staging-{canvas,external,saas,sanity}.yml; out of scope for
this PR (mechanical port; tracking separately) but the sweeper
tightening covers all of them by reducing the safety-net latency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>