forked from molecule-ai/molecule-core
The hourly Sweep stale Cloudflare Tunnels job got cancelled mid-cleanup
on 2026-05-02 (run 25248788312, killed at 5min after deleting 424/672
stale tunnels). A second manual dispatch finished the remaining 254
fine, so the immediate backlog cleared, but two underlying bugs would
re-trip on the next big cleanup.
Bug 1: serial delete loop. The execute branch was a `while read; do
curl -X DELETE; done` pipeline at ~0.7s/tunnel — fine for the
steady-state cleanup of a handful, but a 600+ backlog needs ~7-8min.
This commit fans out to $SWEEP_CONCURRENCY (default 8) workers via
`xargs -P 8 -L 1 -I {} bash -c '...' _ {} < "$DELETE_PLAN"`. With 8x
parallelism the same 600+ list drains in ~60s. Notes:
- We use stdin (`<`) not GNU's `xargs -a FILE` so the script stays
portable to BSD xargs (matters for local-runner testing on macOS).
- We pass ONLY the tunnel id on argv. xargs tokenizes on whitespace
by default; tab-separating id+name on argv risks mangling. The
name is kept in a side-channel id->name map ($NAME_MAP) and looked
up by the worker only on failure, for FAIL_LOG readability.
- Workers print exactly `OK` or `FAIL` on stdout; tally with
`grep -c '^OK$' / '^FAIL$'`.
- On non-zero FAILED, log the first 20 lines of $FAIL_LOG as
"Failure detail (first 20):" — same diagnostic surface as before
but consolidated so we don't spam logs on a flaky CF API.
Bug 2: workflow's 5-min cap was set as a hangs-detector but turned out
to be a real-job-too-slow detector. Raised to 30 min — generous
headroom for the ~60s steady-state run while still surfacing genuine
hangs (and in line with the sweep-cf-orphans companion job).
Bug 3 (drive-by): the existing trap was `trap 'rm -rf "$PAGES_DIR"'
EXIT`, which would have been silently overwritten by any later trap
registration. Replaced with a single `cleanup()` function that wipes
PAGES_DIR + all four new tempfiles (DELETE_PLAN, NAME_MAP, FAIL_LOG,
RESULT_LOG), called once via `trap cleanup EXIT`.
Verification:
- bash -n scripts/ops/sweep-cf-tunnels.sh: clean
- shellcheck -S warning scripts/ops/sweep-cf-tunnels.sh: clean
- python3 yaml.safe_load on the workflow: clean
- Synthetic 30-line delete plan with every 7th id sentinel'd to
return {"success":false}: TEST PASS, DELETED=26 FAILED=4, FAIL_LOG
side-channel name lookup verified.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| scripts | ||
| workflows | ||
| CODEOWNERS | ||
| dependabot.yml | ||