feat(tests): add run-all-replays.sh harness runner

Boots the harness, runs every script under replays/, tracks pass/fail,
and tears down on exit. Closes the README's TODO for the harness runner
that the per-replay-registration comment referenced.

Usage:
  ./run-all-replays.sh                # boot, run, teardown
  KEEP_UP=1 ./run-all-replays.sh      # leave harness running on exit
  REBUILD=1 ./run-all-replays.sh      # rebuild images before booting

Trap-on-EXIT teardown ensures partial-failure runs don't leak Docker
resources. Returns non-zero if any replay failed; CI can adopt this as
a single command without per-replay registration. Phase 2 picks this up
to wire harness-based E2E as a required check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-30 11:57:27 -07:00
parent 3689fd9d82
commit 0af4012f79
2 changed files with 103 additions and 4 deletions

View File

@ -44,6 +44,15 @@ cd tests/harness
./down.sh # tear down + remove volumes ./down.sh # tear down + remove volumes
``` ```
To run every replay in one shot (boot, seed, run-all, teardown):
```bash
cd tests/harness
./run-all-replays.sh # full lifecycle; non-zero exit if any replay fails
KEEP_UP=1 ./run-all-replays.sh # leave harness up for debugging
REBUILD=1 ./run-all-replays.sh # rebuild images before booting
```
First-time setup needs an `/etc/hosts` entry so `harness-tenant.localhost` First-time setup needs an `/etc/hosts` entry so `harness-tenant.localhost`
resolves to the local cf-proxy: resolves to the local cf-proxy:
@ -71,8 +80,8 @@ To add a new replay:
2. The script's purpose: reproduce the production failure mode against 2. The script's purpose: reproduce the production failure mode against
the harness, then assert the fix is present. PASS criterion is the the harness, then assert the fix is present. PASS criterion is the
post-fix behavior. post-fix behavior.
3. Wire it into the `tests/harness/run-all-replays.sh` runner (TODO, 3. The `run-all-replays.sh` runner picks up every `replays/*.sh` script
Phase 2). automatically — no per-replay registration needed.
## Extending the cp-stub ## Extending the cp-stub
@ -102,9 +111,9 @@ its mandate of "exercise the tenant binary in production-shape topology."
## Roadmap ## Roadmap
- **Phase 1 (this PR):** harness + cp-stub + cf-proxy + 2 replays. - **Phase 1 (shipped):** harness + cp-stub + cf-proxy + 2 replays + `run-all-replays.sh` runner.
- **Phase 2:** convert `tests/e2e/test_api.sh` to run against the - **Phase 2:** convert `tests/e2e/test_api.sh` to run against the
harness instead of localhost. Make harness-based E2E a required CI harness instead of localhost. Make harness-based E2E a required CI
check. check (a workflow that invokes `run-all-replays.sh` on every PR).
- **Phase 3:** config-coherence lint that diffs harness env list - **Phase 3:** config-coherence lint that diffs harness env list
against production CP's env list, fails CI on drift. against production CP's env list, fails CI on drift.

View File

@ -0,0 +1,90 @@
#!/usr/bin/env bash
# Run every replay under tests/harness/replays/ against a fresh harness.
#
# Boots the harness (up.sh + seed.sh), runs each `replays/*.sh` in
# alphabetical order, tracks pass/fail, and tears down on exit. Returns
# non-zero if any replay failed.
#
# Usage:
# ./run-all-replays.sh # boot, run, teardown
# KEEP_UP=1 ./run-all-replays.sh # leave harness running on exit (debug)
# REBUILD=1 ./run-all-replays.sh # rebuild images before booting
#
# CI usage: invoke without flags. The trap-on-EXIT teardown ensures we
# don't leak Docker resources when a replay fails partway through.
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$HERE"
REPLAYS_DIR="$HERE/replays"
if [ ! -d "$REPLAYS_DIR" ]; then
echo "[run-all] no replays/ directory at $REPLAYS_DIR — nothing to run"
exit 1
fi
shopt -s nullglob
REPLAYS=("$REPLAYS_DIR"/*.sh)
shopt -u nullglob
if [ ${#REPLAYS[@]} -eq 0 ]; then
echo "[run-all] replays/ is empty — nothing to run"
exit 1
fi
cleanup() {
local exit_code=$?
if [ "${KEEP_UP:-0}" = "1" ]; then
echo ""
echo "[run-all] KEEP_UP=1 — leaving harness up. Tear down manually with ./down.sh"
else
echo ""
echo "[run-all] tearing down harness..."
./down.sh >/dev/null 2>&1 || echo "[run-all] WARN: ./down.sh exited non-zero"
fi
exit "$exit_code"
}
trap cleanup EXIT INT TERM
echo "[run-all] booting harness..."
if [ "${REBUILD:-0}" = "1" ]; then
./up.sh --rebuild
else
./up.sh
fi
echo "[run-all] seeding workspaces..."
./seed.sh
PASS_COUNT=0
FAIL_COUNT=0
SKIP_COUNT=0
FAILED_NAMES=()
for replay in "${REPLAYS[@]}"; do
name=$(basename "$replay" .sh)
echo ""
echo "[run-all] ━━━ $name ━━━"
if bash "$replay"; then
# Replays signal "skip" by exiting 0 with a __SKIP__ marker in stdout —
# but we capture that as a pass here since the script exited 0. The
# skip is documented in the script's own output. CI uses pass/fail.
PASS_COUNT=$((PASS_COUNT + 1))
echo "[run-all] PASS: $name"
else
FAIL_COUNT=$((FAIL_COUNT + 1))
FAILED_NAMES+=("$name")
echo "[run-all] FAIL: $name"
fi
done
echo ""
echo "[run-all] ============================="
echo "[run-all] Replay summary: ${PASS_COUNT} passed, ${FAIL_COUNT} failed (of ${#REPLAYS[@]} total)"
if [ ${FAIL_COUNT} -gt 0 ]; then
echo "[run-all] Failed:"
for name in "${FAILED_NAMES[@]}"; do
echo "[run-all] - $name"
done
exit 1
fi
echo "[run-all] All replays passed."