ci(harness-replays): KEEP_UP=1 so dump-logs step has containers to read

First run on PR #2410 failed with 'container harness-tenant-1 is unhealthy'
but the dump-compose-logs step printed empty tenant logs because
run-all-replays.sh's trap-on-EXIT had already torn down the harness.

Setting KEEP_UP=1 leaves containers in place; the always-run Force
teardown step at the end owns cleanup explicitly. Now we'll actually
see why the tenant didn't become healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-30 13:15:46 -07:00
parent 3105e87cf7
commit 24cb2a286f

View File

@ -120,8 +120,18 @@ jobs:
# run-all-replays.sh: boot via up.sh → seed via seed.sh → run
# every replays/*.sh → tear down via down.sh on EXIT (trap).
# Non-zero exit on any replay failure.
#
# KEEP_UP=1: without this, the script's trap-on-EXIT tears
# down containers immediately on failure, leaving the dump
# step below with nothing to dump (verified on PR #2410's
# first run — tenant became unhealthy, trap fired, dump
# step saw empty containers). Keeping them up lets the
# failure path collect tenant/cp-stub/cf-proxy logs. The
# always-run "Force teardown" step does the actual cleanup.
if: needs.detect-changes.outputs.run == 'true'
working-directory: tests/harness
env:
KEEP_UP: "1"
run: ./run-all-replays.sh
- name: Dump compose logs on failure
@ -139,10 +149,10 @@ jobs:
echo "=== postgres logs (last 100) ==="
docker compose -f compose.yml logs --tail 100 postgres || true
- name: Force teardown (belt-and-suspenders)
# run-all-replays.sh's trap should already have torn down,
# but if something killed bash before the trap fired, this
# ensures the runner doesn't leak the network/volumes.
- name: Force teardown
# We pass KEEP_UP=1 to run-all-replays.sh so the dump step
# above sees real containers — that means we own teardown
# explicitly here. Always run.
if: always() && needs.detect-changes.outputs.run == 'true'
working-directory: tests/harness
run: ./down.sh || true