ci(harness-replays): KEEP_UP=1 so dump-logs step has containers to read

First run on PR #2410 failed with 'container harness-tenant-1 is unhealthy' but the dump-compose-logs step printed empty tenant logs because run-all-replays.sh's trap-on-EXIT had already torn down the harness. Setting KEEP_UP=1 leaves containers in place; the always-run Force teardown step at the end owns cleanup explicitly. Now we'll actually see why the tenant didn't become healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:15:46 -07:00 · 24cb2a286f
commit 24cb2a286f
parent 3105e87cf7
1 changed file with 14 additions and 4 deletions
--- a/.github/workflows/harness-replays.yml
+++ b/.github/workflows/harness-replays.yml
@@ -120,8 +120,18 @@ jobs:
        # run-all-replays.sh: boot via up.sh → seed via seed.sh → run
        # every replays/*.sh → tear down via down.sh on EXIT (trap).
        # Non-zero exit on any replay failure.
+        #
+        # KEEP_UP=1: without this, the script's trap-on-EXIT tears
+        # down containers immediately on failure, leaving the dump
+        # step below with nothing to dump (verified on PR #2410's
+        # first run — tenant became unhealthy, trap fired, dump
+        # step saw empty containers). Keeping them up lets the
+        # failure path collect tenant/cp-stub/cf-proxy logs. The
+        # always-run "Force teardown" step does the actual cleanup.
        if: needs.detect-changes.outputs.run == 'true'
        working-directory: tests/harness
+        env:
+          KEEP_UP: "1"
        run: ./run-all-replays.sh

      - name: Dump compose logs on failure
@@ -139,10 +149,10 @@ jobs:
          echo "=== postgres logs (last 100) ==="
          docker compose -f compose.yml logs --tail 100 postgres || true

-      - name: Force teardown (belt-and-suspenders)
-        # run-all-replays.sh's trap should already have torn down,
-        # but if something killed bash before the trap fired, this
-        # ensures the runner doesn't leak the network/volumes.
+      - name: Force teardown
+        # We pass KEEP_UP=1 to run-all-replays.sh so the dump step
+        # above sees real containers — that means we own teardown
+        # explicitly here. Always run.
        if: always() && needs.detect-changes.outputs.run == 'true'
        working-directory: tests/harness
        run: ./down.sh || true
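
For context, here is a minimal sketch of the trap pattern this change relies on. The real run-all-replays.sh is not part of this diff, so everything below is an assumption about its shape: `teardown`, `cleanup_on_exit`, and `simulate_run` are hypothetical names, and the `echo` lines stand in for `./down.sh` and a failing replay.

```shell
# Hypothetical stand-in for ./down.sh (the real script is not shown here).
teardown() {
  echo "teardown: removing containers"
}

# EXIT handler: skip teardown when KEEP_UP=1 so a later step can still
# read container logs; otherwise clean up immediately.
cleanup_on_exit() {
  if [ "${KEEP_UP:-0}" = "1" ]; then
    echo "KEEP_UP=1: leaving containers up"
  else
    teardown
  fi
}

# Simulate one failing run inside a subshell so the EXIT trap fires
# when that subshell exits. The first argument plays the role of KEEP_UP.
simulate_run() {
  (
    KEEP_UP="${1:-0}"
    trap cleanup_on_exit EXIT
    echo "replay failed"
    exit 1
  )
}

# Run both modes; "|| true" keeps the demo going after the simulated failure.
simulate_run 1 || true
simulate_run 0 || true
```

With the flag set, the handler leaves the containers alone, and cleanup becomes the caller's job. That is why the workflow's Force teardown step runs under `always()`.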