From 8efb2dae8d7f99257532a82d4033e76d7bcc9d24 Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Thu, 30 Apr 2026 10:07:52 -0700
Subject: [PATCH] fix(ci): handle empty E2E lookup in auto-promote-on-e2e gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When gh run list returns [] (no E2E run on the main SHA, which is the
common case for canvas-only / cmd-only / sweep-only changes whose paths
don't trigger E2E), jq's `.[0]` is null and the interpolation
`"\(null)/\(null // "none")"` produces "null/none". The case statement
has no `null/none)` branch, so it falls into `*)` → exit 1 →
auto-promote-on-e2e fails → `:latest` doesn't get retagged to the new
SHA → tenants on `redeploy-tenants-on-main` end up pulling the OLD
`:latest` digest.

Surfaced 2026-04-30 17:00Z as the first observable consequence of
PR #2389 (App-token dispatch fix). Every prior auto-promote-on-e2e run
was triggered by E2E completion, so the "Upstream is E2E itself"
short-circuit at line 151 fired before reaching the gate. #2389 made
publish-image's completion event correctly fire workflow_run listeners,
of which auto-promote-on-e2e is one, so the first publish-upstream run
hit the latent jq bug.

Fix: change `.[0]` to `(.[0] // {})` in the jq filter so the
empty-array case becomes `none/none` (the documented "E2E
paths-filtered out for this SHA — proceed" branch) instead of the
unhandled `null/none`. Also default `.status` to "none" for the same
defensive reason.

Verified the three input shapes locally:

  []                                       → "none/none" ✓
  [{status:completed,conclusion:success}]  → "completed/success" ✓
  [{status:in_progress,conclusion:null}]   → "in_progress/none" ✓

Outer `|| echo "none/none"` fallback retained as defense-in-depth for
non-zero gh exits (network / auth failures).
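The local verification above can be reproduced with a small script. This is a minimal sketch rather than the exact commands that were run: it assumes `jq` is on PATH, the `render` helper and the two variable names are hypothetical, and the JSON payloads are hand-written stand-ins for what `gh run list --json status,conclusion` returns. `jq -r` is used so that strings print without quotes, matching what the `RESULT` variable is expected to hold.

```shell
#!/usr/bin/env bash
# Compare the old and new jq filters on the payload shapes from the
# commit message. Requires jq; no network or gh access is needed.

old_filter='.[0] | "\(.status)/\(.conclusion // "none")"'
new_filter='(.[0] // {}) | "\(.status // "none")/\(.conclusion // "none")"'

# render FILTER JSON: print the status/conclusion string jq produces.
render() {
  printf '%s' "$2" | jq -r "$1"
}

# Empty array: .[0] is null, so the old filter interpolates "null".
render "$old_filter" '[]'    # null/none
render "$new_filter" '[]'    # none/none

# Populated arrays behave the same as before under the new filter.
render "$new_filter" '[{"status":"completed","conclusion":"success"}]'   # completed/success
render "$new_filter" '[{"status":"in_progress","conclusion":null}]'      # in_progress/none
```

Only the empty-array input differs between the two filters; for a populated array both yield the same string.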

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/auto-promote-on-e2e.yml | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/auto-promote-on-e2e.yml b/.github/workflows/auto-promote-on-e2e.yml
index 817cc660..d548889c 100644
--- a/.github/workflows/auto-promote-on-e2e.yml
+++ b/.github/workflows/auto-promote-on-e2e.yml
@@ -155,6 +155,20 @@ jobs:
           fi
 
           # Upstream is publish-workspace-server-image. Check E2E state.
+          # The jq filter must defend against TWO empty cases that gh
+          # CLI emits indistinguishably:
+          #   1. gh exits non-zero (network blip, auth issue) → handled
+          #      by the `|| echo "none/none"` fallback below.
+          #   2. gh exits zero but returns `[]` (no E2E run on this
+          #      main SHA — the common case for canvas-only / cmd-only
+          #      / sweep-only changes whose paths don't trigger E2E).
+          #      Without `(.[0] // {})`, jq sees `null` and emits
+          #      "null/none" — which the case statement below has no
+          #      branch for, so it falls into *) → exit 1.
+          # Surfaced 2026-04-30 the first time the App-token chain
+          # (#2389) actually fired auto-promote-on-e2e from a publish
+          # upstream — every prior run was E2E-upstream which
+          # short-circuits before this gate.
           RESULT=$(gh run list \
             --repo "$REPO" \
             --workflow e2e-staging-saas.yml \
@@ -162,7 +176,7 @@ jobs:
             --commit "$SHA" \
             --limit 1 \
             --json status,conclusion \
-            --jq '.[0] | "\(.status)/\(.conclusion // "none")"' \
+            --jq '(.[0] // {}) | "\(.status // "none")/\(.conclusion // "none")"' \
             2>/dev/null || echo "none/none")
 
           echo "E2E Staging SaaS for ${SHA:0:7}: $RESULT"
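
Reviewer note: the case statement that consumes `$RESULT` is outside the hunks above, so the following is only a reduced, hypothetical sketch of why "null/none" used to fail the gate. The `gate` function name and the `completed/success` branch are assumptions. The patch itself describes only the `none/none` branch and the `*)` fallthrough, and the real workflow presumably has further branches not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical reduction of the E2E gate. The real workflow exits 1 in
# the fallthrough branch; this sketch prints a word instead so it can
# be run safely.

gate() {
  case "$1" in
    completed/success) echo "promote" ;;  # assumed branch, not shown in the patch
    none/none)         echo "promote" ;;  # E2E paths-filtered out for this SHA
    *)                 echo "fail" ;;     # unhandled input, e.g. "null/none"
  esac
}

gate "null/none"   # fail: what the old filter produced for []
gate "none/none"   # promote: what the new filter produces for []
```

The sketch suggests why the fix normalizes the filter output instead of adding a `null/none)` branch: the empty lookup is mapped onto a branch that already exists.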