name: Auto-promote :latest after main image build

# Retags `ghcr.io/molecule-ai/{platform,platform-tenant}:staging-<sha>`
# → `:latest` after either the image build or E2E completes on a `main`
# push, gated on E2E Staging SaaS not being red for that SHA.
#
# Why two triggers:
#
#   `publish-workspace-server-image` and `e2e-staging-saas` are both
#   paths-filtered, but with DIFFERENT path sets:
#
#     publish-workspace-server-image:
#       workspace-server/**, canvas/**, manifest.json
#
#     e2e-staging-saas (full lifecycle):
#       workspace-server/internal/handlers/{registry,workspace_provision,
#       a2a_proxy}.go, workspace-server/internal/middleware/**,
#       workspace-server/internal/provisioner/**, tests/e2e/test_staging_full_saas.sh
#
#   Apart from the E2E test script itself, the E2E set is a strict
#   SUBSET of the publish set. So:
#     - canvas/** changes → publish fires, E2E does not
#     - workspace-server/cmd/** changes → publish fires, E2E does not
#     - workspace-server/internal/sweep/** → publish fires, E2E does not
#
#   The previous version triggered ONLY on E2E completion, which meant
#   non-E2E-path changes (canvas, cmd, sweep, etc.) rebuilt the image
#   but never advanced `:latest`. Result: as of 2026-04-28 this workflow
#   had run zero times since merge despite eight main pushes — `:latest`
#   was ~7 hours / 9 PRs behind main with no human realising. See
#   `molecule-core` Slack discussion 2026-04-28.
#
#   Adding `publish-workspace-server-image` as a second trigger closes
#   the gap: any image rebuild on main is eligible to advance `:latest`.
#
# Why E2E remains a kill-switch (not the trigger):
#
#   When E2E DID run for this SHA and ended red, we abort — `:latest`
#   stays on the prior known-good digest. When E2E didn't run (paths
#   filtered out), we proceed: pre-merge gates already validated this
#   SHA on staging via auto-promote-staging requiring CI + E2E Canvas +
#   E2E API + CodeQL all green. Image content for non-E2E-paths
#   (canvas, cmd, sweep) is exercised by those staging gates.
#
# Why `main` only:
#
#   `:latest` is what prod tenants pull. We only want SHAs that have
#   reached main (via auto-promote-staging) to advance `:latest`.
#   Triggering on staging would let a staging-only revert advance
#   `:latest` to a SHA that never reaches main, breaking the "production
#   runs what's on main" invariant.
#
# Idempotency:
#
#   When a SHA touches paths that match BOTH publish and E2E, both
#   workflows fire and complete. Both trigger this workflow on
#   completion → two runs race. Both retag `:staging-<sha>` →
#   `:latest`. crane tag is idempotent (re-tagging the same digest is a
#   no-op), so the second run is harmless. concurrency group serializes
#   them anyway.

on:
  workflow_run:
    workflows:
      - 'E2E Staging SaaS (full lifecycle)'
      - 'publish-workspace-server-image'
    types: [completed]
    branches: [main]
  workflow_dispatch:
    inputs:
      sha:
        description: 'Short sha to promote (override; defaults to upstream workflow_run head_sha)'
        required: false
        type: string

permissions:
  contents: read
  packages: write

concurrency:
  # Serialize promotes per-SHA so the publish+E2E both-fired race lands
  # cleanly. Different SHAs can promote in parallel.
  group: auto-promote-latest-${{ github.event.workflow_run.head_sha || github.event.inputs.sha || github.sha }}
  cancel-in-progress: false

env:
  IMAGE_NAME: ghcr.io/molecule-ai/platform
  TENANT_IMAGE_NAME: ghcr.io/molecule-ai/platform-tenant

jobs:
  promote:
    # Proceed if upstream succeeded OR manual dispatch. Upstream-failure
    # paths are filtered here; the E2E-was-red kill-switch lives in the
    # gate-check step below (covers the case where upstream is publish
    # success but E2E for the same SHA failed).
    if: |
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')
    runs-on: ubuntu-latest
    steps:
      - name: Compute short sha
        id: sha
        run: |
          set -euo pipefail
          if [ -n "${{ github.event.inputs.sha }}" ]; then
            FULL="${{ github.event.inputs.sha }}"
          else
            FULL="${{ github.event.workflow_run.head_sha }}"
          fi
          echo "short=${FULL:0:7}" >> "$GITHUB_OUTPUT"
          echo "full=${FULL}" >> "$GITHUB_OUTPUT"

      - name: Gate — E2E Staging SaaS state for this SHA
        # When upstream IS E2E success, we know it's green (filtered by
        # the job-level `if` already). When upstream is publish, look up
        # E2E state for the same SHA. Four buckets:
        #
        #   - completed/success: E2E confirmed safe → proceed
        #   - completed/failure|cancelled|timed_out: E2E found a
        #     regression → ABORT (exit 1), `:latest` stays put
        #   - in_progress|queued|requested: E2E is RACING with publish
        #     for a runtime-touching SHA. publish typically completes
        #     ~5-10min before E2E (~10-15min). If we promote on the
        #     publish signal here, a later E2E failure can't roll back
        #     `:latest` — it'd already be wrongly advanced. So we DEFER:
        #     skip subsequent steps (proceed=false) and let E2E's own
        #     completion event re-fire this workflow, which then takes
        #     the upstream-is-E2E path. exit 0 so the run shows as
        #     success rather than a noisy fake-failure.
        #   - none/none: E2E was paths-filtered out for this SHA (the
        #     change touched canvas/cmd/sweep/etc. — paths covered by
        #     publish but not by E2E). pre-merge gates on staging
        #     already validated this SHA → proceed.
        #
        # Manual dispatch skips this check — operator override.
        id: gate
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          SHA: ${{ steps.sha.outputs.full }}
          UPSTREAM_NAME: ${{ github.event.workflow_run.name }}
          EVENT_NAME: ${{ github.event_name }}
        run: |
          set -euo pipefail

          if [ "$EVENT_NAME" = "workflow_dispatch" ]; then
            echo "proceed=true" >> "$GITHUB_OUTPUT"
            echo "::notice::Manual dispatch — skipping E2E gate (operator override)"
            exit 0
          fi

          if [ "$UPSTREAM_NAME" = "E2E Staging SaaS (full lifecycle)" ]; then
            echo "proceed=true" >> "$GITHUB_OUTPUT"
            echo "::notice::Upstream is E2E itself (success per job-level if) — gate trivially satisfied"
            exit 0
          fi

          # Upstream is publish-workspace-server-image. Check E2E state
          # for the same SHA via Gitea's commit-status API.
          #
          # GitHub-era this was `gh run list --workflow=X --commit=SHA
          # --json status,conclusion` returning either `[]` (no run on
          # this SHA) or `[{status, conclusion}]` (the run's state).
          # Gitea has NO workflow-runs API at all — `/api/v1/repos/.../
          # actions/runs` returns 404 (verified 2026-05-07, issue #75).
          # However Gitea Actions DOES emit a commit status per workflow
          # job, with `context = "<Workflow Name> / <Job Name> (<event>)"`,
          # which is exactly what we need: each E2E run leg becomes one
          # status row on the SHA, and the aggregate state encodes the
          # run's outcome.
          #
          # Mapping:
          #   0 matched contexts          → "none/none"         (E2E paths-filtered out — same semantic as before)
          #   any context = pending       → "in_progress/none"  (defer)
          #   any context = error|failure → "completed/failure" (abort)
          #   all contexts = success      → "completed/success" (proceed)
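          #
          # Worked example of the mapping above. The job name "e2e" and
          # the event "push" are hypothetical; only the workflow-name
          # prefix matters to the filter:
          #   [{"context": "E2E Staging SaaS (full lifecycle) / e2e (push)",
          #     "status": "pending"},
          #    {"context": "CI / Platform (Go) (push)", "status": "failure"}]
          #   maps to "in_progress/none": the CI row is dropped by the
          #   prefix match, and the one remaining E2E row is pending.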
          #
          # The "completed/cancelled" and "completed/timed_out" buckets
          # don't have direct Gitea analogs (Gitea statuses are
          # success / failure / error / pending / warning). Per-SHA
          # concurrency cancellation surfaces as `error` on Gitea, which
          # we map to "completed/failure" rather than "completed/cancelled"
          # — losing the soft-defer semantic of the cancelled bucket on
          # this fleet. Tradeoff: the staleness alarm
          # (auto-promote-stale-alarm.yml) still catches a stuck :latest
          # within 4h, and a legitimate cancel is rare enough that
          # aborting + manual re-dispatch is acceptable. If we measure
          # cancel frequency > 1/week, revisit by reading the
          # run-step-summary text via a follow-up script.
          #
          # Network or auth blips collapse to "none/none" via the curl
          # `|| echo "[]"` fallback, matching the pre-Gitea behaviour where
          # an empty list also degenerated to none/none.
          GITEA_API_URL="${GITHUB_SERVER_URL:-https://git.moleculesai.app}/api/v1"
          STATUSES_JSON=$(curl --fail-with-body -sS \
            -H "Authorization: token ${GH_TOKEN}" \
            -H "Accept: application/json" \
            "${GITEA_API_URL}/repos/${REPO}/commits/${SHA}/statuses?limit=100" \
            2>/dev/null || echo "[]")
          RESULT=$(printf '%s' "$STATUSES_JSON" | jq -r '
            # Filter to E2E Staging SaaS (full lifecycle) statuses.
            # Match by leading workflow-name prefix so the "<job>
            # (<event>)" tail is irrelevant. Gitea emits the workflow
            # name verbatim from the YAML `name:` field.
            [.[] | select(.context | startswith("E2E Staging SaaS (full lifecycle) /"))] as $rows
            | if ($rows | length) == 0 then
                "none/none"
              elif any($rows[]; .status == "pending") then
                "in_progress/none"
              elif any($rows[]; .status == "failure" or .status == "error") then
                "completed/failure"
              elif all($rows[]; .status == "success") then
                "completed/success"
              else
                # Mixed / unknown — fall through to *) bucket below.
                "completed/" + ($rows[0].status // "unknown")
              end
          ' 2>/dev/null || echo "none/none")

          echo "E2E Staging SaaS for ${SHA:0:7}: $RESULT"

          case "$RESULT" in
            completed/success)
              echo "proceed=true" >> "$GITHUB_OUTPUT"
              echo "::notice::E2E green for this SHA — proceeding with promote"
              ;;
            completed/failure|completed/timed_out)
              echo "proceed=false" >> "$GITHUB_OUTPUT"
              {
                echo "## ❌ Auto-promote aborted — E2E Staging SaaS failed"
                echo
                echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
                echo "\`:latest\` stays on the prior known-good digest."
                echo
                echo "If the failure was a flake, manually dispatch this workflow with the same sha to override."
              } >> "$GITHUB_STEP_SUMMARY"
              exit 1
              ;;
            completed/cancelled)
              # GitHub-era only: cancelled ≠ failure. Gitea statuses
              # don't expose a "cancelled" state — a per-SHA concurrency
              # cancellation surfaces as `failure` or `error` on Gitea
              # and is now handled by the failure branch above. This
              # arm is kept for backwards compatibility / dual-host
              # operation (if we ever add a non-Gitea fallback) but
              # under the post-#75 flow it's unreachable.
              echo "proceed=false" >> "$GITHUB_OUTPUT"
              {
                echo "## ⏭ Auto-promote deferred — E2E Staging SaaS was cancelled"
                echo
                echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
                echo "Likely per-SHA concurrency (newer push superseded this E2E run)."
                echo "The newer SHA's E2E will fire its own promote when it lands."
                echo "If you need this specific SHA promoted, manually dispatch."
              } >> "$GITHUB_STEP_SUMMARY"
              ;;
            in_progress/*|queued/*|requested/*|waiting/*|pending/*)
              echo "proceed=false" >> "$GITHUB_OUTPUT"
              {
                echo "## ⏳ Auto-promote deferred — E2E Staging SaaS still running"
                echo
                echo "Publish completed before E2E for \`${SHA:0:7}\` (state: \`$RESULT\`)."
                echo "Skipping retag here — E2E's own completion event will re-fire this workflow."
                echo "If E2E ends green, that run promotes \`:latest\`. If red, it aborts."
              } >> "$GITHUB_STEP_SUMMARY"
              ;;
            none/none)
              echo "proceed=true" >> "$GITHUB_OUTPUT"
              echo "::notice::E2E paths-filtered out for this SHA — pre-merge staging gates carry"
              ;;
            *)
              echo "proceed=false" >> "$GITHUB_OUTPUT"
              {
                echo "## ❓ Auto-promote aborted — unexpected E2E state"
                echo
                echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\` (unhandled)"
                echo "Manual investigation needed; re-dispatch with the same sha once resolved."
              } >> "$GITHUB_STEP_SUMMARY"
              exit 1
              ;;
          esac

      - if: steps.gate.outputs.proceed == 'true'
        uses: imjasonh/setup-crane@6da1ae018866400525525ce74ff892880c099987 # v0.5

      - name: GHCR login
        if: steps.gate.outputs.proceed == 'true'
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | \
            crane auth login ghcr.io -u "${{ github.actor }}" --password-stdin

      - name: Verify :staging-<sha> exists for both images
        # Better to fail fast with a clear message than to half-tag
        # (platform retagged but platform-tenant missing → tenants pull
        # a stale image).
        if: steps.gate.outputs.proceed == 'true'
        run: |
          set -euo pipefail
          for img in "${IMAGE_NAME}" "${TENANT_IMAGE_NAME}"; do
            tag="${img}:staging-${{ steps.sha.outputs.short }}"
            if ! crane manifest "$tag" >/dev/null 2>&1; then
              echo "::error::Missing tag: $tag"
              echo "::error::publish-workspace-server-image must complete on this SHA before auto-promote can retag :latest."
              exit 1
            fi
            echo " ok: $tag exists"
          done

      - name: Checkout for local ancestry compute
        # #2244: workflow_run completions arrive in arbitrary order. The
        # ancestry check below uses `git merge-base --is-ancestor` to
        # compare CURRENT_REVISION (read off the live :latest image) and
        # TARGET_SHA (this run's SHA) — both full commit SHAs.
        #
        # Why a local checkout, not the Gitea compare API:
        #
        #   Gitea v1.22's `/api/v1/repos/.../compare/A...B` does NOT accept
        #   full commit SHAs as either side — it returns
        #     {"total_commits":null, "message":"BaseNotExist"}
        #   for any non-branch / non-tag ref (verified 2026-05-07, issue
        #   #75). Branch + tag refs work, but ancestry between two
        #   arbitrary historical commits does not. The previous version
        #   called GitHub's `gh api repos/.../compare/A...B` which DOES
        #   accept SHAs and returns
        #   `.status: ahead|behind|identical|diverged` — that surface
        #   simply doesn't exist on Gitea.
        #
        #   Local git is exact, fast (depth=200 covers any realistic
        #   divergence between :latest and a candidate retag — promote
        #   cycles are minutes, not hundreds of commits), and removes the
        #   cross-host API dependency entirely.
        if: steps.gate.outputs.proceed == 'true' && github.event_name != 'workflow_dispatch'
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          # Need enough history to resolve both CURRENT_REVISION and
          # TARGET_SHA + their merge-base. 200 covers ~a week of main
          # activity at the current commit cadence. Bump if a future
          # cron pause lets :latest fall further behind.
          fetch-depth: 200

      - name: Ancestry check — refuse to promote :latest backwards
        # Detection: read current :latest's `org.opencontainers.image.revision`
        # label (set by publish-workspace-server-image.yml at build time)
        # and ask local git whether the candidate SHA is ahead-of /
        # identical-to / behind / diverged-from current. Hard-fail on
        # `behind` and `diverged` per the approved design — silent-bypass
        # is the class we're moving away from. Workflow goes red,
        # oncall sees it, operator decides how to recover (manual
        # dispatch with the right SHA, force-promote, etc.).
        #
        # Manual dispatch skips this check — operator override semantics
        # match the gate-check step above.
        #
        # Backward-compat: when current :latest carries no revision
        # label (legacy image pre-publish-with-label), skip-with-warning.
        # All :latest images on main are post-label as of 2026-04-29, so
        # this branch will be dead within 90 days; remove then.
        if: steps.gate.outputs.proceed == 'true' && github.event_name != 'workflow_dispatch'
        id: ancestry
        env:
          REPO: ${{ github.repository }}
          TARGET_SHA: ${{ steps.sha.outputs.full }}
        run: |
          set -euo pipefail

          # Read the current :latest config and pull the revision label.
          # `crane config` returns the OCI image config blob (not the manifest);
          # labels live under `.config.Labels`. `// empty` makes jq return ""
          # rather than the literal "null" so the test below works.
          CURRENT_REVISION=$(crane config "${IMAGE_NAME}:latest" 2>/dev/null \
            | jq -r '.config.Labels["org.opencontainers.image.revision"] // empty' \
            || true)

          if [ -z "$CURRENT_REVISION" ]; then
            echo "decision=skip-no-label" >> "$GITHUB_OUTPUT"
            {
              echo "## ⚠ Ancestry check skipped — current :latest has no revision label"
              echo
              echo "Likely a legacy image built before \`org.opencontainers.image.revision\` was set."
              echo "Falling through to retag. After all \`:latest\` images are post-label (TODO 90 days), this branch is dead and should be removed."
            } >> "$GITHUB_STEP_SUMMARY"
            echo "::warning::Current :latest carries no revision label — skipping ancestry check (legacy image)"
            exit 0
          fi

          if [ "$CURRENT_REVISION" = "$TARGET_SHA" ]; then
            echo "decision=identical" >> "$GITHUB_OUTPUT"
            echo "::notice:::latest already at ${TARGET_SHA:0:7} — retag will be a no-op"
            exit 0
          fi

          # Compute ancestry locally with git. The four buckets match
          # GitHub's compare API status semantics exactly:
          #
          #   ahead     — target reaches current via parent pointers
          #               AND current does not reach target. I.e. target
          #               is a descendant of current → :latest moves
          #               forward, allow.
          #   identical — caught above by SHA-equality early-return.
          #   behind    — current reaches target via parent pointers
          #               AND target does not reach current. I.e. target
          #               is an ancestor of current → moving :latest
          #               backwards (the #2244 race), block.
          #   diverged  — neither reaches the other. Force-push or
          #               history rewrite, block.
          #
          # `git merge-base --is-ancestor X Y` exits 0 iff X is an
          # ancestor of Y. Both calls are cheap (constant-ish in depth,
          # which we bounded at 200 above).
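          #
          # Worked example, assuming a hypothetical linear history
          # A <- B <- C on main (A oldest):
          #   current=B, target=C: `--is-ancestor B C` exits 0 → ahead
          #   current=C, target=B: `--is-ancestor C B` exits 1, then
          #                        `--is-ancestor B C` exits 0 → behind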
          #
          # Both SHAs MUST be reachable in the runner's clone. If
          # either rev-parse fails (e.g. the depth=200 we fetched isn't
          # deep enough for an unusually old :latest revision), fall
          # back to "error" — the previous version's `error` branch
          # exits 1 and surfaces an explicit failure for operator
          # action, same as a network blip in the old GitHub version.
          if ! git rev-parse --verify --quiet "${CURRENT_REVISION}^{commit}" >/dev/null \
            || ! git rev-parse --verify --quiet "${TARGET_SHA}^{commit}" >/dev/null; then
            STATUS="error"
          elif git merge-base --is-ancestor "$CURRENT_REVISION" "$TARGET_SHA" 2>/dev/null; then
            # CURRENT is ancestor of TARGET → TARGET is ahead.
            STATUS="ahead"
          elif git merge-base --is-ancestor "$TARGET_SHA" "$CURRENT_REVISION" 2>/dev/null; then
            # TARGET is ancestor of CURRENT → TARGET is behind.
            STATUS="behind"
          else
            # Neither reaches the other → divergent.
            STATUS="diverged"
          fi

          echo "ancestry compare ${CURRENT_REVISION:0:7} → ${TARGET_SHA:0:7}: $STATUS"

          case "$STATUS" in
            ahead)
              echo "decision=ahead" >> "$GITHUB_OUTPUT"
              echo "::notice::Target ${TARGET_SHA:0:7} is ahead of current :latest (${CURRENT_REVISION:0:7}) — proceeding with retag"
              ;;
            behind)
              echo "decision=behind" >> "$GITHUB_OUTPUT"
              {
                echo "## ❌ Auto-promote refused — target is BEHIND current :latest"
                echo
                echo "| Field | Value |"
                echo "|---|---|"
                echo "| Target SHA | \`$TARGET_SHA\` |"
                echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
                echo "| Ancestry compute | \`behind\` (target is an ancestor of :latest) |"
                echo
                echo "This guard catches the workflow_run-completion-order race (#2244):"
                echo "two rapid main pushes whose E2Es complete out-of-order can otherwise"
                echo "promote \`:latest\` backwards. \`:latest\` stays on \`${CURRENT_REVISION:0:7}\`."
                echo
                echo "**Recovery:** if this is a legitimate revert that should land on \`:latest\`,"
                echo "manually dispatch this workflow with the target sha as input — the manual-dispatch"
                echo "path skips the ancestry check (operator override)."
              } >> "$GITHUB_STEP_SUMMARY"
              exit 1
              ;;
            diverged)
              echo "decision=diverged" >> "$GITHUB_OUTPUT"
              {
                echo "## ❓ Auto-promote refused — history diverged"
                echo
                echo "| Field | Value |"
                echo "|---|---|"
                echo "| Target SHA | \`$TARGET_SHA\` |"
                echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
                echo "| Ancestry compute | \`diverged\` (neither commit reaches the other) |"
                echo
                echo "Likely cause: force-push rewrote main's history, leaving the previous"
                echo "\`:latest\` revision orphaned. Needs human review before \`:latest\` advances."
              } >> "$GITHUB_STEP_SUMMARY"
              exit 1
              ;;
            error|*)
              echo "decision=error" >> "$GITHUB_OUTPUT"
              {
                echo "## ❌ Auto-promote aborted — ancestry-check error"
                echo
                echo "Could not resolve both \`$CURRENT_REVISION\` and \`$TARGET_SHA\` in the runner clone (status=\`$STATUS\`)."
                echo "Likely cause: \`fetch-depth: 200\` did not reach \`$CURRENT_REVISION\` — increase the fetch depth in this workflow."
                echo
                echo "Manual dispatch with the target sha bypasses this check."
              } >> "$GITHUB_STEP_SUMMARY"
              exit 1
              ;;
          esac

      - name: Retag platform :staging-<sha> → :latest
        if: steps.gate.outputs.proceed == 'true'
        run: |
          crane tag "${IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest

      - name: Retag tenant :staging-<sha> → :latest
        if: steps.gate.outputs.proceed == 'true'
        run: |
          crane tag "${TENANT_IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest

      - name: Summary
        if: steps.gate.outputs.proceed == 'true'
        run: |
          {
            echo "## :latest promoted to ${{ steps.sha.outputs.short }}"
            echo
            if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
              echo "- Trigger: manual dispatch"
            else
              echo "- Upstream: \`${{ github.event.workflow_run.name }}\` ([run](${{ github.event.workflow_run.html_url }}))"
            fi
            echo "- platform:staging-${{ steps.sha.outputs.short }} → :latest"
            echo "- platform-tenant:staging-${{ steps.sha.outputs.short }} → :latest"
            echo
            echo "Tenant fleet auto-pulls within 5 min via IMAGE_AUTO_REFRESH=true."
            echo "Force immediate fanout: dispatch redeploy-tenants-on-main.yml."
          } >> "$GITHUB_STEP_SUMMARY"