ci: canary-verify graceful-skip + draft auto-promote staging→main
Two related workflow hygiene changes: ## (1) canary-verify: graceful-skip when canary secrets absent Before: canary-verify hit `scripts/canary-smoke.sh` which exited non-zero when CANARY_TENANT_URLS was empty. Every main publish ran → canary-verify failed → red check on main CI signal (7/7 in past 24h). Noise, no value. After: smoke step detects the missing-secrets case, writes a warning to the step summary, sets an output `smoke_ran=false`, and exits 0. The workflow completes green without pretending to have tested anything. Gated downstream: `promote-to-latest` now requires BOTH `needs.canary-smoke.result == success` AND `needs.canary-smoke.outputs.smoke_ran == true`. A skip does NOT auto-promote — manual `promote-latest.yml` remains the release gate while Phase 2 canary is absent (see molecule-controlplane/docs/canary-tenants.md for the fleet stand-up plan + decision framework). When the canary fleet is stood up and secrets populated: delete the early-exit branch + the smoke_ran gate. The workflow goes back to its original "smoke gates promotion" semantics. ## (2) auto-promote-staging.yml — draft New workflow that fires after CI / E2E Staging Canvas / E2E API / CodeQL complete on the staging branch, checks that ALL four are green on the same SHA, and fast-forwards `main` to that SHA. Shipped disabled: the promote step is gated behind repo variable `AUTO_PROMOTE_ENABLED=true`. Until that's set, the workflow dry-runs and logs what it would have done. Toggle via Settings → Variables when staging CI has been reliably green for a few days. Safety: - workflow_run events only fire on push to staging (PRs into staging don't promote). - Every required gate must be `completed/success` on the same head_sha. Pending / failed / skipped / cancelled → abort. - `--ff-only` push. Refuses to advance main if it has diverged from staging history (someone landed a direct-to-main commit that's not on staging). Human resolves the fork. - `workflow_dispatch` with `force=true` lets us test the flow end-to-end before flipping the variable on. Motivation: molecule-core#1496 has been open with 1172 commits divergence between staging and main. Today that trapped PR #1526 (dynamic canvas runtime dropdown) on staging while prod users hit the hardcoded-dropdown bug. Auto-promote retires the bulk staging→main PR pattern once the staging CI it depends on is reliable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
28bf11fb85
commit
0082568448
182
.github/workflows/auto-promote-staging.yml
vendored
Normal file
182
.github/workflows/auto-promote-staging.yml
vendored
Normal file
@ -0,0 +1,182 @@
|
||||
name: Auto-promote staging → main
|
||||
|
||||
# Fires after any of the staging-branch quality gates complete. When ALL
|
||||
# required gates are green on the same staging SHA, fast-forwards `main`
|
||||
# to that SHA automatically — closing the gap that historically let
|
||||
# features sit on staging for weeks waiting for a bulk promotion PR
|
||||
# (see molecule-core#1496 for the 1172-commit example).
|
||||
#
|
||||
# Safety model:
|
||||
# - Runs ONLY on workflow_run events for the staging branch.
|
||||
# - Requires EVERY named gate workflow to have the same head_sha and
|
||||
# all be `conclusion == success`. If any of them is red, skipped,
|
||||
# cancelled, or pending, we abort (stay on the current main).
|
||||
# - Uses --ff-only: refuses to advance main if main has diverged from
|
||||
# the staging history (e.g. a hotfix landed directly on main). In
|
||||
# that case a human resolves the fork.
|
||||
# - Writes a commit summary so the promote shows up in git log as a
|
||||
# deliberate act, not a stealth move.
|
||||
#
|
||||
# **Initial rollout:** ship this file but leave the `enabled` input set
|
||||
# such that nothing auto-promotes until staging CI has been reliably
|
||||
# green for a few days. Toggle via repo variable `AUTO_PROMOTE_ENABLED`.
|
||||
|
||||
on:
|
||||
workflow_run:
|
||||
workflows:
|
||||
- CI
|
||||
- E2E Staging Canvas (Playwright)
|
||||
- E2E API Smoke Test
|
||||
- CodeQL
|
||||
types: [completed]
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
force:
|
||||
description: "Force promote even when AUTO_PROMOTE_ENABLED is unset (manual override)"
|
||||
required: false
|
||||
default: "false"
|
||||
|
||||
permissions:
|
||||
contents: write
|
||||
|
||||
jobs:
|
||||
check-all-gates-green:
|
||||
# Only consider staging pushes. PRs into staging don't promote.
|
||||
if: >
|
||||
(github.event_name == 'workflow_run' &&
|
||||
github.event.workflow_run.head_branch == 'staging' &&
|
||||
github.event.workflow_run.event == 'push')
|
||||
|| github.event_name == 'workflow_dispatch'
|
||||
runs-on: ubuntu-latest
|
||||
outputs:
|
||||
all_green: ${{ steps.gates.outputs.all_green }}
|
||||
head_sha: ${{ steps.gates.outputs.head_sha }}
|
||||
steps:
|
||||
- name: Check all required gates on this SHA
|
||||
id: gates
|
||||
env:
|
||||
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
|
||||
REPO: ${{ github.repository }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
|
||||
# Required gate workflow names. Must match the `name:` field
|
||||
# in the respective .github/workflows/*.yml files.
|
||||
GATES=(
|
||||
"CI"
|
||||
"E2E Staging Canvas (Playwright)"
|
||||
"E2E API Smoke Test"
|
||||
"CodeQL"
|
||||
)
|
||||
|
||||
echo "head_sha=${HEAD_SHA}" >> "$GITHUB_OUTPUT"
|
||||
echo "Checking gates on SHA ${HEAD_SHA}"
|
||||
|
||||
ALL_GREEN=true
|
||||
for gate in "${GATES[@]}"; do
|
||||
# Query the most recent run of this workflow on this SHA.
|
||||
# event=push to avoid picking up PR runs. branch=staging to
|
||||
# guard against someone dispatching the gate on a non-staging
|
||||
# branch at the same SHA.
|
||||
RESULT=$(gh run list \
|
||||
--repo "$REPO" \
|
||||
--workflow "$gate" \
|
||||
--branch staging \
|
||||
--event push \
|
||||
--commit "$HEAD_SHA" \
|
||||
--limit 1 \
|
||||
--json status,conclusion \
|
||||
--jq '.[0] | "\(.status)/\(.conclusion // "none")"' \
|
||||
2>/dev/null || echo "missing/none")
|
||||
|
||||
echo " $gate → $RESULT"
|
||||
|
||||
# Only completed/success counts. completed/failure or
|
||||
# in_progress/anything or no record at all = abort.
|
||||
if [ "$RESULT" != "completed/success" ]; then
|
||||
ALL_GREEN=false
|
||||
fi
|
||||
done
|
||||
|
||||
echo "all_green=${ALL_GREEN}" >> "$GITHUB_OUTPUT"
|
||||
if [ "$ALL_GREEN" != "true" ]; then
|
||||
echo "::notice::auto-promote: not all gates are green on ${HEAD_SHA} — staying on current main"
|
||||
fi
|
||||
|
||||
promote:
|
||||
needs: check-all-gates-green
|
||||
if: needs.check-all-gates-green.outputs.all_green == 'true'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Check rollout gate
|
||||
env:
|
||||
AUTO_PROMOTE_ENABLED: ${{ vars.AUTO_PROMOTE_ENABLED }}
|
||||
FORCE_INPUT: ${{ github.event.inputs.force }}
|
||||
run: |
|
||||
set -eu
|
||||
# Repo variable AUTO_PROMOTE_ENABLED=true flips this on. While
|
||||
# it's unset, the workflow dry-runs (logs what it would have
|
||||
# done) but doesn't actually push to main. Set the variable in
|
||||
# Settings → Secrets and variables → Actions → Variables.
|
||||
if [ "${AUTO_PROMOTE_ENABLED:-}" != "true" ] && [ "${FORCE_INPUT:-false}" != "true" ]; then
|
||||
{
|
||||
echo "## ⏸ Auto-promote disabled"
|
||||
echo
|
||||
echo "Repo variable \`AUTO_PROMOTE_ENABLED\` is not set to \`true\`."
|
||||
echo "All gates are green on staging; would have promoted to \`main\`."
|
||||
echo
|
||||
echo "To enable: Settings → Secrets and variables → Actions → Variables → \`AUTO_PROMOTE_ENABLED=true\`."
|
||||
echo "To test once manually: workflow_dispatch with \`force=true\`."
|
||||
} >> "$GITHUB_STEP_SUMMARY"
|
||||
echo "::notice::auto-promote disabled — dry run only"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
- name: Checkout main
|
||||
if: ${{ vars.AUTO_PROMOTE_ENABLED == 'true' || github.event.inputs.force == 'true' }}
|
||||
uses: actions/checkout@v4
|
||||
with:
|
||||
ref: main
|
||||
fetch-depth: 0
|
||||
token: ${{ secrets.GITHUB_TOKEN }}
|
||||
|
||||
- name: Fast-forward main → staging HEAD
|
||||
if: ${{ vars.AUTO_PROMOTE_ENABLED == 'true' || github.event.inputs.force == 'true' }}
|
||||
env:
|
||||
TARGET_SHA: ${{ needs.check-all-gates-green.outputs.head_sha }}
|
||||
run: |
|
||||
set -eu
|
||||
git config user.name "github-actions[bot]"
|
||||
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
|
||||
|
||||
git fetch origin staging
|
||||
git fetch origin main
|
||||
|
||||
# Refuse to advance main if it's diverged from staging history.
|
||||
# Someone landed a commit directly on main that's not on
|
||||
# staging → human needs to decide how to reconcile.
|
||||
if ! git merge-base --is-ancestor "$(git rev-parse origin/main)" "$TARGET_SHA"; then
|
||||
{
|
||||
echo "## ❌ Auto-promote refused — main has diverged"
|
||||
echo
|
||||
echo "\`main\` (\`$(git rev-parse --short origin/main)\`) is not an ancestor of staging (\`${TARGET_SHA:0:7}\`)."
|
||||
echo "Someone committed directly to main or the histories forked."
|
||||
echo
|
||||
echo "Resolve manually: merge main into staging, get CI green on the merged commit,"
|
||||
echo "then the auto-promote will succeed on the next run."
|
||||
} >> "$GITHUB_STEP_SUMMARY"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Fast-forward main to the target SHA.
|
||||
git checkout main
|
||||
git merge --ff-only "$TARGET_SHA"
|
||||
git push origin main
|
||||
|
||||
{
|
||||
echo "## ✅ Auto-promoted main → ${TARGET_SHA:0:7}"
|
||||
echo
|
||||
echo "All gate workflows green on staging at this SHA."
|
||||
echo "\`main\` fast-forwarded to match."
|
||||
} >> "$GITHUB_STEP_SUMMARY"
|
||||
34
.github/workflows/canary-verify.yml
vendored
34
.github/workflows/canary-verify.yml
vendored
@ -37,6 +37,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
outputs:
|
||||
sha: ${{ steps.compute.outputs.sha }}
|
||||
smoke_ran: ${{ steps.smoke.outputs.ran }}
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
@ -85,12 +86,38 @@ jobs:
|
||||
echo "Timeout after ${MAX_WAIT}s — proceeding anyway (smoke suite will validate)"
|
||||
|
||||
- name: Run canary smoke suite
|
||||
id: smoke
|
||||
# Graceful-skip when no canary fleet is configured (Phase 2 not yet
|
||||
# stood up — see molecule-controlplane/docs/canary-tenants.md).
|
||||
# Sets `ran=false` on skip so promote-to-latest stays off (we don't
|
||||
# want every main merge auto-promoting without gating). Manual
|
||||
# promote-latest.yml is the release gate while canary is absent.
|
||||
# Once the fleet is real: delete the early-exit branch.
|
||||
env:
|
||||
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
|
||||
CANARY_ADMIN_TOKENS: ${{ secrets.CANARY_ADMIN_TOKENS }}
|
||||
CANARY_CP_BASE_URL: https://staging-api.moleculesai.app
|
||||
CANARY_CP_SHARED_SECRET: ${{ secrets.CANARY_CP_SHARED_SECRET }}
|
||||
run: bash scripts/canary-smoke.sh
|
||||
run: |
|
||||
set -euo pipefail
|
||||
if [ -z "${CANARY_TENANT_URLS:-}" ] \
|
||||
|| [ -z "${CANARY_ADMIN_TOKENS:-}" ] \
|
||||
|| [ -z "${CANARY_CP_SHARED_SECRET:-}" ]; then
|
||||
{
|
||||
echo "## ⚠️ canary-verify skipped"
|
||||
echo
|
||||
echo "One or more canary secrets are unset (\`CANARY_TENANT_URLS\`, \`CANARY_ADMIN_TOKENS\`, \`CANARY_CP_SHARED_SECRET\`)."
|
||||
echo "Phase 2 canary fleet has not been stood up yet —"
|
||||
echo "see [canary-tenants.md](https://github.com/Molecule-AI/molecule-controlplane/blob/main/docs/canary-tenants.md)."
|
||||
echo
|
||||
echo "**Skipped — promote-to-latest will NOT auto-fire.** Dispatch \`promote-latest.yml\` manually when ready."
|
||||
} >> "$GITHUB_STEP_SUMMARY"
|
||||
echo "ran=false" >> "$GITHUB_OUTPUT"
|
||||
echo "::notice::canary-verify: skipped — no canary fleet configured"
|
||||
exit 0
|
||||
fi
|
||||
bash scripts/canary-smoke.sh
|
||||
echo "ran=true" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Summary on failure
|
||||
if: ${{ failure() }}
|
||||
@ -109,8 +136,11 @@ jobs:
|
||||
# On green, retag :staging-<sha> → :latest for BOTH images.
|
||||
# crane is a lightweight registry client (no Docker daemon needed on
|
||||
# the runner) that can retag remotely with a single API call each.
|
||||
# Gated on smoke_ran=true — without a real canary fleet the smoke
|
||||
# step no-ops with success, and we don't want that to silently
|
||||
# auto-promote every main merge.
|
||||
needs: canary-smoke
|
||||
if: ${{ needs.canary-smoke.result == 'success' }}
|
||||
if: ${{ needs.canary-smoke.result == 'success' && needs.canary-smoke.outputs.smoke_ran == 'true' }}
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: imjasonh/setup-crane@v0.4
|
||||
|
||||
Loading…
Reference in New Issue
Block a user