ci(canary): rewrite Probe 3 to actually validate auth (NOP push --dry-run)
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 12s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 15s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 14s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 31s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 26s
E2E API Smoke Test / detect-changes (pull_request) Successful in 33s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 26s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 25s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 25s
Harness Replays / detect-changes (pull_request) Successful in 30s
CI / Detect changes (pull_request) Successful in 50s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 20s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 16s
Harness Replays / Harness Replays (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 14s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
CI / Canvas (Next.js) (pull_request) Successful in 14s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 14s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
While verifying Phase 4, found a real flaw in Probe 3 (`git ls-remote refs/heads/staging`). On a public repo (which molecule-core is), Gitea falls back to anonymous read on bad auth, so `ls-remote` succeeds even with a junk token. The probe was therefore green-lighting rotated tokens: false-green, the worst possible canary failure mode.

Rewritten to use `git push --dry-run` of the current staging SHA back to `refs/heads/staging`:

- Push always authenticates (auth-gated on smart-protocol handshake, before the dry-run can compute the empty-diff).
- NOP by construction: pushing the current tip back to itself is "Everything up-to-date" with exit 0.
- Bad token → "Authentication failed", exit 128.
- Doesn't reach pre-receive (where branch-protection authz runs), so scope is "auth only", which matches the design intent (failure mode B); authz already covered daily by branch-protection-drift.yml.

Implementation note: `git push` requires a local repo. Spinning up a fresh `git init` in a tempdir (~1KB, ~50ms) instead of pulling the full repo via actions/checkout; actions/checkout would clone ~hundreds of MB for what amounts to "a place to run git from."

Local mutation tests pass:

- Real token: "Everything up-to-date" exit 0
- Junk token: "Authentication failed" exit 128 with actionable ::error:: messages pointing at the runbook

Header comment + runbook step-mapping updated to reflect new probe shape.

Refs: #72
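The exit-code contract described in the message (exit 0 with "Everything up-to-date" for a good token, exit 128 with "Authentication failed" for a bad one) can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the workflow's actual code: `classify_push_result` and its labels are hypothetical names, and the sample git output strings and host are made up for the example.

```shell
# Hypothetical helper (bash): classify the outcome of `git push --dry-run`.
# $1 = exit code of the (timeout-wrapped) git command
# $2 = combined stdout/stderr captured from git
classify_push_result() {
  local rc="$1" out="$2"
  if [ "$rc" -eq 0 ]; then
    # Good token: the no-op push reports "Everything up-to-date".
    echo "auth-ok"
  elif [ "$rc" -eq 128 ] && printf '%s\n' "$out" | grep -q 'Authentication failed'; then
    # Bad token: rejected during the handshake.
    echo "auth-failed"
  elif [ "$rc" -eq 124 ]; then
    # 124 is the status coreutils `timeout` returns when the limit is hit.
    echo "timeout"
  else
    echo "other-failure"
  fi
}

# Illustrative inputs only; the host and path are placeholders.
classify_push_result 0 "Everything up-to-date"      # prints: auth-ok
classify_push_result 128 "fatal: Authentication failed for 'https://gitea.example/org/repo/'"  # prints: auth-failed
```

Separating "auth-failed" from "timeout" and "other-failure" is a design choice of this sketch; the committed step treats every non-zero exit as a suspected token rotation.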
parent 0cef033a6a
commit 62629eda4a
132
.github/workflows/auto-sync-canary.yml
vendored
@@ -38,11 +38,17 @@ name: Auto-sync canary — AUTO_SYNC_TOKEN rotation drift
 # validates the token has `read:repository` scope on this repo
 # (the v2 scope contract — see saved memory
 # `reference_persona_token_v2_scope`).
-# 3. `git ls-remote https://oauth2:<token>@<gitea>/.../molecule-core
-# refs/heads/staging` → validates the EXACT HTTPS basic-auth path
-# that `actions/checkout` uses inside auto-sync-main-to-staging.yml.
-# Without this we'd be testing the API surface but not the git
-# HTTPS surface; they don't share an auth code path on Gitea.
+# 3. `git push --dry-run` of the current staging SHA back to
+# `refs/heads/staging` via `https://oauth2:<token>@<gitea>/...`
+# → validates the EXACT HTTPS basic-auth path that
+# `actions/checkout` + `git push origin staging` use inside
+# auto-sync-main-to-staging.yml. NOP by construction (push the
+# current tip to itself = "Everything up-to-date"); auth is
+# checked at the smart-protocol handshake BEFORE the empty-diff
+# computation, so bad token → exit 128 with "Authentication
+# failed". `git ls-remote` is NOT used here because Gitea
+# falls back to anonymous read on public repos and would
+# silently green-light a rotated token.
 #
 # Each step exits non-zero with an actionable error message if it
 # fails. The workflow status itself is the operator-facing surface.
@@ -93,9 +99,10 @@ name: Auto-sync canary — AUTO_SYNC_TOKEN rotation drift
 # token is invalid OR resolves to wrong persona.
 # - Step "Verify token has repo read scope" red → token valid but
 # stripped of `read:repository` scope (or repo perms changed).
-# - Step "Verify git HTTPS auth path works" red → API works but
-# git HTTPS auth path is broken (rare; usually means a Gitea
-# config drift, not a token issue).
+# - Step "Verify git HTTPS auth path via no-op dry-run push to
+# staging" red → token rotated/revoked OR Gitea git-HTTPS
+# surface is broken (rare). Auth check happens on the
+# smart-protocol handshake, separate from the API path.
 #
 # 2. **Re-issue the token** on the operator host:
 # ```
@@ -279,48 +286,101 @@ jobs:
           fi
           echo "Token has read:repository on ${REPO_PATH} ✓"
 
-      - name: Verify git HTTPS auth path resolves staging tip
+      - name: Verify git HTTPS auth path via no-op dry-run push to staging
         # Final probe: exercise the EXACT auth path that
-        # `actions/checkout` uses in auto-sync-main-to-staging.yml.
-        # Gitea's API and git-HTTPS surfaces share the token but
-        # take different code paths internally — historically (#173)
+        # `actions/checkout` + `git push origin staging` use in
+        # auto-sync-main-to-staging.yml. Gitea's API and git-HTTPS
+        # surfaces share the token-lookup code path internally but
+        # the wire-level error shapes differ — historically (#173)
         # the API path was healthy while git-HTTPS rejected, so
         # checking only the API would have given false-green.
         #
-        # `git ls-remote --refs` is read-only: lists remote refs
-        # without fetching pack data. ~1KB on the wire.
+        # IMPORTANT: `git ls-remote` on a public repo (which
+        # molecule-core is) succeeds even with a junk token because
+        # Gitea falls back to anonymous-read. `ls-remote` therefore
+        # CANNOT validate auth on this surface. We use
+        # `git push --dry-run` instead — push is auth-gated even on
+        # public repos.
+        #
+        # NOP shape: read the current staging SHA via authenticated
+        # ls-remote (the SHA itself is public; auth is incidental
+        # here, used only to colocate the discovery in one step), then
+        # `git push --dry-run <SHA>:refs/heads/staging`. Pushing the
+        # current tip back to itself is "Everything up-to-date" with
+        # exit 0 when auth succeeds. With a bad token Gitea returns
+        # HTTP 401 in the smart-protocol handshake and git exits 128
+        # with "Authentication failed".
+        #
+        # The dry-run never reaches Gitea's pre-receive hook (which
+        # is where branch-protection authz runs), so this probe does
+        # not validate failure mode C. That's intentional —
+        # branch-protection-drift.yml owns authz monitoring; this
+        # canary owns auth.
         env:
-          # Build the URL inline so the token never appears as a
-          # literal string anywhere — it's an env-var interpolation,
-          # subject to GitHub's automatic secret-masking on output.
-          GIT_TERMINAL_PROMPT: "0" # don't hang waiting for password if auth fails
+          # Don't hang waiting for password prompt if auth fails on a
+          # terminal-attached run. (In Actions there's no terminal,
+          # but the env-var hardens against an interactive runner
+          # config.)
+          GIT_TERMINAL_PROMPT: "0"
         run: |
           set -euo pipefail
           # Token is in $AUTO_SYNC_TOKEN (job-level env). Compose the
           # URL as a local var that's never echoed.
           url="https://oauth2:${AUTO_SYNC_TOKEN}@${GITEA_HOST}/${REPO_PATH}"
 
-          # `timeout 30s` covers the (rare) case where the network
-          # path stalls without curl-style timeout flags — git
-          # honours GIT_HTTP_LOW_SPEED_TIME/LIMIT but not a hard wall.
-          if ! out=$(timeout 30s git ls-remote --refs "$url" refs/heads/staging 2>&1); then
-            # Redact any accidental token leak in the error output.
-            redacted=$(echo "$out" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
-            echo "::error::git ls-remote against staging failed via the AUTO_SYNC_TOKEN HTTPS auth path." >&2
-            echo "::error::API probes passed but git HTTPS surface is broken — likely Gitea config drift, not a token rotation." >&2
+          # Step a: read current staging SHA. ~1KB; auth-gated only
+          # on private repos but always works on public — used here
+          # only to discover the SHA, not to validate auth.
+          staging_ref=$(timeout 30s git ls-remote --refs "$url" refs/heads/staging 2>&1) || {
+            redacted=$(echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
+            echo "::error::ls-remote against staging failed (network/DNS issue):" >&2
+            echo "$redacted" >&2
+            exit 1
+          }
+          if ! echo "$staging_ref" | grep -qE '^[0-9a-f]{40}[[:space:]]+refs/heads/staging$'; then
+            echo "::error::ls-remote returned unexpected shape:" >&2
+            echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g" >&2
+            exit 1
+          fi
+          staging_sha=$(echo "$staging_ref" | awk '{print $1}')
+
+          # Step b: spin up an ephemeral local repo. `git push` always
+          # requires a local repo even when pushing a remote SHA that
+          # isn't in the local object DB (the protocol negotiates and
+          # discovers we don't need to send any objects). We don't use
+          # `actions/checkout` for this — it would clone the whole
+          # repo (~hundreds of MB) for what's essentially `git init`.
+          tmp_repo="$(mktemp -d)"
+          trap 'rm -rf "$tmp_repo"' EXIT
+          git -C "$tmp_repo" init -q
+          # Author config required for any git operation; values are
+          # arbitrary because nothing gets committed here.
+          git -C "$tmp_repo" config user.email canary@auto-sync.local
+          git -C "$tmp_repo" config user.name auto-sync-canary
+
+          # Step c: dry-run push the current staging SHA back to
+          # staging. NOP by construction — the remote tip equals the
+          # SHA we're pushing, so "Everything up-to-date" is the
+          # success path.
+          #
+          # Authentication is checked at the smart-protocol handshake,
+          # BEFORE the dry-run can compute an empty diff. Bad token
+          # → "Authentication failed", exit 128. Good token → exit 0.
+          set +e
+          push_out=$(timeout 30s git -C "$tmp_repo" push --dry-run "$url" "${staging_sha}:refs/heads/staging" 2>&1)
+          push_rc=$?
+          set -e
+
+          if [ "$push_rc" -ne 0 ]; then
+            redacted=$(echo "$push_out" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
+            echo "::error::Token rotation suspected: git push --dry-run against staging failed via the AUTO_SYNC_TOKEN HTTPS auth path (exit $push_rc)." >&2
+            echo "::error::This is the EXACT auth path that actions/checkout + git push use in auto-sync-main-to-staging.yml." >&2
+            echo "::error::Likely cause: AUTO_SYNC_TOKEN was rotated/revoked on Gitea but the repo Actions secret was not updated. Runbook: see header." >&2
             echo "$redacted" >&2
             exit 1
           fi
 
-          # Sanity-check: response should be one line "<sha> refs/heads/staging".
-          if ! echo "$out" | grep -qE '^[0-9a-f]{40}[[:space:]]+refs/heads/staging$'; then
-            echo "::error::ls-remote returned unexpected shape:" >&2
-            echo "$out" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g" >&2
-            exit 1
-          fi
-
-          staging_sha=$(echo "$out" | awk '{print $1}')
-          echo "git HTTPS auth path resolves staging → ${staging_sha:0:8} ✓"
+          echo "git HTTPS auth path: NOP push --dry-run to staging → ${staging_sha:0:8} ✓"
 
       - name: Summarise canary result
         # Everything passed — surface a green summary. (Failures
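The token redaction and the `ls-remote` shape check used in Step a of the hunk above can be exercised in isolation. The following is a minimal sketch, assuming bash; `redact_token` and `parse_staging_sha` are hypothetical helper names that do not appear in the workflow, and the host, token, and SHA are made-up sample values.

```shell
# Hypothetical helpers mirroring the probe's redaction and shape check (bash).

# Replace the token in any oauth2:<token>@ URL read from stdin.
redact_token() {
  sed -E 's|oauth2:[^@]+@|oauth2:<redacted>@|g'
}

# Print the SHA if $1 (expected to be a single line) looks like
# "<40-hex-sha><whitespace>refs/heads/staging"; otherwise return 1.
parse_staging_sha() {
  local line="$1"
  printf '%s\n' "$line" | grep -qE '^[0-9a-f]{40}[[:space:]]+refs/heads/staging$' || return 1
  printf '%s\n' "$line" | awk '{print $1}'
}

# Illustrative inputs only (placeholder host, token, and SHA).
echo "fatal: unable to access 'https://oauth2:s3cr3t@gitea.example/org/repo/'" | redact_token
sample_sha="0123456789abcdef0123456789abcdef01234567"
parse_staging_sha "$(printf '%s\trefs/heads/staging' "$sample_sha")"
```

Running the redaction on every error path before echoing, as the committed step does, means a token embedded in a URL inside git's error text does not depend on the runner's secret masking alone.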
@@ -333,7 +393,7 @@ jobs:
           echo "AUTO_SYNC_TOKEN is healthy:"
           echo "- Authenticates as \`${EXPECTED_PERSONA}\` ✓"
           echo "- Has \`read:repository\` scope on \`${REPO_PATH}\` ✓"
-          echo "- Git HTTPS auth path resolves \`refs/heads/staging\` ✓"
+          echo "- Git HTTPS auth path: no-op dry-run push to \`refs/heads/staging\` succeeds ✓"
           echo ""
           echo "Auto-sync main → staging will succeed on the next push to main."
           echo "If this canary ever goes RED, see the runbook in this workflow's header."
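The ephemeral-repo idea from Step b of the probe (a throwaway directory that is always cleaned up) can be sketched on its own. This is an illustration under assumptions, not the workflow's code: `with_scratch_dir` is a hypothetical wrapper, and it scopes the cleanup trap to a subshell rather than to the whole step as the committed script does.

```shell
# Hypothetical wrapper (bash): run a command inside a fresh temp dir and
# remove the dir afterwards, whether the command succeeds or fails.
with_scratch_dir() {
  local dir
  dir="$(mktemp -d)" || return 1
  (
    # The EXIT trap belongs to this subshell, so cleanup runs when it ends.
    trap 'rm -rf "$dir"' EXIT
    cd "$dir" || exit 1
    "$@"
    exit $?
  )
}

# Example: print the scratch path; the directory is gone once this returns.
with_scratch_dir pwd
```

A call such as `with_scratch_dir git init -q` would give git "a place to run from" in the same spirit as the commit's tempdir, without a full checkout.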