ci(provisioner-parity): enforce the fast local prod-mimic parity test as a fail-closed merge gate

The token-injection/ownership bug class (the platform delivers
/configs/.auth_token as root:root AFTER the entrypoint chown, so the
uid-1000 agent's save_token O_WRONLY|O_TRUNC open is denied and
list_peers / heartbeat return 401 forever) shipped to the fleet
(Hermes #1877/#418) and again on template-hermes #162 precisely
because nothing ENFORCED the local check. The dev-SOP only referenced
feedback_mandatory_local_e2e_before_ship as prose; prose does not stop
a PR.
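
The ordering problem described above can be modelled without Docker. The following is a minimal sketch, not the provisioner's code: `can_open_for_truncate` and `final_owner` are hypothetical helpers, the 0600 file mode is an assumption (the message only states root:root ownership), and the "fixed" ordering is purely illustrative.

```python
import stat

AGENT_UID = 1000  # the uid-1000 agent named in the commit message
ROOT = 0


def can_open_for_truncate(file_uid, file_gid, file_mode, uid, gids):
    """Simplified Unix permission check for open(path, O_WRONLY | O_TRUNC)
    on an existing file. Ignores ACLs, capabilities and read-only mounts."""
    if uid == ROOT:
        return True
    if uid == file_uid:
        return bool(file_mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(file_mode & stat.S_IWGRP)
    return bool(file_mode & stat.S_IWOTH)


def final_owner(steps):
    """Replay provisioning steps in order and return (uid, gid) of the token.
    'inject' (re)creates the token as root:root; 'chown' hands an already
    existing token to the agent and is a no-op if the token is not there yet."""
    owner = None
    for step in steps:
        if step == "inject":
            owner = (ROOT, ROOT)
        elif step == "chown" and owner is not None:
            owner = (AGENT_UID, AGENT_UID)
    return owner


MODE = 0o600  # ASSUMPTION: the source does not state the file mode

buggy = final_owner(["chown", "inject"])  # token delivered AFTER the chown
fixed = final_owner(["inject", "chown"])  # illustrative ordering only

buggy_ok = can_open_for_truncate(*buggy, MODE, AGENT_UID, {AGENT_UID})
fixed_ok = can_open_for_truncate(*fixed, MODE, AGENT_UID, {AGENT_UID})
print("agent can rewrite token (buggy order):", buggy_ok)
print("agent can rewrite token (fixed order):", fixed_ok)
```

In the buggy order the chown runs before the token exists, so the file stays root-owned and the agent's truncating open is refused.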

This wires the //go:build local provisioner-parity test (added in this
PR) into CI as a real gate:

- new provisioner-parity job runs `go test -tags local -run
  TestTokenOwnership` against the runner's Docker daemon. The test
  self-skips when no Docker daemon is reachable (keeps `make test` /
  Platform (Go) green on Docker-less dev machines); this job runs on a
  Docker-capable runner and treats a SKIP or empty run as a FAILURE
  (fail-closed).
- outcomes are parsed from the test2json stream as real JSON: Package
  sits between Action and Test, so a grep adjacency match on
  Action/Test counts zero (a vacuous-green trap caught and fixed in
  verification).
- requires BOTH the headline parity test AND its fail-direction proof
  control (TestTokenOwnership_FailPre_ProvesCatch) to pass.
- joins the `CI / all-required` aggregator (RFC internal#219 §2) so
  branch protection fail-closes on it with NO branch-protection edit.
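
The grep trap in the second bullet can be reproduced in a few lines. This is a sketch under assumptions: the sample events, the package path `example.test/internal/provisioner`, and the naive pattern are all hypothetical, chosen only to show the field order the message describes.

```python
import json
import re

# Hypothetical test2json-style events; Package sits between Action and Test.
STREAM = """\
some non-JSON build output
{"Time":"2026-05-16T11:56:20Z","Action":"run","Package":"example.test/internal/provisioner","Test":"TestTokenOwnership_LocalProvisionerParity"}
{"Time":"2026-05-16T11:56:21Z","Action":"pass","Package":"example.test/internal/provisioner","Test":"TestTokenOwnership_LocalProvisionerParity","Elapsed":0.4}
{"Time":"2026-05-16T11:56:21Z","Action":"pass","Package":"example.test/internal/provisioner","Test":"TestTokenOwnership_FailPre_ProvesCatch","Elapsed":0.3}
{"Time":"2026-05-16T11:56:21Z","Action":"pass","Package":"example.test/internal/provisioner","Elapsed":0.8}
"""


def naive_grep_pass_count(stream):
    """Counts lines where "Action" is immediately followed by "Test".
    This adjacency never occurs because "Package" sits in between."""
    pattern = re.compile(r'"Action":"pass","Test":"TestTokenOwnership')
    return sum(1 for line in stream.splitlines() if pattern.search(line))


def terminal_outcomes(stream):
    """Parses each line as JSON and keeps the last terminal action per test."""
    outcome = {}
    for line in stream.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text noise
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        test, action = event.get("Test"), event.get("Action")
        if test and action in ("pass", "fail", "skip"):
            outcome[test] = action
    return outcome


print("grep-style pass count:", naive_grep_pass_count(STREAM))
print("JSON-parsed outcomes:", terminal_outcomes(STREAM))
```

A gate that only checked "no failures matched" would go green on the grep count alone, which is why the job keys on per-test outcomes instead.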

Verified locally: PASS-case exit 0; Hermes-bug-present FAIL-case exit
1; no-daemon SKIP-case exit 1.
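
The three verified cases map onto a small decision function. The sketch below condenses the gate logic from this PR into a hypothetical `gate_exit_code` helper; how a no-daemon run appears in the outcome map (both tests reported as "skip", `go test` exiting 0) is an assumption.

```python
HEADLINE = "TestTokenOwnership_LocalProvisionerParity"
PROOF = "TestTokenOwnership_FailPre_ProvesCatch"


def gate_exit_code(outcome, go_exit):
    """Returns 0 only when both named tests ran and passed; anything else
    (failure, skip, or a test that never ran) returns 1 (fail-closed)."""
    if go_exit != 0 or "fail" in outcome.values():
        return 1
    if outcome.get(HEADLINE) != "pass":
        return 1
    if outcome.get(PROOF) != "pass":
        return 1
    return 0


pass_case = {HEADLINE: "pass", PROOF: "pass"}
fail_case = {HEADLINE: "fail", PROOF: "pass"}   # Hermes bug present
skip_case = {HEADLINE: "skip", PROOF: "skip"}   # ASSUMED shape of a no-daemon run
empty_case = {}                                 # nothing matched -run

for name, outcome, go_exit in [
    ("PASS", pass_case, 0),
    ("FAIL", fail_case, 1),
    ("SKIP", skip_case, 0),
    ("EMPTY", empty_case, 0),
]:
    print(name, "->", gate_exit_code(outcome, go_exit))
```

The SKIP and EMPTY rows are the ones a conventional "exit code of go test" check would pass, since `go test` itself exits 0 in both.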

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Molecule AI · core-be 2026-05-16 11:56:22 -07:00
parent efd755604f
commit c9175c071c


@@ -294,6 +294,132 @@ jobs:
exit 1
fi
  # Provisioner Parity: fast local prod-mimic gate. REQUIRED, always runs.
  #
  # WHY THIS IS A GATE, NOT A DOC LINE
  # (feedback_checkpointed_workflow_over_good_practice_doc): the dev-SOP
  # already *referenced* feedback_mandatory_local_e2e_before_ship as
  # prose, but prose does not stop a PR. The token-injection/ownership
  # bug class (platform writes /configs/.auth_token root:root AFTER the
  # entrypoint chown, so the uid-1000 agent's save_token
  # O_WRONLY|O_TRUNC is denied → list_peers / heartbeat 401 forever)
  # shipped to the fleet (Hermes #1877/#418) and again on template-hermes
  # #162 (the bearer-401 being landed right now) precisely because
  # nothing *enforced* the local check. The parity test
  # (workspace-server/internal/provisioner/provisioner_token_ownership_local_test.go,
  # `//go:build local`) reproduces that exact class against a LOCAL
  # Docker daemon in <1s, versus an ~1h EC2 fresh-provision. This job
  # makes it fail-closed on every workspace-server PR.
  #
  # FAIL-CLOSED CONTRACT: the test self-skips when no Docker daemon is
  # reachable (so `make test` / `go test ./...` stay green on Docker-less
  # dev machines and the standard Platform (Go) job). A *gate* that
  # silently skips is not a gate. This job runs on a Docker-capable runner
  # and treats "0 parity tests ran" as a FAILURE: a skipped daemon here
  # means the gate did not execute, which must block merge, not pass.
  #
  # Always-run + per-step gating shape mirrors platform-build so the
  # `CI / Provisioner Parity (<event>)` required-check name is always
  # emitted (SKIPPED != passed under branch protection; see PR #2314).
  provisioner-parity:
    name: Provisioner Parity
    runs-on: ubuntu-latest
    needs: [changes]
    continue-on-error: false
    # Test is seconds-local; generous ceiling absorbs a cold alpine pull
    # on a slow runner link plus Go module/build cold cache.
    timeout-minutes: 15
    defaults:
      run:
        working-directory: workspace-server
    steps:
      - if: ${{ needs.changes.outputs.platform != 'true' }}
        working-directory: .
        run: echo "No workspace-server/** changes, so the parity gate is a no-op for this PR; this job always runs to satisfy the required-check name on branch protection."
      - if: ${{ needs.changes.outputs.platform == 'true' }}
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - if: ${{ needs.changes.outputs.platform == 'true' }}
        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
        with:
          go-version: 'stable'
      - if: ${{ needs.changes.outputs.platform == 'true' }}
        run: go mod download
      - if: ${{ needs.changes.outputs.platform == 'true' }}
        name: Fast local prod-mimic provisioner-parity (fail-closed)
        # Run the `//go:build local` parity suite against the runner's
        # Docker daemon. -json lets us assert tests actually RAN: a skip
        # (no daemon) or zero-test run must fail this gate, never pass it.
        run: |
          set -euo pipefail
          echo "Docker daemon check (gate requires a reachable daemon):"
          docker version --format '{{.Server.Version}}' \
            || { echo "::error::Provisioner-parity gate could not reach a Docker daemon. This gate is fail-closed: a missing daemon means the token-ownership class was NOT checked. Failing the PR rather than passing un-tested."; exit 1; }
          set +e
          go test -tags local -json -run 'TestTokenOwnership' \
            -timeout 12m ./internal/provisioner/ | tee /tmp/parity.json
          gotest_exit=${PIPESTATUS[0]}
          set -e
          # Parse the test2json stream as real JSON. test2json emits
          # objects as {"Action":..,"Package":..,"Test":..}; field ORDER
          # is not guaranteed and `Package` sits between `Action` and
          # `Test`, so a grep adjacency match silently counts ZERO (a
          # vacuous-green trap that nearly shipped here; caught in
          # verification). Per-test terminal action is the source of truth.
          GOTEST_EXIT="$gotest_exit" python3 - <<'PY'
          import json, os, sys
          headline = "TestTokenOwnership_LocalProvisionerParity"
          proof = "TestTokenOwnership_FailPre_ProvesCatch"
          outcome = {}  # test name -> last terminal action
          with open("/tmp/parity.json") as fh:
              for line in fh:
                  line = line.strip()
                  if not line or not line.startswith("{"):
                      continue
                  try:
                      ev = json.loads(line)
                  except json.JSONDecodeError:
                      continue
                  t = ev.get("Test")
                  a = ev.get("Action")
                  if t and a in ("pass", "fail", "skip"):
                      outcome[t] = a
          passed = sum(1 for v in outcome.values() if v == "pass")
          failed = sum(1 for v in outcome.values() if v == "fail")
          skipped = sum(1 for v in outcome.values() if v == "skip")
          go_exit = int(os.environ["GOTEST_EXIT"])
          print(f"parity outcomes: passed={passed} failed={failed} "
                f"skipped={skipped} go_exit={go_exit} "
                f"per-test={outcome}")
          if go_exit != 0 or failed > 0:
              print("::error::Provisioner-parity FAILED: the "
                    "token-injection/ownership bug class (Hermes "
                    "#1877/#418, template-hermes #162: /configs token "
                    "files delivered root:root, uid-1000 agent save_token "
                    "denied -> list_peers/heartbeat 401) is present. Fix "
                    "the provisioner injection to deliver AgentUID-owned "
                    "files before merge.")
              sys.exit(1)
          # The headline parity test AND its fail-direction proof control
          # MUST have run and passed. If either was skipped (no daemon) or
          # never collected, the gate did not actually execute its job:
          # fail-closed, never pass un-checked.
          if outcome.get(headline) != "pass":
              print(f"::error::Provisioner-parity gate did NOT execute "
                    f"the headline test ({headline}={outcome.get(headline)}"
                    f"). Fail-closed: a skipped/absent parity run means "
                    f"the token-ownership class was never checked on this "
                    f"PR; treated as a gate failure.")
              sys.exit(1)
          if outcome.get(proof) != "pass":
              print(f"::error::Provisioner-parity fail-direction proof "
                    f"control did NOT pass ({proof}="
                    f"{outcome.get(proof)}). Without it the headline "
                    f"assertion is not proven load-bearing; fail-closed.")
              sys.exit(1)
          print(f"Provisioner-parity gate PASSED: token-ownership class "
                f"checked locally and the fail-direction proof control "
                f"confirms the assertion is load-bearing (passed={passed}).")
          PY
  # Canvas (Next.js): required check, always runs. Same always-run +
  # per-step gating shape as platform-build. The two-job-sharing-name
  # pattern attempted in PR #2321 doesn't satisfy branch protection
@@ -591,6 +717,14 @@ jobs:
required = [
    f"CI / Detect changes ({event})",
    f"CI / Platform (Go) ({event})",
    # Fast local prod-mimic provisioner-parity gate (this PR).
    # Wired here (not into branch-protection's
    # status_check_contexts) by RFC internal#219 §2 design:
    # the single stable `CI / all-required` context is what BP
    # points at, and new fail-closed gates join by extending
    # this list. Makes the token-ownership class (Hermes
    # #1877/#418, template-hermes #162) a hard merge gate.
    f"CI / Provisioner Parity ({event})",
    f"CI / Canvas (Next.js) ({event})",
    f"CI / Shellcheck (E2E scripts) ({event})",
    f"CI / Python Lint & Test ({event})",