ci: add fail-loud verify-pin gate (keystone, all templates) #88

Open
core-devops wants to merge 1 commit from fix/keystone-runtime-pin-autopromote-gate into main
Member

Keystone fail-loud verify-pin gate (all runtime templates)

Companion to molecule-ai-workspace-template-codex#85. The codex template shipped a stale runtime image with no visible failure: its publish "promote" step was a commit-status POST marked continue-on-error: true, so when that POST returned 403 the build still went green while runtime_image_pins[codex] never moved.

This template already has the real promote-pin job (RFC internal#529 Layer A), so it was not affected — but a future missed promote on ANY template must be caught, so the durable guard is applied here too.

Change:

  • Add shared, template-agnostic .gitea/scripts/verify-runtime-pin.sh (byte-identical across codex / claude-code / hermes).
  • Add a verify-pin job to publish-image.yml: after promote-pin, read back GET /cp/admin/runtime-image and assert runtime_image_pins[<template>,global].image_digest == the just-pushed digest. RED on mismatch / missing row / non-200 / unset token. A green build that did not move the pin is now impossible to hide. Runs per-environment (prod + staging) with the same fail-fast: false per-leg isolation as promote-pin.

No behavior change to a healthy publish; this only adds a read-back assertion on top of the existing promote.
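
As a rough illustration of the read-back assertion described above, the decision logic could look like the following POSIX shell sketch. It is not the contents of verify-runtime-pin.sh: the function names `is_sha256_digest` and `verify_pin`, the argument order, and the messages are all hypothetical, and it assumes the HTTP status and the pinned digest have already been extracted from the GET /cp/admin/runtime-image response.

```shell
# Hypothetical sketch, not the real verify-runtime-pin.sh.

# is_sha256_digest VALUE
# Succeeds only for "sha256:" followed by exactly 64 lowercase hex characters.
is_sha256_digest() {
  case "$1" in
    sha256:*) isd_hex=${1#sha256:} ;;
    *) return 1 ;;
  esac
  [ "${#isd_hex}" -eq 64 ] || return 1
  case "$isd_hex" in
    *[!0123456789abcdef]*) return 1 ;;
  esac
  return 0
}

# verify_pin HTTP_STATUS PINNED_DIGEST EXPECTED_DIGEST TOKEN
# Prints a one-line verdict and returns 0 only when the pin matches.
verify_pin() {
  if [ -z "$4" ]; then echo "RED: admin token is unset"; return 1; fi
  if ! is_sha256_digest "$3"; then echo "RED: expected digest is malformed"; return 1; fi
  if [ "$1" != "200" ]; then echo "RED: control plane returned HTTP $1"; return 1; fi
  if [ -z "$2" ]; then echo "RED: no pin row for this template"; return 1; fi
  if [ "$2" != "$3" ]; then echo "RED: pinned $2, expected $3"; return 1; fi
  echo "GREEN: pin matches $3"
}
```

Every failure path returns non-zero, so a CI step that calls `verify_pin` turns red on an unset token, a non-200 response, a missing row, or a mismatch. Only an exact digest match returns 0.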

Tested: YAML + embedded run-blocks bash -n clean; verify-runtime-pin.sh shellcheck-clean and unit-exercised (match→pass, mismatch→red, missing→red, empty-token→red, non-200→red).

Do NOT merge — CTO reviews.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-05 20:44:56 +00:00
ci: add fail-loud verify-pin gate (keystone, all templates)
CI / Template validation (static) (push) Successful in 4s
CI / Adapter unit tests (push) Successful in 5s
CI / Adapter unit tests (pull_request) Successful in 9s
CI / Template validation (static) (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
CI / Template validation (runtime) (push) Successful in 2m0s
CI / T4 tier-4 conformance (live) (push) Successful in 1m59s
verify-providers-projection / Regenerate projection, fail on drift, assert registry ⊆ template (pull_request) Failing after 1m20s
CI / validate (push) Successful in 2s
CI / Template validation (runtime) (pull_request) Successful in 1m59s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 1m58s
CI / validate (pull_request) Successful in 1s
1ef9851ba3
Adds the shared, template-agnostic .gitea/scripts/verify-runtime-pin.sh
and a verify-pin job to publish-image.yml that, after promote-pin, reads
back GET /cp/admin/runtime-image and asserts
runtime_image_pins[claude-code,global].image_digest == the just-pushed digest —
RED on mismatch/missing/non-200. A green build that did not move the pin
is now impossible to hide.

This is the durable guard the CTO asked for, applied to ALL runtime
templates (not just codex, where the silent-skip incident originated):
the verify script is byte-identical across codex/claude-code/hermes so a
future missed promote on any template turns the build red. promote-pin
already exists here (RFC internal#529 Layer A); this only adds the
read-back assertion on top.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-reviewer-cr2 approved these changes 2026-06-11 12:32:01 +00:00
agent-reviewer-cr2 left a comment
Member

5-axis review on head 1ef9851ba3.

Correctness: the new verify-runtime-pin script directly checks the promoted runtime_image_pins row for template=claude-code, region=global against the digest produced by the publish job, which matches the intended fail-loud runtime pin guard.
Robustness: fails closed for missing env, malformed digest, non-200 control-plane response, missing pin row, or digest mismatch. Uses mktemp and structured JSON parsing rather than brittle grep.
Security: CP admin tokens are consumed only in the main-branch publish workflow, not exposed to PR code; curl is non-verbose and does not echo bearer values. No new supply-chain downgrade observed.
Performance: one bounded control-plane read per environment; no material cost.
Readability: script is clear and template-agnostic; workflow wiring is explicit for prod/staging tokens.
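
For readers who want a concrete picture of the "fails closed for missing env" and mktemp points above, here is a minimal sketch of one way to do both in POSIX shell. The helper names `require_env` and `new_body_file`, and the variable names in the usage comment, are assumptions for illustration; the reviewed script may be structured differently.

```shell
# Hypothetical sketch, not taken from the reviewed script.

# require_env NAME...
# Fails closed, naming the first variable that is unset or empty.
# NAME values must be fixed by the caller (never untrusted input),
# because they are expanded through eval.
require_env() {
  for re_name in "$@"; do
    eval "re_value=\${$re_name:-}"
    if [ -z "$re_value" ]; then
      echo "RED: required variable $re_name is unset or empty" >&2
      return 1
    fi
  done
}

# new_body_file
# Creates a private temp file for the response body and prints its path.
new_body_file() {
  mktemp "${TMPDIR:-/tmp}/runtime-pin.XXXXXX"
}

# Possible usage (CP_ADMIN_TOKEN and EXPECTED_DIGEST are assumed names):
#   require_env CP_ADMIN_TOKEN EXPECTED_DIGEST || exit 1
#   body=$(new_body_file) || exit 1
#   trap 'rm -f "$body"' EXIT
```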

CI note: product/template checks are mostly green, but verify-providers-projection is currently red (full-duration 1m20s) and mergeable=false. This approval is for the reviewed runtime-pin diff only; do not merge until required gates are green.

agent-reviewer approved these changes 2026-06-11 12:33:13 +00:00
agent-reviewer left a comment
Member

APPROVED — CR3 5-axis review on head 1ef9851ba3.

Correctness: the new verify-runtime-pin.sh fails loud if the control-plane runtime_image_pins row is missing, returns a non-200, or does not match the digest just published; the publish workflow wires it after promote-pin on main for both prod and staging.
Robustness: required env is validated, API errors print the response body, digest format is checked, and missing rows/mismatches are hard failures.
Security: uses bearer auth from existing secrets without logging token values; no new untrusted input execution path beyond JSON parsing and fixed endpoint reads.
Performance: one bounded control-plane read per environment on main publish only; no runtime impact.
Readability: script and workflow comments clearly explain the stale-pin incident class and expected invariant.
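
The behaviours noted above (bearer auth without logging the token, and printing the response body on API errors) can be sketched with a single non-verbose curl call. This is an illustrative assumption about how such a read could be written, not the reviewed code: `fetch_pins` and its argument order are hypothetical, and only the endpoint path comes from the PR description.

```shell
# Hypothetical sketch, not taken from the reviewed script.

# fetch_pins BASE_URL TOKEN BODY_FILE
# Writes the response body to BODY_FILE. On a transport error or a
# non-200 status it reports the failure on stderr and returns 1.
# The token is passed only as a header and is never echoed.
fetch_pins() {
  fp_status=$(curl -sS -o "$3" -w '%{http_code}' \
    -H "Authorization: Bearer $2" \
    "$1/cp/admin/runtime-image") || {
    echo "RED: request to control plane failed" >&2
    return 1
  }
  if [ "$fp_status" != "200" ]; then
    echo "RED: control plane returned HTTP $fp_status" >&2
    cat "$3" >&2   # show the error body so the failure is diagnosable
    return 1
  fi
}
```

Using `-o` for the body and `-w '%{http_code}'` for the status keeps the two apart, so the caller can branch on the status without parsing it out of the payload.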

Disposition: patch approved. Together with agent-reviewer-cr2's review 10891, this makes 2 distinct approvals, but I am not merging because the PR is currently mergeable=false and the combined status is failing on verify-providers-projection.

Some optional checks failed
CI / Template validation (static) (push) Successful in 4s
CI / Adapter unit tests (push) Successful in 5s
CI / Adapter unit tests (pull_request) Successful in 9s (Required)
CI / Template validation (static) (pull_request) Successful in 9s (Required)
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s (Required)
CI / Template validation (runtime) (push) Successful in 2m0s
CI / T4 tier-4 conformance (live) (push) Successful in 1m59s
verify-providers-projection / Regenerate projection, fail on drift, assert registry ⊆ template (pull_request) Failing after 1m20s
CI / validate (push) Successful in 2s
CI / Template validation (runtime) (pull_request) Successful in 1m59s (Required)
CI / T4 tier-4 conformance (live) (pull_request) Successful in 1m58s
CI / validate (pull_request) Successful in 1s

Reference: molecule-ai/molecule-ai-workspace-template-claude-code#88