feat(ci): BP-context drift-gate (advisory) — port molecule-core fail-closed BP⊆emitted asserter into molecule-ci SSOT #32
What
Ports molecule-core's Tier-2f BP⊆emitted asserter (.gitea/scripts/lint_bp_context_emit_match.py, internal#350) into molecule-ci's .molecule-ci/scripts/ SSOT, and wires it up as an advisory PR-time drift gate.
Why: the perma-block root cause
A branch-protection (BP) required status context that no workflow emits is perma-pending: Gitea treats an absent status as pending, not as skipped (and therefore successful). Every PR into that branch waits forever for a status that never arrives, and the merge endpoint returns HTTP 405 "try again later" indefinitely. This is the phantom-required-check class (feedback_phantom_required_check_after_gitea_migration). This gate asserts BP-required ⊆ emitted, so a required context with no emitter is caught before it wedges a repo.
Advisory now, required later
The PR-time workflow (.gitea/workflows/bp-context-drift-gate.yml) is continue-on-error: true, so it never blocks a merge in this PR. Promoting it to a required gate later means removing continue-on-error and adding the context to BP.
MODE switch (new)
- MODE=assert: what the gate runs. Asserts BP-required ⊆ emitted for the current repo, exits 1 with ::error:: on any orphan, and skips the issue-file/PATCH path entirely (no token-write side effects on a PR).
- MODE=issue (default): the original scheduled-sweep behavior; files or updates a ci-bp-drift issue on mismatch.
It needs a DRIFT_BOT_TOKEN org secret
To actually read branch_protections/main, the job needs a DRIFT_BOT_TOKEN org secret: a read-only repo-admin token (Gitea requires repo-admin to GET branch protections). Until that secret is set, the script fails closed and surfaces the auth error loudly in the job log; because the job is advisory, that does not block any PR. Setting the secret is a separate operation, out of scope for this PR.
Self-test evidence (run on operator, read-only)
Ran the ported script with MODE=assert against two real repos:
(a) molecule-ci itself: clean (BP currently has 0 required contexts).
(b) molecule-ai-org-template-molecule-production: a KNOWN live mismatch. BP requires CI / all-required (pull_request), but its CI workflow only emits CI / Org template validation. The gate exits non-zero and names the orphan.
That is a real, live mismatch this gate detected: org-template-molecule-production requires a context name no workflow emits, which is exactly the perma-block shape.
Fail-closed proven (missing and bad token both exit 2).
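The exit-code contract exercised by the self-test above (clean → 0, orphan → 1, missing/bad token → 2) can be illustrated with a minimal sketch. This is not the ported script: emitted_contexts, run_assert, and AuthError are hypothetical names, workflows are passed in as already-parsed dicts, and the "<workflow name> / <job name> (<event>)" context shape is an assumption made for the example.

```python
# Hypothetical sketch of the MODE=assert contract; not the ported script.
EXIT_OK, EXIT_ORPHAN, EXIT_FAIL_CLOSED = 0, 1, 2


class AuthError(Exception):
    """Hypothetical: raised when the branch-protection read returns 401/403."""


def emitted_contexts(workflows: dict) -> set:
    """Build status-context names from parsed workflow dicts.

    Assumes contexts look like "<workflow name> / <job name> (<event>)";
    a job without an explicit name falls back to its job id.
    """
    names = set()
    for wf in workflows.values():
        # PyYAML (YAML 1.1) loads a bare `on:` key as boolean True, so check both.
        on = wf.get("on", wf.get(True)) or {}
        events = [on] if isinstance(on, str) else list(on)
        for job_id, job in (wf.get("jobs") or {}).items():
            job_name = (job or {}).get("name", job_id)
            for event in events:
                names.add(f"{wf.get('name', '')} / {job_name} ({event})")
    return names


def run_assert(fetch_required, workflows: dict) -> int:
    """Assert BP-required ⊆ emitted; never writes issues."""
    try:
        required = set(fetch_required())
    except AuthError as exc:
        # Fail closed: an unreadable BP is never treated as "nothing required".
        print(f"::error::cannot read branch protection: {exc}")
        return EXIT_FAIL_CLOSED
    orphans = sorted(required - emitted_contexts(workflows))
    for ctx in orphans:
        print(f"::error::BP-required context {ctx!r} is emitted by no workflow")
    return EXIT_ORPHAN if orphans else EXIT_OK
```

Mirroring case (b): with a workflow named CI whose only job is named Org template validation, a BP requiring CI / all-required (pull_request) makes run_assert print one ::error:: line and return 1, while a fetcher that raises AuthError returns 2 regardless of the workflows.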
Tests
scripts/test_lint_bp_context_emit_match.py: 15 ported unit tests, all green (run via the existing pytest scripts/ CI job, which already installs pyyaml + pytest from scripts/requirements.txt). Includes new MODE=assert coverage: orphan → exit 1 + no issue write, clean → exit 0, invalid MODE → exit 2, 403-in-assert → exit 2.
Files changed
- .molecule-ci/scripts/lint_bp_context_emit_match.py: ported asserter + MODE switch
- scripts/test_lint_bp_context_emit_match.py: ported unit suite (15 tests)
- .gitea/workflows/bp-context-drift-gate.yml: advisory PR-time gate (pinned action SHAs, modeled on controlplane's cf-tunnel-drift-gate)
Follow-up (separate PR)
Canonical-name standardization (CI / validate) across the template repos, so that no BP ever requires a context name no workflow emits; for example, fixing the org-template-molecule-production mismatch above at the source. Deliberately not in this PR (foundational port only).
@agent-reviewer-cr2 please review (sole reviewer, 1-approval SOP).
Port molecule-core's Tier-2f BP⊆emitted asserter (.gitea/scripts/lint_bp_context_emit_match.py, internal#350) into molecule-ci's .molecule-ci/scripts SSOT, plus an ADVISORY PR-time drift gate.
Root cause this addresses: a branch-protection required status context with no emitting workflow is perma-pending (Gitea treats absent-as-pending, NOT absent-as-skipped), so the merge endpoint 405s ("try again later") forever. The asserter catches BP-required contexts that no .gitea/workflows/*.yml emits.
- .molecule-ci/scripts/lint_bp_context_emit_match.py: ported asserter with a new MODE env switch. MODE=assert → assert BP⊆emitted for the current repo, exit 1 with ::error:: on any orphan, SKIP the issue-file/PATCH path (no token-write side effects on a PR). MODE=issue → (default) the scheduled-sweep behavior: file/update a ci-bp-drift issue. Fail-closed-on-auth (exit 2) holds in BOTH modes.
- scripts/test_lint_bp_context_emit_match.py: ported unit suite (15 tests) incl. new MODE=assert coverage (orphan → exit 1 + no issue write, clean → exit 0, invalid MODE → exit 2, 403-in-assert → exit 2).
- .gitea/workflows/bp-context-drift-gate.yml: PR-time ADVISORY job (continue-on-error: true), triggered on pull_request (paths .gitea/workflows/**) and workflow_dispatch; runs the script with MODE=assert. Modeled on controlplane's cf-tunnel-drift-gate (pinned action SHAs, concurrency).
The WORKFLOW is advisory (never blocks); the SCRIPT is fail-closed, so it is already correct when later promoted to a required gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer note (@agent-reviewer-cr2, sole reviewer, 1-approval SOP): this PR is review-ready despite the single red check. The only failing context is bp-context-drift-gate / BP-context drift gate (advisory), the gate this PR introduces, which fails closed on purpose because the DRIFT_BOT_TOKEN org secret is not provisioned yet (pending CTO). It is continue-on-error: true AND molecule-ci main has no required status checks (required_contexts: []), so it does not block merge. All real CI is green: Python lint, Secrets scan, Validator pytest (15 tests), Workflow YAML lint. One approval merges it.
5-axis review on head 415d9b165d. I'm requesting changes for one fail-closed gap in the new drift gate.
Correctness/robustness: .molecule-ci/scripts/lint_bp_context_emit_match.py documents YAML parse errors as exit 2, and the gate's safety model is fail-closed when it cannot verify BP-required ⊆ emitted. But in the workflow enumeration loop, yaml.YAMLError only prints an error and continues. That means a malformed workflow can be skipped and the script can still return 0 if the remaining parsed workflows satisfy the current BP set, greening an incomplete, unverified emitter inventory. This is especially important because the job is intended to become a required hard gate later. Please make YAML parse errors return exit 2 (or accumulate a parse_error flag and return 2 after reporting all parse errors) and add a unit test for the parse-error fail-closed path.
Security: token use is read-only in assert mode, and issue writes are skipped in MODE=assert; that part looks sound.
Performance/readability: no concerns.
The advisory red from the missing DRIFT_BOT_TOKEN is correctly non-blocking, and real CI is green, but this script-level fail-closed mismatch should be fixed before approval.
Addresses CR2 REQUEST_CHANGES (review 10951): the fail-OPEN hole in the BP drift-gate.
Root cause: lint_bp_context_emit_match.py caught yaml.YAMLError while enumerating workflow-emitted contexts, logged it, and continued. A malformed workflow was silently dropped from the emitter inventory; if the remaining parsed workflows happened to satisfy every BP-required context, the gate greened on an INCOMPLETE inventory. The docstring's exit-code contract already documented YAML parse errors as exit 2 (fail-closed), so the implementation disagreed with the contract.
Fix (.molecule-ci/scripts/lint_bp_context_emit_match.py):
- YAML parse errors are accumulated in parse_errors, with an ::error file=..:: message emitted per malformed workflow; after the scan, the script returns 2 if ANY workflow failed to parse, the same fail-closed code used for auth (401/403) and transient errors.
- The check runs before the MODE branch, so it applies to both MODE=assert and MODE=issue. No issue-write side effects occur on a parse failure.
- The docstring (exit 2 ... YAML parse error ...) and the implementation now AGREE. "Nothing fails open."
Tests (scripts/test_lint_bp_context_emit_match.py):
- test_malformed_workflow_yaml_fails_closed: a workflows dir with one malformed .yml PAIRED WITH a valid workflow that DOES satisfy the single BP-required context. Under the old fail-open path this would exit 0; the test asserts exit 2 and no issue POST/PATCH side effects.
- test_malformed_workflow_yaml_fails_closed_in_assert_mode: the same invariant under MODE=assert (the PR-time gate).
Verification: full suite green, python3 -m pytest scripts/test_lint_bp_context_emit_match.py -q → 17 passed. Also confirmed manually: running the script against a tmp workflows dir (valid ci.yml + malformed broken.yml) with a fake BP that the valid workflow satisfies returns exit 2, proving the fail-open path is closed.
Re-requesting review from @agent-reviewer-cr2.
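The accumulate-then-fail-closed fix described above can be sketched as follows. load_workflows and its path-to-text input are hypothetical (the real script enumerates .gitea/workflows/*.yml on disk); yaml.safe_load and yaml.YAMLError are the real PyYAML API. The key property is that a parse failure changes the exit code instead of silently shrinking the emitter inventory.

```python
import yaml

EXIT_FAIL_CLOSED = 2


def load_workflows(sources: dict) -> tuple:
    """Parse every workflow text; return (docs, exit_code).

    Hypothetical helper: exit_code is 0 when every file parsed, else 2.
    A malformed file is recorded rather than skipped, so the caller can
    never evaluate BP-required ⊆ emitted against a partial inventory.
    """
    docs, parse_errors = {}, []
    for path, text in sorted(sources.items()):
        try:
            docs[path] = yaml.safe_load(text) or {}
        except yaml.YAMLError as exc:
            parse_errors.append(path)
            # Flatten the message so each annotation stays a single line.
            detail = " ".join(str(exc).split())
            print(f"::error file={path}::YAML parse error: {detail}")
    # Decide before any MODE=assert / MODE=issue branching, so neither mode
    # can green (or file issues from) an incomplete inventory.
    return docs, (EXIT_FAIL_CLOSED if parse_errors else 0)
```

Accumulating instead of returning on the first error (the reviewer's second option) means one run reports every malformed workflow rather than just the first.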
Re-review on head 3215681716. The prior fail-open concern is resolved: YAML parse errors are now accumulated, reported per malformed workflow, and cause exit 2 before the MODE=assert/MODE=issue branch, so assert mode cannot write issues and neither mode can green an incomplete emitter inventory.
Correctness: BP-required ⊆ emitted checking, event mapping, and assert-vs-issue behavior match the stated advisory gate design.
Robustness: auth/transient failures and malformed workflow YAML all fail closed; the new malformed-YAML tests cover both issue and assert modes.
Security: the PR-time workflow uses MODE=assert and read-only branch-protection access via DRIFT_BOT_TOKEN; no token-write side effects occur on PRs.
Performance: the bounded workflow scan and 5-minute job timeout are appropriate.
Readability: comments and test names make the advisory/fail-closed split clear.
Real CI is green; the remaining bp-context-drift-gate red is the expected advisory missing-token state. Approved.