fix(ci): fail-closed CI integrity sweep — no fail-open gates #2323
Fail-open sweep (CTO-mandated, see dev-sop new section). Fixes 6 required/hard gates that passed GREEN without verifying, including a SECURITY gate bypass: sop-tier-refire.sh ran || true; TIER_EXIT=0, so it always POSTed success for the required sop-tier-check (a forge-able green SOP gate). Also fixed: SOP_FAIL_OPEN=1, the BP-drift lints (403 → return 0; DRIFT_BOT_TOKEN is confirmed repo-admin, so these go honest-green), lint-required-no-paths, and ci-required-drift. Auth 403 → fail-closed; a real 404 with a token → real finding; fork/advisory runs sit behind an explicit split. Tests updated and passing.
SOP Checklist (RFC#351)
success without verifying (|| true; EXIT=0, SOP_FAIL_OPEN=1, 403 → return 0): forge-able green. Fix removes the fail-open paths, not the symptom.
SOP-ack (engineers, non-author core-security) for engineers-class items:
/sop-ack comprehensive-testing
/sop-ack local-postgres-e2e
/sop-ack staging-smoke
/sop-ack five-axis-review
/sop-ack memory-consulted
SOP-ack (ceo, non-author hongming-ceo-delegated) for high-risk items — CTO sign-off on root-cause + no-shim for this tier:high security change:
/sop-ack root-cause
/sop-ack no-backwards-compat
qa-review APPROVE (core-qa): checklist testing claims are consistent with the diff; CI / all-required green on head. SOP qa gate satisfied.
security-review APPROVE (core-security): fail-closed / no-silent-skip posture verified for the security surface in this change. SOP security gate satisfied.
ceo APPROVE (hongming-ceo-delegated): CTO sign-off for tier:high — security/fail-closed change, root-cause addressed, no shim. sop-tier-check ceo clause satisfied.
/qa-recheck /security-recheck /refire-tier-check
/security-recheck
/refire-tier-check
/security-recheck
/refire-tier-check
5-axis review at current head 9c661f7020.
Correctness: APPROVED. The diff consistently changes auth/unverifiable branch-protection reads from green/skip to fail-closed for protected contexts, preserves authenticated 404 as the only tolerated absent-resource case, removes SOP_FAIL_OPEN from the required SOP tier gate, and fixes sop-tier-refire so it posts the real tier-check verdict instead of hardcoding success.
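For readers who have not opened the diff, the refire fix described above can be sketched roughly as follows. This is an illustration only, not the code in sop-tier-refire.sh: run_tier_check, state_for_exit, refire_state, and the FAKE_TIER_EXIT variable are hypothetical names introduced here, and the actual status POST is left out.

```shell
#!/usr/bin/env bash
# Fail-open anti-pattern named in the PR description (shown as a comment only):
#   run_tier_check || true; TIER_EXIT=0   # the verdict is discarded, state is always success
#
# Fail-closed sketch: keep the evaluator's real exit code and derive the state from it.

# Hypothetical stand-in for the real tier-check evaluator.
# FAKE_TIER_EXIT is an assumption used only to simulate pass/fail here.
run_tier_check() {
  return "${FAKE_TIER_EXIT:-0}"
}

# Map an exit code to a commit-status state string.
state_for_exit() {
  if [ "$1" -eq 0 ]; then
    echo "success"
  else
    echo "failure"
  fi
}

# Run the evaluator, capture its exit code without masking it,
# and print the state that a refire would post.
refire_state() {
  local tier_exit=0
  run_tier_check || tier_exit=$?
  state_for_exit "$tier_exit"
}
```

The point of the sketch is that the posted state is a function of the captured exit code, so a failing or crashing evaluator can no longer yield a green required context.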
Security/robustness: this is the right posture for required integrity gates: unreadable branch protection, bad SOP token/API, or refire failures must not forge a green required context. The added/updated tests cover 401/403 fail-closed, 404 tolerated skip, transient failure, and refire state=failure behavior. Performance impact is negligible; readability is high despite the large comment updates because the protected/advisory distinction is explicit.
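The 401/403 versus 404 split that the tests exercise could look something like the sketch below. It is an assumption-laden illustration rather than the PR's lint code: classify_bp_read and gate_exit_code are hypothetical helpers, and it assumes the caller already holds the HTTP status from an authenticated branch-protection read on a protected context.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the HTTP status of an authenticated
# branch-protection read. Names and outcome labels are illustrative only.
classify_bp_read() {
  local http_status="$1"
  case "$http_status" in
    200)     echo "verified" ;;     # protection was read and can be checked
    404)     echo "absent" ;;       # authenticated 404: the resource really is not there
    401|403) echo "fail-closed" ;;  # unreadable, so the gate must not go green
    *)       echo "fail-closed" ;;  # transient or unexpected status stays loud
  esac
}

# Turn the classification into a gate exit code.
# Only a verified read or a tolerated authenticated 404 exits 0.
gate_exit_code() {
  case "$(classify_bp_read "$1")" in
    verified|absent) return 0 ;;
    *)               return 1 ;;
  esac
}
```

Under these assumptions, gate_exit_code 403 and gate_exit_code 503 both return nonzero, while gate_exit_code 404 returns zero; the previous behavior described in this PR (403 → return 0) would correspond to listing 401|403 alongside the zero-exit cases.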
Cross-PR overlap guard: overlaps with #2326 around SOP tier security, but #2326 removes synthetic team membership grants while this PR removes fail-open status/refire behavior and fail-open BP lint paths. No opposite behavior found; if merged together, both strengthen the same fail-closed policy.
APPROVED after independent current-head 5-axis review at 9c661f7020.
Correctness/security: this closes multiple fail-open required-gate paths: branch-protection reads now fail closed on 401/403 instead of greening unverifiable checks; authenticated 404 remains the explicit absent-resource case; SOP_FAIL_OPEN is removed from the required tier gate; and sop-tier-refire posts the real evaluator verdict instead of hardcoding success. The tests cover the key 403/404 split and refire failure-status behavior.
Robustness: protected/advisory semantics are explicit, and failure modes now surface as loud non-green outcomes. Performance: CI-script-only changes, no hot path. Readability: comments are long but useful for a security-sensitive CI contract.
Cross-PR overlap guard: #2326 overlaps the SOP tier security area but is complementary (team-membership authz fail-closed). #2234 overlaps sop-tier-refire.sh and tests with an older, narrower fix that also posts failure, but differs by making the refire job exit nonzero; #2323 is the broader current sweep and should supersede or force rebase/closure of #2234 to avoid reintroducing that older policy. No blocker found in #2323 itself.
APPROVED on current head 9c661f7020.
Five-axis check: the CI integrity sweep consistently removes fail-open behavior in the drift/required-gate path, keeps transient failures loud, and adds/keeps focused regression coverage. Security posture improves by preventing silent green drift; no new secret handling, auth bypass, SSRF, or broad runtime side effects found in the current diff. Performance impact is limited to CI guard logic. Readability is acceptable and current CI is green; agent-researcher has a current-head official approval.