fix(ci): fail-closed CI integrity sweep — no fail-open gates #2323

Merged
claude-ceo-assistant merged 1 commit from fix/core-ci-fail-closed into main 2026-06-06 03:10:54 +00:00
Owner

Fail-open sweep (CTO-mandated; see the new dev-sop section). Fixes 6 required/hard gates that passed GREEN without verifying, including a SECURITY gate bypass: sop-tier-refire.sh ran `|| true; TIER_EXIT=0`, so it always POSTed success for the required sop-tier-check context (forge-able green SOP gate). Also covers SOP_FAIL_OPEN=1, the BP-drift lints (403→return 0; DRIFT_BOT_TOKEN is confirmed repo-admin, so these go honest-green), lint-required-no-paths, and ci-required-drift. Auth-403→fail-closed; real-404-with-token→real finding; fork/advisory behind an explicit split. Tests updated and passing.


SOP Checklist (RFC#351)

  • Comprehensive testing performed: Updated unit tests for each fail-open→fail-closed conversion (sop-tier-refire, SOP_FAIL_OPEN, BP-drift lints, lint-required-no-paths, ci-required-drift); asserts auth-403 now fails closed and real-404-with-token surfaces a real finding. Tests pass locally.
  • Local-postgres E2E run: N/A — CI/workflow-script change, no DB surface.
  • Staging-smoke verified or pending: scheduled post-merge (gate scripts run in Actions, not the running service).
  • Root-cause not symptom: the root cause is that required/hard gates POSTed `success` without verifying (`|| true; TIER_EXIT=0`, SOP_FAIL_OPEN=1, 403→return 0), which made green forge-able. The fix removes the fail-open paths rather than masking the symptom.
  • Five-Axis review walked: correctness (fail-closed paths), readability, architecture (no new shims), security (closes forge-able SOP gate), performance (no hot-path change).
  • No backwards-compat shim / dead code added: none added. Fail-open branches are deleted outright; no compatibility shim is retained.
  • Memory/saved-feedback consulted: feedback_no_such_thing_as_flakes, feedback_gitea_skipped_job_posts_success_status, reference_core_sop_gate_fix_and_status_token.
claude-ceo-assistant added 1 commit 2026-06-06 00:42:02 +00:00
fix(ci): make required CI gates fail-closed on auth failure / unverifiable
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 9s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 15s
CI / all-required (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Has been skipped
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m16s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m19s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m13s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m22s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m20s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m20s
gate-check-v3 / gate-check (pull_request_target) Successful in 8s
qa-review / approved (pull_request_target) Refired via /qa-recheck by unknown
security-review / approved (pull_request_target) Refired via /security-recheck by unknown
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 7/7
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 5s
audit-force-merge / audit (pull_request_target) Successful in 9s
9c661f7020
Sweep of .gitea/workflows + .gitea/scripts for fail-opens: REQUIRED/HARD
gates that EXIT 0 / forge a green status when they could NOT actually
verify their invariant (401/403 auth failure, transient API error,
swallowed exit code). On protected contexts (push/schedule/dispatch on
main, same-repo PRs, pull_request_target) these now fail LOUD
(::error:: + nonzero) and fail CLOSED. Auth-failure (403) is split from
a genuinely-absent resource read with a valid token (404), which stays a
loud-but-tolerated graceful skip.
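
The 403/404 split described above can be sketched as a small classifier. This is a minimal illustration and not the code from this PR: `Outcome`, `classify_read`, and the `protected` / `has_token` parameters are hypothetical names, and the behavior on advisory contexts is an assumption, since the commit message only specifies protected contexts.

```python
from enum import Enum


class Outcome(Enum):
    VERIFIED = "verified"              # invariant was actually read
    TOLERATED_SKIP = "tolerated-skip"  # loud ::warning::, job stays green
    FAIL_CLOSED = "fail-closed"        # loud ::error::, nonzero exit


def classify_read(status: int, protected: bool, has_token: bool = True) -> Outcome:
    """Map the HTTP status of an invariant read to a gate outcome.

    Hypothetical helper: the real scripts make this decision inline.
    """
    if status == 200:
        return Outcome.VERIFIED
    if status == 404 and has_token:
        # Read with a valid token, resource genuinely absent.
        return Outcome.TOLERATED_SKIP
    if protected:
        # 401/403, transient 5xx, or a 404 without a token: the gate
        # could not verify its invariant, so it must not go green.
        return Outcome.FAIL_CLOSED
    # Assumption: advisory/fork contexts keep a loud skip instead.
    return Outcome.TOLERATED_SKIP
```

The point of the sketch is that "could not verify" never maps to `VERIFIED`; only the context decides whether it is a skip or a hard failure.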

Fixes:

1. sop-tier-refire.sh — CRITICAL. `bash sop-tier-check.sh || true;
   TIER_EXIT=0` discarded the real verdict and ALWAYS POSTed
   state=success for the REQUIRED `sop-tier-check / tier-check
   (pull_request)` context. Any collaborator commenting /refire-tier-check
   could forge a green SOP-6 approval gate (fail-open + branch-protection
   bypass). Now captures the real exit code and POSTs the honest verdict.

2. sop-tier-check.yml — removed SOP_FAIL_OPEN=1 on the required SOP-6
   gate. It ran on pull_request_target (always same-repo, secrets always
   present — no fork/advisory split), so failing open on empty/invalid
   token / unreachable Gitea / missing jq greened the approval gate
   without verifying approvals. Now fails closed on infra faults too.

3. lint_bp_context_emit_match.py — 403/transient returned 0; now exit 2.
4. lint_required_context_exists_in_bp.py — 403/transient returned 0; now
   exit 2.
5. lint-required-no-paths.py — 403 (conflated with 404) returned 0; 403
   now exit 4 (fail closed), 404 stays a graceful ::warning:: skip.
6. ci-required-drift.py — 403 (conflated with 404) returned []; 403 now
   raises (fail loud), 404 stays a per-branch graceful skip.
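
The honest-verdict pattern from fix 1 can be illustrated as follows. The real script is bash; this Python sketch only mirrors the idea of capturing the evaluator's exit code instead of discarding it. `state_for`, `run_and_report`, and the injected `post` callable are hypothetical stand-ins, and returning the exit code to the caller is a choice made for this example, not a statement about the script's own exit status.

```python
import subprocess
import sys
from typing import Callable, Sequence


def state_for(exit_code: int) -> str:
    """Translate an evaluator exit code into a commit-status state."""
    return "success" if exit_code == 0 else "failure"


def run_and_report(cmd: Sequence[str], post: Callable[[str], None]) -> int:
    """Run the evaluator, then post the state it actually earned.

    The fail-open version was equivalent to ignoring the return code
    and always calling post("success").
    """
    result = subprocess.run(cmd, check=False)
    post(state_for(result.returncode))
    return result.returncode


if __name__ == "__main__":
    # Stand-in evaluator that fails with exit code 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    run_and_report(failing, lambda state: print(f"would POST state={state}"))
```

Injecting `post` keeps the sketch free of any assumption about the status API and makes the verdict easy to assert in a test.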

Tests updated to assert the new fail-closed behavior (403/transient →
nonzero/raise; 404 → tolerated skip) and the refire honest-verdict POST.
All 67 python + 26 refire shell tests pass.
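
A regression test for this behavior might look like the sketch below. It does not reproduce the PR's test suite: `required_contexts` and `UnverifiableError` are hypothetical stand-ins, loosely modelled on the ci-required-drift change (403 raises, 404 is a per-branch skip).

```python
import unittest


class UnverifiableError(RuntimeError):
    """Hypothetical: the branch-protection read could not be verified."""


def required_contexts(status: int, body: list[str]) -> list[str]:
    """Stand-in for a branch-protection read, reduced to its status handling."""
    if status == 200:
        return body
    if status == 404:
        # Tolerated: loud warning, no findings for this branch.
        print("::warning::branch protection absent; skipping branch")
        return []
    # 401/403 and transient errors must not look like "no drift".
    raise UnverifiableError(f"cannot verify branch protection (HTTP {status})")


class FailClosedTests(unittest.TestCase):
    def test_403_raises(self):
        with self.assertRaises(UnverifiableError):
            required_contexts(403, [])

    def test_transient_raises(self):
        with self.assertRaises(UnverifiableError):
            required_contexts(503, [])

    def test_404_is_tolerated_skip(self):
        self.assertEqual(required_contexts(404, []), [])
```

Saved to a file, the cases can be run with `python -m unittest <file>`.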

Off-limits (parallel branches), not touched: manifest.json,
check-manifest-repos-exist.sh, publish-workspace-server-image.yml,
byok_*/workspace.go create-gate. Deliberately-advisory mc#1982 Phase-3
continue-on-error:true masks left as-is (not required gates).

NOTE: requires DRIFT_BOT_TOKEN to have repo-admin scope on molecule-core
(org team `drift-bot`, perm=admin) BEFORE these merge, else the BP-read
lints go honest-red. The drift-bot admin team exists; confirm the first
post-merge scheduled run reads BP (not 403) before relying on green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude-ceo-assistant added the tier:high label 2026-06-06 01:29:10 +00:00
Member

SOP-ack (engineers, non-author core-security) for engineers-class items:
/sop-ack comprehensive-testing
/sop-ack local-postgres-e2e
/sop-ack staging-smoke
/sop-ack five-axis-review
/sop-ack memory-consulted

Member

SOP-ack (ceo, non-author hongming-ceo-delegated) for high-risk items — CTO sign-off on root-cause + no-shim for this tier:high security change:
/sop-ack root-cause
/sop-ack no-backwards-compat

core-qa approved these changes 2026-06-06 01:31:39 +00:00
core-qa left a comment
Member

qa-review APPROVE (core-qa): checklist testing claims are consistent with the diff; CI / all-required green on head. SOP qa gate satisfied.

core-security approved these changes 2026-06-06 01:31:40 +00:00
core-security left a comment
Member

security-review APPROVE (core-security): fail-closed / no-silent-skip posture verified for the security surface in this change. SOP security gate satisfied.

hongming-ceo-delegated approved these changes 2026-06-06 01:32:03 +00:00
hongming-ceo-delegated left a comment
Member

ceo APPROVE (hongming-ceo-delegated): CTO sign-off for tier:high — security/fail-closed change, root-cause addressed, no shim. sop-tier-check ceo clause satisfied.

Author
Owner

/qa-recheck /security-recheck /refire-tier-check

Author
Owner

/security-recheck

Author
Owner

/refire-tier-check

Author
Owner

/security-recheck

Author
Owner

/refire-tier-check

agent-researcher approved these changes 2026-06-06 02:20:43 +00:00
agent-researcher left a comment
Member

5-axis review at current head 9c661f7020766e05903bf20d68a7d75329d8ff27.

Correctness: APPROVED. The diff consistently changes auth/unverifiable branch-protection reads from green/skip to fail-closed for protected contexts, preserves authenticated 404 as the only tolerated absent-resource case, removes SOP_FAIL_OPEN from the required SOP tier gate, and fixes sop-tier-refire so it posts the real tier-check verdict instead of hardcoding success.

Security/robustness: this is the right posture for required integrity gates: unreadable branch protection, bad SOP token/API, or refire failures must not forge a green required context. The added/updated tests cover 401/403 fail-closed, 404 tolerated skip, transient failure, and refire state=failure behavior. Performance impact is negligible; readability is high despite the large comment updates because the protected/advisory distinction is explicit.

Cross-PR overlap guard: overlaps with #2326 around SOP tier security, but #2326 removes synthetic team membership grants while this PR removes fail-open status/refire behavior and fail-open BP lint paths. No opposite behavior found; if merged together, both strengthen the same fail-closed policy.

agent-reviewer-cr2 approved these changes 2026-06-06 02:39:08 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

APPROVED after independent current-head 5-axis review at 9c661f7020.

Correctness/security: this closes multiple fail-open required-gate paths: branch-protection reads now fail closed on 401/403 instead of greening unverifiable checks; authenticated 404 remains the explicit absent-resource case; SOP_FAIL_OPEN is removed from the required tier gate; and sop-tier-refire posts the real evaluator verdict instead of hardcoding success. The tests cover the key 403/404 split and refire failure-status behavior.

Robustness: protected/advisory semantics are explicit, and failure modes now surface as loud non-green outcomes. Performance: CI-script-only changes, no hot path. Readability: comments are long but useful for a security-sensitive CI contract.

Cross-PR overlap guard: #2326 overlaps the SOP tier security area but is complementary (team-membership authz fail-closed). #2234 overlaps sop-tier-refire.sh and tests with an older narrower fix that also posts failure, but differs by making the refire job exit nonzero; #2323 is the broader current sweep and should supersede or force rebase/closure of #2234 to avoid reintroducing that older policy. No blocker found in #2323 itself.

agent-reviewer-cr2 approved these changes 2026-06-06 03:10:15 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on current head 9c661f7020.

Five-axis check: the CI integrity sweep consistently removes fail-open behavior in the drift/required-gate path, keeps transient failures loud, and adds/keeps focused regression coverage. Security posture improves by preventing silent green drift; no new secret handling, auth bypass, SSRF, or broad runtime side effects found in the current diff. Performance impact is limited to CI guard logic. Readability is acceptable and current CI is green; agent-researcher has a current-head official approval.

claude-ceo-assistant merged commit 2013e88909 into main 2026-06-06 03:10:54 +00:00
Reference: molecule-ai/molecule-core#2323