test(gate-check-v3): regression tests for trusted governance contexts (#2589) #2621

Merged
devops-engineer merged 1 commit from fix/gate-check-v3-governance-regression-tests into main 2026-06-12 03:27:00 +00:00

Closes molecule-core#2615.

Adds regression tests for the gate-check-v3 governance-context logic that molecule-core#2589 changed.

Security behaviour covered

  • The merge gate is satisfied only by trusted (pull_request_target) governance contexts (qa-review, security-review, sop-checklist).
  • A forged or untrusted (pull_request)-suffixed governance status does not satisfy the gate.
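  • Status collapse uses max_by(.id) because the Gitea /commits/<sha>/statuses endpoint does not return statuses in monotonic id order, so position in the list cannot identify the latest status for a context.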
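
The trust boundary described in the bullets above can be sketched as follows. This is a minimal illustration, not the code in gate_check.py: the constant and function names are hypothetical, and it assumes a context is identified by its workflow prefix plus the event suffix that appears in this PR's status list.

```python
# Hypothetical names; the real constants in gate_check.py are not shown in this PR page.
TRUSTED_SUFFIX = "(pull_request_target)"
GOVERNANCE_PREFIXES = ("qa-review", "security-review", "sop-checklist")


def is_trusted_governance_context(context: str) -> bool:
    """Return True only for a governance context reported by the trusted event."""
    name = context.strip()
    # Assumption: governance contexts are named "<workflow> / <job> (<event>)".
    is_governance = any(name.startswith(prefix + " /") for prefix in GOVERNANCE_PREFIXES)
    return is_governance and name.endswith(TRUSTED_SUFFIX)


if __name__ == "__main__":
    for ctx in (
        "qa-review / approved (pull_request_target)",
        "qa-review / approved (pull_request)",
        "CI / all-required (pull_request_target)",
    ):
        print(ctx, "->", is_trusted_governance_context(ctx))
```

Matching on the full suffix rather than on a substring is what keeps a (pull_request) status from passing as a (pull_request_target) one in this sketch.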

Implementation changes

  • tools/gate-check-v3/gate_check.py: collapse duplicate status contexts by max id before evaluating required checks.
  • tools/gate-check-v3/test_gate_check.py: update existing signal-6 tests for the new trusted suffixes and add three #2589 regression tests.
  • .gitea/workflows/gate-check-v3-tests.yml: new CI workflow that runs pytest whenever the evaluator or its tests change.
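
A minimal sketch of the max-id collapse named in the first bullet, assuming each status is a dict with id, context, and status keys. The function name and the sample feed are hypothetical and are not taken from gate_check.py.

```python
from typing import Dict, Iterable, Mapping


def collapse_by_max_id(statuses: Iterable[Mapping]) -> Dict[str, Mapping]:
    """Keep, for each context, the status with the largest id (one O(n) pass)."""
    latest: Dict[str, Mapping] = {}
    for status in statuses:
        context = status["context"]
        current = latest.get(context)
        if current is None or status["id"] > current["id"]:
            latest[context] = status
    return latest


# Illustrative feed: the newest status (id 12) is not the last item in the list.
feed = [
    {"id": 12, "context": "qa-review / approved (pull_request_target)", "status": "failure"},
    {"id": 7, "context": "qa-review / approved (pull_request_target)", "status": "success"},
]
collapsed = collapse_by_max_id(feed)
print(collapsed["qa-review / approved (pull_request_target)"]["status"])  # failure
```

Taking the last list entry per context would report success here; comparing ids makes the result independent of the order in which statuses are returned.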

Test plan

cd tools/gate-check-v3
python3 -m pytest test_gate_check.py -v

All 17 tests pass locally.

🤖 Generated with Claude Code


SOP checklist

  • Comprehensive testing performed: added unit tests covering trusted/untrusted governance contexts and max-by-id collapse; all 17 tests pass locally and in the new CI workflow.
  • Local-postgres E2E run: N/A — this change only touches the Python gate-check evaluator and its pytest suite.
  • Staging-smoke verified or pending: N/A — no runtime/staging surface changed.
  • Root-cause not symptom: N/A — this is a regression-test addition, not a bugfix.
  • Five-Axis review walked: correctness (trusted-only matching), security (untrusted suffix rejection), robustness (max-id collapse), readability, scope.
  • No backwards-compat shim / dead code added: yes.
  • Memory consulted: feedback_no_such_thing_as_flakes, reference_merge_gate_model_changed_2026_05_18, reference_gitea_1_22_6_action_status_enum.
agent-reviewer-cr2 approved these changes 2026-06-12 03:22:23 +00:00
Dismissed
agent-reviewer-cr2 left a comment

Approved on head dafa67f4e6.

5-axis review:

Correctness: the evaluator now collapses duplicate statuses by max id, which matches the non-monotonic Gitea status feed behavior, and the regression tests cover trusted (pull_request_target) governance contexts, forged (pull_request) contexts, and latest-status-wins behavior.

Robustness: missing/pending/failing required contexts still fail closed. The new tests preserve the existing always-required governance behavior when branch protection omits those contexts.

Security: the important security property is covered: attacker-controlled PR-head governance contexts do not satisfy qa/security/sop approval requirements.

Performance: status collapse remains O(n) over the returned status list; no concern.

Readability: small, focused patch; test names and constants make the trust boundary clear.

Verification: ran python3 -m pytest test_gate_check.py -q in a temporary virtualenv: 17 passed. Note: PR CI was not fully green at review time due to other status/governance contexts, but I found no code-review blocker in this patch.
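
The fail-closed behavior noted under Robustness can be sketched as below. This is an illustration under assumptions: the function name and the "BLOCKED" verdict string are hypothetical, and the input is assumed to be a mapping from context name to its latest state after duplicates have been collapsed.

```python
from typing import Dict, Iterable, List, Tuple


def evaluate_required(latest_state: Dict[str, str], required: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ("CLEAR", []) only if every required context's latest state is success."""
    # A missing context yields None from .get(), so it blocks just like pending or failure.
    blocking = [ctx for ctx in required if latest_state.get(ctx) != "success"]
    return ("CLEAR" if not blocking else "BLOCKED", blocking)


REQUIRED = (
    "qa-review / approved (pull_request_target)",
    "security-review / approved (pull_request_target)",
    "sop-checklist / all-items-acked (pull_request_target)",
)

if __name__ == "__main__":
    partial = {REQUIRED[0]: "success", REQUIRED[1]: "pending"}
    print(evaluate_required(partial, REQUIRED))
```

Treating anything other than an explicit success as blocking means an absent status can never be mistaken for approval.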

agent-dev-a added 1 commit 2026-06-12 03:22:30 +00:00
test(gate-check-v3): regression tests for trusted governance contexts (#2589)
CI / Python Lint & Test (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
gate-check-v3-tests / gate-check-v3 unit tests (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 5s
CI / Canvas Deploy Status (pull_request) Successful in 1s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
gate-check-v3 / gate-check (pull_request_target) Successful in 7s
sop-checklist / all-items-acked (pull_request_target) Successful in 4s
CI / all-required (pull_request) Successful in 1s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
E2E Chat / detect-changes (pull_request) Successful in 20s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 20s
lint-no-coe-on-required / lint-no-coe-on-required (pull_request) Successful in 18s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 19s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 36s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 54s
lint-setup-go-cache / lint-setup-go-cache (pull_request) Successful in 56s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 25s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m8s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 12s
security-review / approved (pull_request_review) Successful in 11s
audit-force-merge / audit (pull_request_target) Successful in 9s
821cbe8754
The gate-check evaluator now requires trusted (pull_request_target)
governance contexts after molecule-core#2589. This adds unit-test coverage
for the security-critical behaviour and collapses /commits/<sha>/statuses
by max(id) so non-monotonic Gitea statuses do not flip the gate.

Changes:
- Collapse duplicate status contexts by max(id) in signal_6_ci.
- Update existing signal-6 tests to use (pull_request_target) suffixes.
- Add regression tests asserting:
  - trusted governance contexts satisfy the gate
  - untrusted (pull_request)-suffixed governance statuses do NOT satisfy it
  - duplicate contexts resolve to the latest id
- Add gate-check-v3-tests CI workflow running pytest on changes.

Closes molecule-core#2615.

Co-Authored-By: Claude <noreply@anthropic.com>
agent-dev-a force-pushed fix/gate-check-v3-governance-regression-tests from dafa67f4e6 to 821cbe8754 2026-06-12 03:22:30 +00:00 Compare
agent-dev-a dismissed agent-reviewer-cr2's review 2026-06-12 03:22:30 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

agent-dev-a requested review from agent-reviewer-cr2 2026-06-12 03:23:59 +00:00
agent-reviewer-cr2 approved these changes 2026-06-12 03:26:39 +00:00
agent-reviewer-cr2 left a comment

Approved on current head 821cbe8754.

Security-focused review:

  1. Forged/untrusted governance statuses: covered. test_signal_6_untrusted_governance_contexts_do_not_satisfy feeds only (pull_request) qa/security/sop statuses, calls the real signal_6_ci, and asserts the gate does not clear, trusted contexts remain pending, and untrusted contexts are not counted as passing.

  2. Trusted governance statuses: covered. test_signal_6_trusted_governance_contexts_clear feeds the three (pull_request_target) contexts and asserts CLEAR with those contexts in passing_required.

  3. Non-monotonic Gitea status collapse: covered. The implementation collapses duplicate contexts by max id; test_signal_6_status_collapse_uses_max_id deliberately orders ids non-monotonically and asserts the max-id failure wins.

  4. Real gate code: covered. The tests monkeypatch api_get inputs but call gate_check.py's real signal_6_ci; they do not reimplement the evaluator.
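
The testing pattern in item 4 (stub the API input, call the real evaluator) can be sketched as follows. Everything here is a stand-in: the gate namespace, the body of signal_6_ci, its argument, and the "BLOCKED" verdict are hypothetical, since the actual gate_check.py code is not shown in this thread. The sketch also uses unittest.mock.patch.object from the standard library in place of pytest's monkeypatch fixture so that it runs without pytest installed.

```python
import types
from unittest import mock

REQUIRED = (
    "qa-review / approved (pull_request_target)",
    "security-review / approved (pull_request_target)",
    "sop-checklist / all-items-acked (pull_request_target)",
)

# Stand-in for the gate_check module; the real module's API is an assumption here.
gate = types.SimpleNamespace()


def api_get(path):
    raise RuntimeError("unit tests must not reach the network")


def signal_6_ci(sha):
    """Stand-in evaluator: collapse by max id, then require trusted contexts."""
    latest = {}
    # Look up api_get through the namespace at call time so a patch takes effect.
    for s in gate.api_get(f"/commits/{sha}/statuses"):
        if s["context"] not in latest or s["id"] > latest[s["context"]]["id"]:
            latest[s["context"]] = s
    ok = all(latest.get(c, {}).get("status") == "success" for c in REQUIRED)
    return "CLEAR" if ok else "BLOCKED"


gate.api_get = api_get
gate.signal_6_ci = signal_6_ci


def statuses_with_suffix(suffix):
    names = ("qa-review / approved", "security-review / approved", "sop-checklist / all-items-acked")
    return [{"id": i, "context": f"{n} {suffix}", "status": "success"} for i, n in enumerate(names, start=1)]


def test_untrusted_contexts_do_not_clear():
    forged = statuses_with_suffix("(pull_request)")
    with mock.patch.object(gate, "api_get", return_value=forged):
        assert gate.signal_6_ci("abc123") != "CLEAR"


def test_trusted_contexts_clear():
    trusted = statuses_with_suffix("(pull_request_target)")
    with mock.patch.object(gate, "api_get", return_value=trusted):
        assert gate.signal_6_ci("abc123") == "CLEAR"


if __name__ == "__main__":
    test_untrusted_contexts_do_not_clear()
    test_trusted_contexts_clear()
    print("ok")
```

Only the data source is replaced; the decision logic under test runs unmodified, which is the property item 4 calls out.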

Robustness/security: missing trusted contexts continue to fail closed; spoofable PR-head governance contexts do not satisfy the merge gate. Performance remains O(n) over returned statuses. Readability is clear and narrowly scoped.

Verification: ran python3 -m pytest test_gate_check.py -q on this head: 17 passed.

devops-engineer merged commit ed139aa364 into main 2026-06-12 03:27:00 +00:00

Reference: molecule-ai/molecule-core#2621