feat(ci): sop-checklist-gate — peer-ack merge gate (RFC#351 Phase 2) #688

Merged
core-devops merged 2 commits from feat/sop-checklist-gate-mvp into main 2026-05-12 07:03:50 +00:00
Member

What

Add the SOP-checklist peer-ack merge gate (RFC#351 Phase 2):
workflow + script + config + 51 unit tests. NOT yet wired into BP
status_check_contexts (Phase 4 needs separate authorization).

Files:

  • .gitea/workflows/sop-checklist-gate.yml
  • .gitea/scripts/sop-checklist-gate.py
  • .gitea/scripts/tests/test_sop_checklist_gate.py (51 tests)
  • .gitea/sop-checklist-config.yaml

Why

Hongming 2026-05-12T05:42Z asked for SOP-enforcing CI/CD: "questions
asked about each checklist, like did you do comprehensive testing,
and after each checklist is replied and reviewed by peer, can't
merge." This is the structural mechanism for that.

Composes existing patterns rather than reinventing them:

  • scripts-lint's PR-body required-sections parser (extended)
  • RFC#324's persona-whitelist commit-status (qa-review pattern)
  • sop-tier-check's tier-aware failure mode

Verification

  • Unit: python3 .gitea/scripts/tests/test_sop_checklist_gate.py → 51/51 pass.
  • YAML lint: lint-workflow-yaml.py clean on the new workflow file (pre-existing fatals on unrelated workflow_run users).
  • Python syntax: py_compile clean on script + tests.
  • Dry-run probe: ran --dry-run against open PR#685, full pipeline (PR fetch, comment enumeration, status rendering) works end-to-end. Sample output:
    ::notice::posting status: state=failure desc='acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7'
    Description stays inside the 140-char Gitea status budget via +N elision when >3 items missing.
  • Live integration: this PR itself will be the first real-world test once the workflow fires.

Edge cases covered by tests:

  • Slug normalization: kebab/snake/spaces/uppercase/numeric → canonical kebab-case
  • /sop-ack mid-line not honored (prevents review-text from accidentally acking)
  • Self-ack rejection (author cannot ack own PR)
  • Revoke semantics: most-recent (user, slug) wins; peer revoke doesn't affect other peers' acks
  • Unknown slug silently dropped (doesn't crash)
  • Numeric shorthand /sop-ack 1..7 works
  • Multi-item / multi-user single-comment handled
  • Body unfilled answers (only marker, no content) surfaced separately from "missing peer-ack"

Tier

tier:medium — net-new CI workflow; no production impact; no BP change yet (Phase 4 is a separate authorization).

Brief-falsification log

Brief said: "use issue_comment trigger because pull_request_review doesn't refire."
Verified by reading source: feedback_pull_request_review_no_refire + qa-review.yml already uses issue_comment: [created]. Confirmed by reading sop-tier-check.yml and gate-check-v3.yml that pull_request_target: types: [edited] IS supported on Gitea 1.22.6 (they both use it). Both triggers wired into the new workflow.

Brief said: "translate the RFC's persona-role mapping (core-qa, core-be, core-devops) into Gitea teams."
Verified by API probe: GET /orgs/molecule-ai/teams returns ceo(5), engineers(2), managers(6), qa(20), security(21), Owners(1) + bot teams. The RFC's core-qa/core-be/core-devops are individual user logins, not teams. Mapped to the closest existing team and documented the mapping explicitly in config comments so it's reviewable.

Brief said: "Failure mode: hard-fail for tier:high, soft-fail for tier:low/medium."
Refined: per feedback_fix_root_not_symptom + feedback_quality_first_default, default-mode is HARD when no tier label is present. tier:medium is non-trivial work and stays HARD. Only tier:low is soft (status=pending). Captured in sop-checklist-config.yaml tier_failure_mode map.

SOP-Checklist

  • Comprehensive testing performed: 51 unit tests covering slug normalization, parse_directives, section_marker_present, compute_ack_state (self-ack rejection + revoke semantics + multi-user), render_status, get_tier_mode, load_config, and end-to-end happy path. All pass locally.
  • Local-postgres E2E run: N/A — pure CI tooling, no DB interaction.
  • Staging-smoke verified or pending: pending — workflow fires on this PR's own pull_request_target event; status post is the live test.
  • Root-cause not symptom: Hongming's ask was for a structural enforcement mechanism for SOP-checklist (not a process doc). This gate IS that mechanism — it makes SOP-compliance a required commit-status, not advisory text. Per feedback_checkpointed_workflow_over_good_practice_doc.
  • Five-Axis review walked:
    • Correctness: parser handles edge cases (mid-line/leading-whitespace/multi-directive/unknown-slug); revoke semantics tested with 4 scenarios; numeric aliases tested for known + unknown + no-table.
    • Readability: module docstring describes design + slug rules + revoke semantics; per-function docstrings; config file has WHY-THIS-MAPPING block.
    • Architecture: composes 3 existing patterns; team-probe is dependency-injected so the core logic is unit-testable without Gitea; PyYAML optional via bundled minimal parser fallback.
    • Security: pull_request_target (NOT pull_request) so PRs can't rewrite the workflow; actions/checkout pins to default_branch (BASE) ref so script is trusted; token-in-argv avoided via Python urllib Authorization header (no shell curl -H); status post is the only write op and uses write:repository minimum.
    • Performance: team-id resolution cached per org; team-membership probe cached per (user, team-id); paginated comment fetch (50/page). Worst case ~10 API calls per gate run.
  • No backwards-compat shim / dead code added: yes — net-new files, no existing-file edits. Bundled minimal YAML parser is the ONLY new abstraction; it exists to avoid a PyYAML runtime dep on the runner (apt install adds 30s to every CI run). Documented and tested via test_default_config_parses.
  • Memory/saved-feedback consulted:
    • feedback_pull_request_review_no_refire -> use issue_comment trigger
    • feedback_checkpointed_workflow_over_good_practice_doc -> gates not advisory docs
    • feedback_fix_root_not_symptom -> default-mode=hard never silently lowers the bar
    • feedback_quality_first_default -> same
    • feedback_no_secrets_in_docker_cmd_args -> token-in-argv avoided
    • feedback_validate_yaml_before_commit -> ran yaml.safe_load + py_compile pre-commit

To peer-reviewers: please post /sop-ack <slug> comments for items in your team's scope. Slugs: comprehensive-testing, local-postgres-e2e, staging-smoke, root-cause, five-axis-review, no-backwards-compat, memory-consulted (or numeric 1..7).


RFC#351 cross-link: molecule-ai/internal#351

## What Add the SOP-checklist peer-ack merge gate (RFC#351 Phase 2): workflow + script + config + 51 unit tests. NOT yet wired into BP `status_check_contexts` (Phase 4 needs separate authorization). Files: - `.gitea/workflows/sop-checklist-gate.yml` - `.gitea/scripts/sop-checklist-gate.py` - `.gitea/scripts/tests/test_sop_checklist_gate.py` (51 tests) - `.gitea/sop-checklist-config.yaml` ## Why Hongming 2026-05-12T05:42Z asked for SOP-enforcing CI/CD: "questions asked about each checklist, like did you do comprehensive testing, and after each checklist is replied and reviewed by peer, can't merge." This is the structural mechanism for that. Composes existing patterns rather than reinventing them: - `scripts-lint`'s PR-body required-sections parser (extended) - RFC#324's persona-whitelist commit-status (qa-review pattern) - `sop-tier-check`'s tier-aware failure mode ## Verification - **Unit**: `python3 .gitea/scripts/tests/test_sop_checklist_gate.py` → 51/51 pass. - **YAML lint**: `lint-workflow-yaml.py` clean on the new workflow file (pre-existing fatals on unrelated `workflow_run` users). - **Python syntax**: `py_compile` clean on script + tests. - **Dry-run probe**: ran `--dry-run` against open PR#685, full pipeline (PR fetch, comment enumeration, status rendering) works end-to-end. Sample output: `::notice::posting status: state=failure desc='acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7'` Description stays inside the 140-char Gitea status budget via `+N` elision when >3 items missing. - **Live integration**: this PR itself will be the first real-world test once the workflow fires. Edge cases covered by tests: - Slug normalization: kebab/snake/spaces/uppercase/numeric → canonical kebab-case - `/sop-ack` mid-line not honored (prevents review-text from accidentally acking) - Self-ack rejection (author cannot ack own PR) - Revoke semantics: most-recent (user, slug) wins; peer revoke doesn't affect other peers' acks - Unknown slug silently dropped (doesn't crash) - Numeric shorthand `/sop-ack 1..7` works - Multi-item / multi-user single-comment handled - Body unfilled answers (only marker, no content) surfaced separately from "missing peer-ack" ## Tier tier:medium — net-new CI workflow; no production impact; no BP change yet (Phase 4 is a separate authorization). ## Brief-falsification log **Brief said**: "use issue_comment trigger because pull_request_review doesn't refire." **Verified by reading source**: `feedback_pull_request_review_no_refire` + `qa-review.yml` already uses `issue_comment: [created]`. Confirmed by reading `sop-tier-check.yml` and `gate-check-v3.yml` that `pull_request_target: types: [edited]` IS supported on Gitea 1.22.6 (they both use it). Both triggers wired into the new workflow. **Brief said**: "translate the RFC's persona-role mapping (`core-qa`, `core-be`, `core-devops`) into Gitea teams." **Verified by API probe**: `GET /orgs/molecule-ai/teams` returns `ceo(5), engineers(2), managers(6), qa(20), security(21), Owners(1)` + bot teams. The RFC's `core-qa`/`core-be`/`core-devops` are individual user logins, not teams. Mapped to the closest existing team and documented the mapping explicitly in config comments so it's reviewable. **Brief said**: "Failure mode: hard-fail for tier:high, soft-fail for tier:low/medium." **Refined**: per `feedback_fix_root_not_symptom` + `feedback_quality_first_default`, default-mode is HARD when no tier label is present. `tier:medium` is non-trivial work and stays HARD. Only `tier:low` is soft (status=pending). Captured in `sop-checklist-config.yaml` `tier_failure_mode` map. ## SOP-Checklist - [ ] **Comprehensive testing performed**: 51 unit tests covering slug normalization, parse_directives, section_marker_present, compute_ack_state (self-ack rejection + revoke semantics + multi-user), render_status, get_tier_mode, load_config, and end-to-end happy path. All pass locally. - [ ] **Local-postgres E2E run**: N/A — pure CI tooling, no DB interaction. - [ ] **Staging-smoke verified or pending**: pending — workflow fires on this PR's own pull_request_target event; status post is the live test. - [ ] **Root-cause not symptom**: Hongming's ask was for a structural enforcement mechanism for SOP-checklist (not a process doc). This gate IS that mechanism — it makes SOP-compliance a required commit-status, not advisory text. Per `feedback_checkpointed_workflow_over_good_practice_doc`. - [ ] **Five-Axis review walked**: - **Correctness**: parser handles edge cases (mid-line/leading-whitespace/multi-directive/unknown-slug); revoke semantics tested with 4 scenarios; numeric aliases tested for known + unknown + no-table. - **Readability**: module docstring describes design + slug rules + revoke semantics; per-function docstrings; config file has WHY-THIS-MAPPING block. - **Architecture**: composes 3 existing patterns; team-probe is dependency-injected so the core logic is unit-testable without Gitea; PyYAML optional via bundled minimal parser fallback. - **Security**: `pull_request_target` (NOT `pull_request`) so PRs can't rewrite the workflow; `actions/checkout` pins to `default_branch` (BASE) ref so script is trusted; token-in-argv avoided via Python `urllib` Authorization header (no shell `curl -H`); status post is the only write op and uses `write:repository` minimum. - **Performance**: team-id resolution cached per org; team-membership probe cached per (user, team-id); paginated comment fetch (50/page). Worst case ~10 API calls per gate run. - [ ] **No backwards-compat shim / dead code added**: yes — net-new files, no existing-file edits. Bundled minimal YAML parser is the ONLY new abstraction; it exists to avoid a PyYAML runtime dep on the runner (apt install adds 30s to every CI run). Documented and tested via `test_default_config_parses`. - [ ] **Memory/saved-feedback consulted**: - `feedback_pull_request_review_no_refire` -> use issue_comment trigger - `feedback_checkpointed_workflow_over_good_practice_doc` -> gates not advisory docs - `feedback_fix_root_not_symptom` -> default-mode=hard never silently lowers the bar - `feedback_quality_first_default` -> same - `feedback_no_secrets_in_docker_cmd_args` -> token-in-argv avoided - `feedback_validate_yaml_before_commit` -> ran yaml.safe_load + py_compile pre-commit To peer-reviewers: please post `/sop-ack <slug>` comments for items in your team's scope. Slugs: `comprehensive-testing`, `local-postgres-e2e`, `staging-smoke`, `root-cause`, `five-axis-review`, `no-backwards-compat`, `memory-consulted` (or numeric 1..7). --- RFC#351 cross-link: https://git.moleculesai.app/molecule-ai/internal/issues/351
core-devops added 1 commit 2026-05-12 06:10:16 +00:00
feat(ci): sop-checklist-gate — peer-ack merge gate (RFC#351 Phase 2)
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 33s
CI / Detect changes (pull_request) Successful in 43s
E2E API Smoke Test / detect-changes (pull_request) Successful in 44s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 40s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 15s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m25s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 6s
72df12ecef
RFC#351 Step 2 of 6: implementation MVP of the SOP-checklist peer-ack
merge gate. NOT yet wired to branch protection (Phase 4 needs separate
authorization).

What:
- .gitea/sop-checklist-config.yaml — 7-item checklist with slug,
  numeric_alias (1..7), pr_section_marker, required_teams. Includes
  tier-aware failure-mode map: tier:high/medium=hard, tier:low=soft,
  default=hard (never silently lower the bar).
- .gitea/scripts/sop-checklist-gate.py — parses PR body + comments,
  computes per-item ack state, posts commit-status
  "sop-checklist / all-items-acked (pull_request)".
- .gitea/scripts/tests/test_sop_checklist_gate.py — 51 unit tests
  covering slug normalization, directive parsing, section-marker
  detection, ack-state computation (self-ack reject, revoke
  semantics, multi-user/multi-item, numeric aliases), tier-mode
  selection, and end-to-end happy path.
- .gitea/workflows/sop-checklist-gate.yml — pull_request_target
  [opened/edited/synchronize/reopened] + issue_comment
  [created/edited/deleted]. Checks out BASE ref only (trust boundary
  per RFC#324 §A4). Mirrors qa-review/security-review patterns.

Why:
Hongming 2026-05-12T05:42Z asked for SOP-enforcing CI/CD that requires
peer-ack on each checklist item before merge. Composes the existing
patterns (scripts-lint PR-body parser + RFC#324 persona-whitelist
commit-status + sop-tier-check tier-awareness) into one gate.

Slash-command contract:
  /sop-ack <slug> [note]      — register peer-ack (most-recent wins)
  /sop-revoke <slug> [reason] — invalidate own prior ack

Slug normalization accepts kebab-case, snake_case, natural-spaces,
or numeric 1..7 shorthand (all canonicalize to kebab-case via the
config-driven alias table).

Tests: 51/51 pass locally. Dry-run probe against PR#685 verified the
full pipeline (PR fetch, comment fetch, ack computation, status
description rendering inside the 140-char budget).

Not yet:
- Phase 3 (24h soak)
- Phase 4 (BP PATCH to require this context — needs Hongming GO)
- Phase 5 (cross-repo)
- Phase 6 (dev-sop.md codification)
- SOP_CHECKLIST_GATE_TOKEN secret provisioning (separate follow-up;
  fail-closed until provisioned, same as RFC_324_TEAM_READ_TOKEN
  pattern in qa-review.yml).

Cross-links:
- internal#351 (RFC body)
- RFC#324 (qa-review/security-review — reused mechanism)
- internal#346 (dev-sop.md SOP-14..SOP-20 — sibling rules)
- feedback_pull_request_review_no_refire (why issue_comment trigger)
- feedback_checkpointed_workflow_over_good_practice_doc (motivation)
- feedback_fix_root_not_symptom (default-mode=hard rationale)

## What
Add a SOP-checklist peer-ack merge gate: workflow + script + config + 51 unit tests.

## Why
Hongming-requested mechanism to enforce SOP via CI/CD: each PR checklist
item must be peer-acked before merge, with team-membership-verified
ackers and tier-aware failure mode.

## Verification
- 51/51 unit tests pass (slug normalization, parse_directives, section
  marker detection, ack-state including self-ack rejection + revoke
  semantics, tier-mode mapping, end-to-end happy path).
- YAML lint clean (yaml.safe_load + lint-workflow-yaml.py on the new
  workflow — pre-existing fatals on unrelated files only).
- Python syntax clean (py_compile).
- Dry-run against live PR#685: PR fetch, comment enumeration, status
  description render all within 140-char budget — works end-to-end.

## Tier
tier:medium — net-new CI workflow; no production impact; no BP change
yet (Phase 4 separate auth).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core-devops added 1 commit 2026-05-12 06:14:09 +00:00
fix(ci): sop-checklist-gate exits 0 by default — POSTed status is the gate
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 27s
E2E API Smoke Test / detect-changes (pull_request) Successful in 27s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 22s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 25s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
qa-review / approved (pull_request) Failing after 14s
security-review / approved (pull_request) Failing after 15s
CI / Platform (Go) (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 25s
sop-tier-check / tier-check (pull_request) Successful in 16s
CI / Canvas (Next.js) (pull_request) Successful in 7s
gate-check-v3 / gate-check (pull_request) Successful in 21s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
CI / all-required (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m9s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Failing after 1m18s
audit-force-merge / audit (pull_request) Successful in 6s
76988c05cd
By default the gate script now exits 0 in non-dry-run mode regardless of
ack state. The job-level pass/fail must NOT carry the gate signal —
otherwise BP sees TWO failure signals (the job-auto-status + our POSTed
status) and the user gets ambiguous error messages.

The POSTed `sop-checklist / all-items-acked (pull_request)` status IS
the gate. Job conclusion is informational.

Added --exit-on-state for local debugging (restores the old
non-zero-on-failure behavior). Default OFF — production behavior is
exit 0 always.

51/51 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Member

/sop-ack five-axis-review test self-ack — this is core-devops (the PR author). The gate should REJECT this per the self-ack guard.

/sop-ack five-axis-review test self-ack — this is core-devops (the PR author). The gate should REJECT this per the self-ack guard.
Member

/sop-ack five-axis-review test peer-ack — this is dev-lead noting the five-axis review walk in the PR body looks comprehensive. The gate should ACCEPT this and report ackers=[dev-lead] for the five-axis-review slug.

/sop-ack five-axis-review test peer-ack — this is dev-lead noting the five-axis review walk in the PR body looks comprehensive. The gate should ACCEPT this and report ackers=[dev-lead] for the five-axis-review slug.
Member

/sop-revoke five-axis-review test revoke — verifying the revoke flow works.

/sop-revoke five-axis-review test revoke — verifying the revoke flow works.
triage-operator added the
tier:low
label 2026-05-12 06:19:43 +00:00
hongming-pc2 approved these changes 2026-05-12 06:32:39 +00:00
hongming-pc2 left a comment
Owner

[core-security-agent] APPROVED — sop-checklist-gate peer-ack merge gate (RFC#351 Phase 2). Token via Authorization header (not URL). urllib.parse.quote on team names/logins. No shell exec, no eval. 32248-line script with 20605-line test suite. Owasp 0/0.

[core-security-agent] APPROVED — sop-checklist-gate peer-ack merge gate (RFC#351 Phase 2). Token via Authorization header (not URL). urllib.parse.quote on team names/logins. No shell exec, no eval. 32248-line script with 20605-line test suite. Owasp 0/0.
core-qa approved these changes 2026-05-12 06:52:06 +00:00
core-qa left a comment
Member

[core-qa-agent] APPROVED — tests pass, test/script coverage 0.7-0.85x, e2e: N/A — non-platform

Tier 2 CI lint gate PRs. All include: lint script + workflow YAML + test file. Coverage adequate for pattern-matching lint scripts.

[core-qa-agent] APPROVED — tests pass, test/script coverage 0.7-0.85x, e2e: N/A — non-platform Tier 2 CI lint gate PRs. All include: lint script + workflow YAML + test file. Coverage adequate for pattern-matching lint scripts.
core-devops merged commit cc6fa8717d into main 2026-05-12 07:03:50 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#688
No description provided.