molecule-core/runbooks/sop-production-cicd.md
hongming-codex-laptop 782eaf2e80
ci: auto deploy production tenants after green main
2026-05-13 04:51:43 -07:00

SOP: Production CI/CD Changes

Production CI/CD changes are higher risk than ordinary CI edits. They can publish images, deploy tenants, promote tags, mutate branch protection, or change merge behavior. This SOP separates rules that must be enforced by code from rules that require human judgment.

Programmatic Gates

The workflow YAML linter is the first line of enforcement:

python3 .gitea/scripts/lint-workflow-yaml.py --workflow-dir .gitea/workflows

It must reject:

  • Gitea-hostile syntax such as workflow_dispatch.inputs, workflow_run, workflow name collisions, slash-containing workflow names, and unsupported cross-repo action references.
  • Production deploy workflows that rely on concurrency.cancel-in-progress: false for serialization.
  • Production deploy workflows that print raw control-plane responses or raw .error fields into CI logs.
  • Production redeploy workflows with no kill switch or rollback/pin control.
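A minimal sketch of how a few of the syntax rejections above could be checked. It is not the actual contents of lint-workflow-yaml.py: the function name lint_workflow and the finding strings are hypothetical, and it assumes the workflow file has already been parsed into a dict by a YAML loader.

```python
from typing import Any, Dict, List


def lint_workflow(workflow: Dict[Any, Any]) -> List[str]:
    """Return findings for one parsed workflow (hypothetical helper)."""
    findings: List[str] = []

    # Slash-containing workflow names are rejected.
    name = workflow.get("name", "")
    if isinstance(name, str) and "/" in name:
        findings.append("workflow name contains a slash")

    # YAML 1.1 loaders such as PyYAML read the bare key `on` as boolean True,
    # so look under both keys.
    triggers = workflow.get("on", workflow.get(True, {}))
    if isinstance(triggers, str):
        triggers = {triggers: None}
    elif isinstance(triggers, list):
        triggers = {trigger: None for trigger in triggers}
    elif not isinstance(triggers, dict):
        triggers = {}

    if "workflow_run" in triggers:
        findings.append("uses workflow_run trigger")

    dispatch = triggers.get("workflow_dispatch")
    if isinstance(dispatch, dict) and dispatch.get("inputs"):
        findings.append("uses workflow_dispatch.inputs")

    return findings


def find_name_collisions(names: List[str]) -> List[str]:
    """Return workflow names that appear more than once, in sorted order."""
    seen, dupes = set(), set()
    for name in names:
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)
```

A linter built this way would exit non-zero whenever any workflow returns a non-empty findings list, so a single hostile shape blocks the PR.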

Production deploy helpers must also unit-test:

  • Disable-flag parsing.
  • Required status context selection.
  • Terminal status handling for failure, error, cancelled, canceled, and skipped.
  • Production control-plane URL guards.
  • Rollback target/pin handling when applicable.
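As an illustration of the first and third items, the sketch below pairs two small helpers with their unit tests. The helper names, the accepted flag spellings, and the choice to raise on an unrecognized value are assumptions made for this example, not the repository's actual helper API.

```python
import unittest
from typing import Optional

# States that end the wait for a required status without a success.
TERMINAL_FAILURE_STATES = frozenset(
    {"failure", "error", "cancelled", "canceled", "skipped"}
)
# Assumed spellings for the kill-switch variable.
_DISABLED = frozenset({"1", "true", "yes", "on"})
_ENABLED = frozenset({"", "0", "false", "no", "off"})


def deploy_disabled(raw: Optional[str]) -> bool:
    """Parse the disable flag; unset or empty means deploys are enabled."""
    value = (raw or "").strip().lower()
    if value in _DISABLED:
        return True
    if value in _ENABLED:
        return False
    # An unrecognized value must not silently mean "enabled".
    raise ValueError(f"unrecognized disable flag value: {raw!r}")


def is_terminal_failure(state: str) -> bool:
    """True for any state that can never turn into a success."""
    return state.strip().lower() in TERMINAL_FAILURE_STATES


class DeployHelperTests(unittest.TestCase):
    def test_disable_flag(self):
        self.assertTrue(deploy_disabled(" TRUE "))
        self.assertFalse(deploy_disabled(None))
        with self.assertRaises(ValueError):
            deploy_disabled("maybe")

    def test_terminal_states(self):
        for state in ("failure", "error", "cancelled", "canceled", "skipped"):
            self.assertTrue(is_terminal_failure(state))
        self.assertFalse(is_terminal_failure("success"))
        self.assertFalse(is_terminal_failure("pending"))
```

Tests in this shape can be run with python3 -m unittest. Covering both "cancelled" and "canceled" matters because the two spellings are easy to miss when only one is checked.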

Required PR Evidence

Every production CI/CD PR must include concrete answers for:

  • Root cause: what production failure mode or process gap is being closed.
  • Deploy gate: which exact contexts must be green before production side effects.
  • Kill switch: how to stop deployment without reverting the PR.
  • Verification: how production state is proven after deployment.
  • Logging: proof that CI logs do not contain raw production runtime, SSM, or secret-adjacent output.
  • Rollback: the exact command, variable, or workflow to return to a known-good tag/digest.

Human Review

Production CI/CD PRs need non-author review across these roles:

  • DevOps: Gitea Actions semantics, branch protection, merge queue, and runner behavior.
  • SRE: rollout order, tenant health checks, observability, and partial-deploy recovery.
  • Security: secrets, token scopes, log redaction, and production endpoint targeting.

Critical or Required review findings must be closed with one of:

  • A code change plus verification.
  • An evidence-backed rejection.
  • A follow-up issue only if the finding is explicitly not merge-blocking.

Acknowledgement alone is not closure.

Production Defaults

Production deploys should fail closed:

  • Missing tenant result: fail.
  • Tenant unhealthy: fail.
  • /buildinfo unreachable: fail.
  • SHA mismatch: fail.
  • Required status cancelled, skipped, or still missing after the timeout: fail.
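The tenant-level defaults above can be sketched as a single verification pass that collects every failure instead of stopping at the first one. TenantResult, verify_rollout, and the field names are hypothetical and chosen for illustration; how the health and /buildinfo data are actually collected is outside this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TenantResult:
    healthy: bool
    # SHA reported by /buildinfo, or None when the endpoint was unreachable.
    buildinfo_sha: Optional[str]


def verify_rollout(
    expected_tenants: List[str],
    results: Dict[str, TenantResult],
    expected_sha: str,
) -> List[str]:
    """Return one failure message per violated default; empty means pass."""
    failures: List[str] = []
    for tenant in expected_tenants:
        result = results.get(tenant)
        if result is None:
            # A tenant with no result is a failure, never an implicit pass.
            failures.append(f"{tenant}: missing tenant result")
            continue
        if not result.healthy:
            failures.append(f"{tenant}: unhealthy")
        if result.buildinfo_sha is None:
            failures.append(f"{tenant}: /buildinfo unreachable")
        elif result.buildinfo_sha != expected_sha:
            failures.append(f"{tenant}: SHA mismatch")
    return failures
```

Iterating over the expected tenant list, rather than over the results that came back, is what makes a missing tenant visible.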

Staging may tolerate warnings during rollout development; production should not.

Gitea 1.22.6 Constraints

Do not design production CI/CD around unsupported or unreliable features:

  • No workflow_run.
  • No reliable workflow_dispatch.inputs.
  • Do not assume concurrency.cancel-in-progress: false serializes queued runs.
  • Do not rely on a masked aggregate status as the only production deploy gate.

If these constraints change after a Gitea upgrade, update this SOP and the workflow linter in the same PR.
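One way to avoid gating on a masked aggregate status is to evaluate an explicit list of required contexts and fail closed once the timeout passes. The sketch below assumes the caller has already fetched a context-to-state mapping for the commit; gate_decision, the return values, and the context names in the usage note are hypothetical.

```python
from typing import Dict, List

FAILED_STATES = frozenset({"failure", "error", "cancelled", "canceled", "skipped"})


def gate_decision(
    required: List[str], statuses: Dict[str, str], timed_out: bool
) -> str:
    """Return "pass", "wait", or "fail" for a production deploy gate."""
    if not required:
        # An empty required list would pass vacuously, so treat it as an error.
        return "fail"
    unresolved = False
    for context in required:
        state = statuses.get(context)
        if state == "success":
            continue
        if state in FAILED_STATES:
            return "fail"
        # Missing, pending, or unrecognized states are not yet resolved.
        unresolved = True
    if unresolved:
        return "fail" if timed_out else "wait"
    return "pass"
```

For example, gate_decision(["build", "e2e"], {"build": "success"}, timed_out=True) returns "fail" because the "e2e" context never reported, even though every status that did report was green.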