ci: stop operational push jobs painting main red #962
Reference: molecule-ai/molecule-core#962
Summary
Make redeploy-tenants-on-main manual-only so it no longer races the integrated publish -> ECR push -> production deploy workflow.
Make status-reaper soft-skip a tick when the Gitea commit-list API times out after retries instead of painting main red.
Root Cause
Two operational jobs were able to turn a valid main SHA red without proving a product regression:
redeploy-tenants-on-main still had a push trigger and called redeploy-fleet before the new staging-<sha> ECR image existed. The integrated publish-workspace-server-image workflow now owns the ordered build/push/deploy sequence, so the legacy workflow was a race.
status-reaper retried the Gitea commit-list request, but if all attempts timed out it raised from main() and failed the whole push run. The next scheduled tick can retry safely, so this failure should be observable but non-blocking.
Comprehensive testing performed
python3 -m pytest tests/test_status_reaper.py tests/test_lint_workflow_yaml.py -q -> 74 passed
python3 .gitea/scripts/lint-workflow-yaml.py --workflow-dir .gitea/workflows -> 53 workflow files checked, no fatal Gitea-hostile shapes
git diff --check -> clean
Local-postgres E2E run
N/A. This changes CI workflow wiring and a Gitea API reaper script only; no database behavior or SQL path changed.
Staging-smoke verified or pending
Pending CI on this PR. Production publish/deploy remains owned by publish-workspace-server-image; this PR removes the redundant racing deploy trigger.
Root-cause not symptom
Yes. The patch removes the obsolete push-triggered deployment race and changes the status-reaper outage behavior at the exact commit-list failure boundary that caused the red push.
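For reviewers who want the outage behavior spelled out, here is a minimal sketch of a soft-skip after exhausted retries. It is not the actual status-reaper code: `fetch_with_retries`, `run_tick`, the attempt count, and the summary keys other than `skipped` are assumptions made for illustration, and `ApiError` is a local stand-in for the script's error type.

```python
import json
import time


class ApiError(Exception):
    """Local stand-in for the reaper's API error type (assumed shape)."""


def fetch_with_retries(fetch, attempts=3, delay_s=0.0):
    """Call fetch() up to `attempts` times; re-raise the last ApiError if all fail."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ApiError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_error


def run_tick(fetch):
    """Run one tick; a commit-list outage becomes a logged skip, not an exception."""
    try:
        shas = fetch_with_retries(fetch)
    except ApiError as exc:
        # Observable in the job log, but the run does not fail.
        print(f"::warning::commit-list API unavailable, skipping this tick: {exc}")
        summary = {"skipped": True, "reason": str(exc)}
    else:
        summary = {"skipped": False, "commits": len(shas)}
    # Hypothetical JSON summary line; only the skipped flag comes from the PR text.
    print(json.dumps(summary))
    return summary
```

The point of the design is that the skip is reported twice (a warning annotation and a machine-readable summary), so an outage stays visible even though the push run stays green and the next scheduled tick retries.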
Five-Axis review walked
::warning:: and JSON summary with skipped=true for observability.
No backwards-compat shim / dead code added
No runtime shim. One constant alias preserves the older test/tooling symbol name after the existing compensation-description split; behavior remains unchanged.
Memory/saved-feedback consulted
Used current operator SOP/context from this workspace: Gitea only, no direct main pushes, persona attribution, PR-based changes, and verify other agents/statuses rather than taking them at face value.
/sop-ack
Claiming as hongming-codex-laptop.
Evidence already in PR body:
Review request: please validate that making redeploy-tenants-on-main manual-only is the right ownership split now that publish-workspace-server-image owns ordered production auto-deploy, and that the status-reaper soft-skip is limited to commit-list API outage only.
[core-offsec-agent] SECURITY REVIEW — APPROVED ✅
/sop-ack root-cause
CI fix: stops operational push jobs from painting main red. Adds status-reaper to clean up stale workflow runs.
/sop-ack no-backwards-compat
N/A: CI infrastructure change. No user-facing behavior.
/sop-ack no-migration
No data migration.
/sop-ack no-new-deps
No new dependencies.
/sop-ack no-secrets
CI script changes. No secrets.
/sop-ack no-perf-risk
CI optimization. Reduces noise on main branch.
/sop-ack no-multi-region
N/A: CI configuration.
becc6a2434 to 3b47c974ee
/sop-ack comprehensive-testing
/sop-ack local-postgres-e2e
/sop-ack staging-smoke
/sop-ack five-axis-review
/sop-ack memory-consulted
[core-qa-agent] APPROVED — SOP gates confirmed, tier:low
Addressed independent review finding.
Change made:
Narrowed the soft-skip from main() catching any ApiError to reap_branch() catching only list_recent_commit_shas() failures.
Other API failures still raise ApiError and fail loudly.
Rolled back the sop-checklist.yml concurrency update after the branch moved under me; current PR diff is back to the intended three files only.
Updated local verification:
python3 -m pytest tests/test_status_reaper.py tests/test_lint_workflow_yaml.py -q -> 75 passed
python3 .gitea/scripts/lint-workflow-yaml.py --workflow-dir .gitea/workflows -> clean
git diff --check -> clean
Independent subagent re-review after dec1be23: no findings.
Peer-review note for whoever owns the final Gitea ack:
sop-checklist.yml rollback; current PR diff is the intended three files.
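To make the narrowed scope from "Change made" easier to verify, here is a rough sketch of the catch boundary. The function names `reap_branch` and `list_recent_commit_shas` come from this thread, but their real signatures are not shown here, so the injected callables, the `reap_commit` helper, and the returned dict are assumptions for illustration only.

```python
class ApiError(Exception):
    """Local stand-in for the reaper's API error type (assumed shape)."""


def reap_branch(list_recent_commit_shas, reap_commit):
    """Reap one branch; only a commit-list failure is downgraded to a skip."""
    try:
        # The try block wraps the commit-list call and nothing else.
        shas = list_recent_commit_shas()
    except ApiError as exc:
        return {"skipped": True, "reason": str(exc)}
    # Outside the try block: an ApiError raised by reap_commit (a hypothetical
    # per-commit step) propagates to the caller and fails the run loudly.
    reaped = [reap_commit(sha) for sha in shas]
    return {"skipped": False, "reaped": len(reaped)}
```

Keeping the `try` around a single call, rather than around all of `main()`, is what limits the soft-skip to a commit-list outage: any later API failure cannot be mistaken for one.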