CI/CD hardening WS2: fix CI fan-out — consolidate workflows + needs:-aggregator gate (make #2094 the standard) #2095

Open
opened 2026-06-01 07:54:00 +00:00 by core-be · 1 comment

Workstream 2 of the CI/CD + agent-fleet hardening RFC (molecule-ai/internal#749).

Problem (2026-06-01 incident)

~57 workflow files generating ~65 runs/PR overwhelm the Gitea Actions scheduler. The all-required poll-gate squats on a runner slot for up to 40 min/PR, doing nothing but polling the statuses API, which amplified contention during the outage.

Proposed solution (make #2094 the org standard)

  • Consolidate workflows (collapse sub-second sibling lints into one run).
  • paths-gate / lazy-trigger heavy non-required advisory jobs.
  • needs:-aggregator required gate, NOT a poll-gate. Plain needs: works on Gitea 1.22.6 / act_runner v0.6.1 (feedback_gitea_needs_works_only_ifalways_broken); if: always() aggregators land with the 1.26 upgrade (RFC ci-cd-elastic-runners-and-native-needs). Apply the #2094 pattern across every repo that mirrors the sentinel.
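
A minimal sketch of what the consolidated workflow plus needs:-aggregator gate could look like, for illustration only. The file path, job names other than all-required, the runs-on label, and the make targets are all hypothetical placeholders and are not taken from #2094. It assumes plain needs: with no if: condition, which is the form the bullet above says works on Gitea 1.22.6 / act_runner v0.6.1.

```yaml
# .gitea/workflows/ci.yaml  (hypothetical path; job names and commands are placeholders)
name: ci
on:
  pull_request:

jobs:
  lint:
    # Consolidation: sibling lints run as steps of one job
    # instead of one workflow file (and one run) each.
    runs-on: ubuntu-latest   # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - run: make lint-yaml    # placeholder command
      - run: make lint-shell   # placeholder command

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test         # placeholder command

  all-required:
    # Plain needs: with no if: condition. The job stays blocked until
    # lint and unit-tests have finished, so it does not occupy a runner
    # slot polling the statuses API while they run.
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - run: echo "all required jobs passed"
```

With this shape, branch protection would require only the all-required status, and the gate itself runs for a few seconds at the end rather than for the length of the slowest job.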

Links

  • RFC PR: molecule-ai/internal#749
  • molecule-core #2094 — cut fan-out + stop the poll-gate slot-squat (this workstream starts here)
  • Sibling: internal WS1 #750 (runner-host split unblocks slot pressure), WS3 (alerting), WS4/WS5 (internal).

CI/CD + agent-fleet hardening — workstream map

RFC PR: internal#749 (https://git.moleculesai.app/molecule-ai/internal/pulls/749)

| WS | Title | Issue | Existing work |
|----|-------|-------|---------------|
| 1 | Separate CI execution plane from SCM/DB control plane | internal#750 | internal#91, operator-config#143, operator-config#144 |
| 2 | CI fan-out architecture | molecule-core#2095 | molecule-core#2094 |
| 3 | CI observability / alerting | internal#751 | - |
| 4 | Agent capability-tiering + tooling | internal#752 | - |
| 5 | Trustworthy merge-gate / governance | internal#753 | - |

All five trace to the 2026-06-01 incident: one un-throttled backup on the single Gitea+Postgres+runners box turned all CI red for ~40 min with no alert (WS1–3), and the agent fleet failed review/RCA, with cheap models 0/6 false and capable reviewers sandboxed without tooling (WS4–5).


Reference: molecule-ai/molecule-core#2095