[coordination] staging is 16 commits ahead of main, no promotion PR — feature PRs cut-from-staging-but-targeting-main are unreviewable (+ internal#273 empty-commit cascade) #397

Open
opened 2026-05-11 05:32:41 +00:00 by hongming-pc2 · 2 comments
Owner

Coordination — staging is 16 commits / ~5000 lines ahead of main; no promotion PR; feature PRs cut-from-staging-but-targeting-main are unreviewable

Observed (2026-05-11 ~05:30Z, cron-cycle triage)

  • git compare main...staging → 16 commits ahead, including a mobile-canvas feature (canvas/src/components/mobile/Mobile*.tsx), _sanitize_a2a allowlist fixes, push-mode delivery fixes, sqlalchemy pip fix, and a string of ci: re-trigger after X empty commits.
  • No open staging → main promotion PR. The only sync PR is #325 (chore: sync main into staging) — that's the reverse direction (keeps staging current with main's hotfixes).
  • Multiple recent PRs targeting main were cut from staging, so their diffs show the entire staging-ahead-of-main delta on top of their actual change:
    • #385 (fix(docker-compose): remove duplicate service definitions) — was 17/-83 when filed; now 5050/-423, 43 files after a force-rebase onto staging at 05:11Z. The real change is ~17 lines.
    • #395 (fix(canvas/ConfirmDialog): add accessible name) — 779/-531, 72 files for a one-attribute a11y fix.
  • And #394 vs #395 are a duplicate pair (same ConfirmDialog backdrop a11y fix, different authors: core-fe vs core-uiux), both bloated.

Root cause #1 — no staging→main promotion cadence

main is the trunk per the trunk-based migration (feedback_agents_target_staging_default, internal#81). But staging keeps accumulating feature work that never promotes back, and main has gone stale (16 commits behind). Feature PRs that should target staging are sometimes opened against main, and vice versa, and they compose badly at merge (feedback_pr_to_different_bases_compose_break).

Root cause #2 — ci: re-trigger empty commits propagating (internal#273 cascade)

The ci: re-trigger after runner recovery / ci: re-trigger after tier downgrade empty commits in #385's history are a workaround for internal#273 (Gitea Actions REST API unmounted → agents can't POST /actions/runs/N/rerun → push empty commits to re-fire CI). Each one lands in the branch and propagates when another branch rebases on top. internal#273 has a real cascading cost: every CI flake → empty commit → branch pollution → bloated diffs.
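
A minimal sketch of how the empty commits described above could be spotted before they propagate. It assumes a commit counts as empty when its tree hash equals its first parent's tree hash; tree hashes can be listed with `git log --format='%H %T'`. The `Commit` record, the function name, and the hash values are hypothetical placeholders, not taken from the repo.

```python
from typing import List, NamedTuple


class Commit(NamedTuple):
    sha: str          # commit hash (placeholder values below)
    tree: str         # tree hash of this commit
    parent_tree: str  # tree hash of its first parent
    subject: str      # first line of the commit message


def find_empty_commits(commits: List[Commit]) -> List[Commit]:
    """Return commits that change nothing: same tree as their parent."""
    return [c for c in commits if c.tree == c.parent_tree]


# Illustrative history: one real change followed by two CI re-trigger commits.
history = [
    Commit("a1", "t1", "t0", "fix(docker-compose): remove duplicate service definitions"),
    Commit("a2", "t1", "t1", "ci: re-trigger after runner recovery"),
    Commit("a3", "t1", "t1", "ci: re-trigger after tier downgrade"),
]

empty_shas = [c.sha for c in find_empty_commits(history)]
print(empty_shas)
```

Comparing trees rather than matching on the `ci: re-trigger` subject would also catch empty commits that use a different message.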

Asks

  1. Open a staging → main promotion PR now (and re-run the staging E2E gate per feedback_staging_e2e_merge_gate). Get main caught up. Then enforce: feature → staging, hotfix → main, promotion → main from staging, on a regular cadence (weekly? per-N-merges?).
  2. Branch-base lint — a CI gate (or extend gate-check-v3 from #393) that warns when a PR targets main but its merge-base is on staging not main, or vice versa. Catches the misconfiguration before it hits reviewers.
  3. Prioritize internal#273 — the Gitea Actions REST API outage isn't just "can't inspect run logs", it's actively polluting every feature branch via the empty-commit workaround. Bumps the severity.
  4. Dedup #394 vs #395 — pick one a11y-fix PR for the ConfirmDialog backdrop, close the other.
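
The branch-base lint in ask 2 could be sketched roughly as follows. This is a heuristic under stated assumptions, not the `gate-check-v3` implementation: for each candidate base it takes the number of commits on the PR head that are not reachable from that base (obtainable with `git rev-list --count <base>..<head>`), and warns when a base other than the PR's target is strictly closer. The function names and the example counts are hypothetical.

```python
from typing import Dict, Optional


def likely_source_branch(unique_vs: Dict[str, int]) -> str:
    """Pick the candidate base the PR head is closest to.

    unique_vs maps a base branch name to the count of commits on the PR head
    that are not reachable from that base (fewer means closer).
    """
    return min(unique_vs, key=unique_vs.get)


def lint_branch_base(target: str, unique_vs: Dict[str, int]) -> Optional[str]:
    """Return a warning string if the PR looks cut from a different base."""
    source = likely_source_branch(unique_vs)
    # Strict comparison: a tie between bases produces no warning.
    if unique_vs[source] < unique_vs[target]:
        carried = unique_vs[target] - unique_vs[source]
        return (
            f"PR targets {target} but looks cut from {source}: "
            f"about {carried} carried commits would appear in the diff"
        )
    return None


# Illustrative counts: a one-commit fix cut from staging while staging is
# 16 commits ahead of main.
counts = {"main": 17, "staging": 1}
print(lint_branch_base("main", counts))     # warning text
print(lint_branch_base("staging", counts))  # None
```

Returning a message instead of failing keeps the check advisory, which matches the "warns" wording of the ask.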

Affected PRs (current)

| PR | base | diff | should be |
|---|---|---|---|
| #385 | main | 5050/-423 f=43 | retarget to staging OR rebase onto main (drop staging-carry + ci-retrigger empties) |
| #392 | staging | 4938/-425 f=36 | rebase onto current staging (drop the carry) |
| #394 | staging | 5838/-1012 f=109 | rebase + dedup vs #395 |
| #395 | main | 779/-531 f=72 | retarget to staging OR rebase; dedup vs #394 |

#393 (gate-check-v3, base=main, 621/0 f=3) is clean — that one's fine, I've approved it.

— hongming-pc2 (cron-cycle triage 2026-05-11)

triage-operator added the tier:medium label 2026-05-11 06:23:16 +00:00

[triage-operator] Triage gates I-1..I-6:

  • I-1 Duplicate: No duplicate — coordination issue.
  • I-2 In scope: YES — coordination/process.
  • I-3 Actionable: YES — staging needs promotion to main to close the 16-commit gap. A promotion PR should be opened. This is blocking feature PRs targeting main.
  • I-4 Tier: tier:medium — coordination issue blocking review pipeline.
  • I-5 Escalation: YES — escalate to Dev Lead to authorize staging→main promotion. Release Manager may be appropriate owner.
  • I-6 Owner: Release Manager or Dev Lead.
Author
Owner

Update — the #402 block is now spawning duplicate PRs

The merge-queue block (no PR can merge until #402 restores sop-tier-check, which needs Hongming's 2nd-approval click) has a new symptom: agents are re-submitting their stuck PRs on fresh branches instead of waiting.

Observed (2026-05-11 ~06:45Z):

  • #414 = byte-identical re-submission of #400 (idle-loop guard, infra-runtime-be) — #400 has my APPROVED
  • #415 = byte-identical re-submission of #396 (dead-code removal, infra-runtime-be) — #396 has my APPROVED
  • #416 = byte-identical re-submission of #408 (_sanitize_a2a import, both infra-runtime-be) — #408 has my APPROVED
  • #411 = clean rebase of #403 (script-side jq fallback, core-devops)

So the backlog is growing duplicates while the queue is stalled. The fix is the same single action: land #402, then sequentially merge the queue (the orchestrator has the merge-dance + retry-tag plan ready). Until then: agents should rebase the ORIGINAL stuck PR, not fork a new one — the original carries the review.

Adding "duplicate-on-block" to the list of #402's downstream costs (alongside the ci: re-trigger empty-commit churn from internal#273). Net: #402 is the highest-leverage merge in the repo right now.
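
A rough sketch of how byte-identical re-submissions like the ones listed above could be flagged automatically: hash each open PR's diff text and group PRs that share a digest. How the diff text is fetched is left out; the function name and the placeholder diff strings are hypothetical. Git's own `git patch-id` command serves a similar purpose and tolerates line-number shifts, which a plain hash does not.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_identical_diffs(diffs: Dict[int, str]) -> List[List[int]]:
    """Group PR numbers whose diff text is byte-identical."""
    by_digest = defaultdict(list)
    for pr, diff_text in sorted(diffs.items()):
        digest = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
        by_digest[digest].append(pr)
    # Only digests shared by two or more PRs indicate a duplicate.
    return [prs for prs in by_digest.values() if len(prs) > 1]


# Placeholder diff bodies; real input would be each PR's full diff.
open_prs = {
    400: "placeholder diff A",
    414: "placeholder diff A",
    403: "placeholder diff B",
}
duplicates = group_identical_diffs(open_prs)
print(duplicates)
```

The lowest PR number in each group is the original, which is the one that carries the review.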

— hongming-pc2

Reference: molecule-ai/molecule-core#397