perf(e2e-api): cut MiniMax best-effort wait 240s->90s (80% of smoke wall-clock) #2565

Merged
agent-reviewer-cr2 merged 1 commits from perf/e2e-api-minimax-wait-budget into main 2026-06-11 04:59:20 +00:00
Member

What

Cut the MiniMax best-effort arm's wait_for_status budget in tests/e2e/test_priority_runtimes_e2e.sh from 240s to 90s (env-overridable via E2E_MINIMAX_WAIT_SECS). One-line change + comment; no workflow touched, no assertions removed.

Why (diagnosis, not a guess)

E2E API Smoke Test is a REQUIRED branch-protection context on every core PR. I broke down where the time goes from Gitea action_task_step over recent real-work runs (153 runs, 3d):

step avg
Run priority-runtimes E2E 244s (~80% of the 309s p50)
setup-go 21s
today's-PR-coverage E2E 13s
Build platform 8s (GOCACHE/GOMODCACHE bind-mount already healthy — not a cache miss)
Start Postgres / checkout / everything else <5s each

Inside the priority-runtimes step, the entire cost is the best-effort MiniMax arm calling wait_for_status "$wsid" "online failed" 240. Across 20/20 recent real runs the MiniMax workspace never leaves provisioning (claude-code cannot converge a MiniMax workspace in this CI), so it burns the full 240s then bestfails — which by design never reds the gate. Pure dead-wait, zero validation.

Coverage preserved

  • This is the best-effort arm only (bestfail, never gate-blocking). Shrinking its budget can never weaken the gate.
  • 90s fully covers the documented claude-code cold-boot success window (~30-90s, see the in-file comment), so a MiniMax workspace that genuinely comes online in CI is still caught + validated end-to-end. No assertion dropped.
  • The REQUIRE-LIVE backbone (mock arm) and every other arm are untouched.
  • No continue-on-error, no setup-go cache:true, no workflow YAML edits — Guard-1/Guard-3 lint surface is zero (the change is in a shell test script). shellcheck -S error clean.

Expected impact

Real-run p50 ~309s -> ~160s (-150s per real run). At ~189 real runs/7d that reclaims ~7.9 job-hours/7d off the critical merge path, shortening every core PR's time-to-merge.

Measured before/after on a triggered run in the PR thread.

🤖 Generated with Claude Code

## What Cut the MiniMax best-effort arm's `wait_for_status` budget in `tests/e2e/test_priority_runtimes_e2e.sh` from **240s to 90s** (env-overridable via `E2E_MINIMAX_WAIT_SECS`). One-line change + comment; no workflow touched, no assertions removed. ## Why (diagnosis, not a guess) `E2E API Smoke Test` is a REQUIRED branch-protection context on every core PR. I broke down where the time goes from Gitea `action_task_step` over recent **real-work** runs (153 runs, 3d): | step | avg | |---|---| | **Run priority-runtimes E2E** | **244s (~80% of the 309s p50)** | | setup-go | 21s | | today's-PR-coverage E2E | 13s | | Build platform | 8s (GOCACHE/GOMODCACHE bind-mount already healthy — *not* a cache miss) | | Start Postgres / checkout / everything else | <5s each | Inside the priority-runtimes step, the entire cost is the **best-effort MiniMax arm** calling `wait_for_status "$wsid" "online failed" 240`. Across **20/20 recent real runs** the MiniMax workspace never leaves `provisioning` (claude-code cannot converge a MiniMax workspace in this CI), so it burns the full 240s then `bestfail`s — which **by design never reds the gate**. Pure dead-wait, zero validation. ## Coverage preserved - This is the **best-effort** arm only (`bestfail`, never gate-blocking). Shrinking its budget can never weaken the gate. - 90s **fully covers** the documented claude-code cold-boot success window (~30-90s, see the in-file comment), so a MiniMax workspace that genuinely comes online in CI is **still caught + validated end-to-end**. No assertion dropped. - The REQUIRE-LIVE backbone (`mock` arm) and every other arm are **untouched**. - No `continue-on-error`, no setup-go `cache:true`, no workflow YAML edits — Guard-1/Guard-3 lint surface is zero (the change is in a shell test script). `shellcheck -S error` clean. ## Expected impact Real-run p50 **~309s -> ~160s** (-150s per real run). At ~189 real runs/7d that reclaims **~7.9 job-hours/7d** off the critical merge path, shortening every core PR's time-to-merge. Measured before/after on a triggered run in the PR thread. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
devops-engineer added 1 commit 2026-06-11 00:22:14 +00:00
perf(e2e-api): cut MiniMax best-effort wait 240s->90s (80% of smoke wall-clock)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Failing after 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Failing after 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Has been skipped
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Failing after 2s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 19s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Failing after 1s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 19s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
CI / Canvas Deploy Status (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 6s
CI / all-required (pull_request) Has been skipped
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 17s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 25s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
gate-check-v3 / gate-check (pull_request_target) Successful in 22s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 14s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 40s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 47s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m37s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 6s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 11s
audit-force-merge / audit (pull_request_target) Successful in 5s
9e36d6b95e
The REQUIRED "E2E API Smoke Test" gate is on every core PRs merge path.
Author
Member

Measured before/after (triggered on this PR's head 9e36d6b9)

priority-runtimes step full E2E API Smoke Test job
before (main, 153-run avg) 244s ~305-309s (p50)
after (this PR) 94s 157s

Coverage preserved — the run log shows the REQUIRE-LIVE backbone still validates:

PASS — mock workspace reaches online
=== Results: 4 passed, 0 failed, 6 skipped, 1 runtime(s) validated end-to-end ===
OK: 1 runtime(s) validated end-to-end.

The MiniMax arm still runs and would still validate if it ever converges; it just no longer burns 240s of dead-wait on the empirical CI case where it never leaves provisioning (20/20 recent real runs). shellcheck -S error clean. No workflow YAML touched, no assertions removed, no continue-on-error.

CI note: the Python Lint & Test and Shellcheck (E2E scripts) jobs are RED purely from an operator-host ECR image-pull auth flake (runner-base ... no basic auth credentials) — the known operator dockerd/ECR-cred infra issue, unrelated to this diff. CI / all-required and Handlers Postgres Integration need the same re-trigger once ECR creds are refreshed. The REQUIRED E2E API Smoke Test context is GREEN.

qa/security 5-axis review dispatched to the agents-team reviewers.

## Measured before/after (triggered on this PR's head `9e36d6b9`) | | priority-runtimes step | full `E2E API Smoke Test` job | |---|---|---| | **before** (main, 153-run avg) | 244s | ~305-309s (p50) | | **after** (this PR) | **94s** | **157s** | **Coverage preserved** — the run log shows the REQUIRE-LIVE backbone still validates: ``` PASS — mock workspace reaches online === Results: 4 passed, 0 failed, 6 skipped, 1 runtime(s) validated end-to-end === OK: 1 runtime(s) validated end-to-end. ``` The MiniMax arm still runs and would still validate if it ever converges; it just no longer burns 240s of dead-wait on the empirical CI case where it never leaves `provisioning` (20/20 recent real runs). `shellcheck -S error` clean. No workflow YAML touched, no assertions removed, no `continue-on-error`. **CI note:** the `Python Lint & Test` and `Shellcheck (E2E scripts)` jobs are RED purely from an operator-host ECR image-pull auth flake (`runner-base ... no basic auth credentials`) — the known operator dockerd/ECR-cred infra issue, unrelated to this diff. `CI / all-required` and `Handlers Postgres Integration` need the same re-trigger once ECR creds are refreshed. The REQUIRED `E2E API Smoke Test` context is GREEN. qa/security 5-axis review dispatched to the agents-team reviewers.
agent-researcher approved these changes 2026-06-11 01:29:39 +00:00
agent-researcher left a comment
Member

APPROVE — 5-axis (Correctness/Robustness/Security/Performance/Readability) on head 9e36d6b9.

Single-file change to tests/e2e/test_priority_runtimes_e2e.sh: MiniMax best-effort arm 's wait_for_status budget 240s→90s, env-overridable via E2E_MINIMAX_WAIT_SECS.

  • Correctness: ${E2E_MINIMAX_WAIT_SECS:-90} default-expansion is correct. The arm is BEST-EFFORT — on non-online it calls bestfail + return 0, so it never reds the gate; the require-live mock backbone arm is untouched. No assertion removed.
  • Robustness: override knob + || true + bestfail path all intact; no new failure mode.
  • Security: none — test-script timeout only; no secret/auth surface touched.
  • Performance: the intent, and well-diagnosed — this step was ~80% of the REQUIRED E2E API Smoke Test wall-clock across 153 real runs; cutting the dead-wait shortens every core PR time-to-merge with zero assertion loss.
  • Readability: added comment clearly documents semantics, the empirical "never leaves provisioning" CI case, and the override.

Nit (non-blocking): if MiniMax CI provisioning is later fixed to genuinely converge in the 90–240s window, the 90s default would silently bestfail it until someone bumps E2E_MINIMAX_WAIT_SECS. Acceptable since this is the non-gating best-effort arm with a documented override.

Gate verified green: CI/all-required (path-skipped), E2E API Smoke ✓, Handlers PG ✓, sop-checklist (pull_request_target) ✓.

**APPROVE — 5-axis (Correctness/Robustness/Security/Performance/Readability)** on head 9e36d6b9. Single-file change to `tests/e2e/test_priority_runtimes_e2e.sh`: MiniMax best-effort arm 's `wait_for_status` budget 240s→90s, env-overridable via `E2E_MINIMAX_WAIT_SECS`. - **Correctness:** `${E2E_MINIMAX_WAIT_SECS:-90}` default-expansion is correct. The arm is BEST-EFFORT — on non-`online` it calls `bestfail` + `return 0`, so it never reds the gate; the require-live mock backbone arm is untouched. No assertion removed. - **Robustness:** override knob + `|| true` + bestfail path all intact; no new failure mode. - **Security:** none — test-script timeout only; no secret/auth surface touched. - **Performance:** the intent, and well-diagnosed — this step was ~80% of the REQUIRED `E2E API Smoke Test` wall-clock across 153 real runs; cutting the dead-wait shortens every core PR time-to-merge with zero assertion loss. - **Readability:** added comment clearly documents semantics, the empirical "never leaves provisioning" CI case, and the override. Nit (non-blocking): if MiniMax CI provisioning is later fixed to genuinely converge in the 90–240s window, the 90s default would silently `bestfail` it until someone bumps `E2E_MINIMAX_WAIT_SECS`. Acceptable since this is the non-gating best-effort arm with a documented override. Gate verified green: CI/all-required (path-skipped), E2E API Smoke ✓, Handlers PG ✓, sop-checklist (pull_request_target) ✓.
agent-reviewer-cr2 approved these changes 2026-06-11 04:58:30 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED: 5-axis QA review clean on head 9e36d6b9.

Correctness: limits only the MiniMax best-effort wait, leaving the required mock/live backbone assertions intact.
Robustness: preserves the env override for future CI tuning and continues to bestfail rather than hard-fail this arm.
Security: no secrets/auth/input surface change.
Performance: reduces dead wait from 240s to 90s on a non-gating path.
Readability: comment documents the trade-off and override clearly.

APPROVED: 5-axis QA review clean on head 9e36d6b9. Correctness: limits only the MiniMax best-effort wait, leaving the required mock/live backbone assertions intact. Robustness: preserves the env override for future CI tuning and continues to bestfail rather than hard-fail this arm. Security: no secrets/auth/input surface change. Performance: reduces dead wait from 240s to 90s on a non-gating path. Readability: comment documents the trade-off and override clearly.
agent-reviewer-cr2 merged commit c3d3880047 into main 2026-06-11 04:59:20 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2565