harden(e2e): staging-external + chat fail-closed (REQUIRE_LIVE, transient-retry, no zero-test green) #2279
Harden the staging-external + chat E2Es fail-closed toward HARD merge-gates. `continue-on-error` left in place; promotion is the CTO's call.

e2e-staging-external (`test_staging_external_runtime.sh`):
- `E2E_REQUIRE_LIVE=1`; harness tracks the 4 contracted `awaiting_agent` transitions, EXIT trap exits 5 on <4 proven (a real pre-existing failure is NOT masked into 5; unit-verified).
- `sleep 180s` then a lone GET → bounded readiness-poll to `STALE_POLL_DEADLINE_SECS` (240 = 180 + sweep headroom), hard-fail with elapsed time.
- `/registry/register` → `register_with_retry` retrying ONLY the transient transport class (5xx + body match), fail-closed on 4xx + exhausted budget; bearer-token redaction in transient logs.

e2e-chat:
- `passWithNoTests` defaulted true (renamed specs → exit 0) → `passWithNoTests:false` + `forbidOnly:!!CI` + a run-step assert that ≥1 test executed.
- `chat-desktop.spec.ts` activity-log test ended in `.catch(()=>{})` → presence-gated (DOM-absent ⇒ recorded skip; present ⇒ real `toBeVisible`).

Pure-logic unit tests green (REQUIRE_LIVE matrix 6 + transient classification 9).
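The REQUIRE_LIVE exit decision described above lends itself to a pure function that can be unit-tested without live infra. The following is a minimal sketch under stated assumptions, not the harness's actual code: the function name `require_live_exit_code` and its argument order are hypothetical; only `E2E_REQUIRE_LIVE`, the four transitions, and exit code 5 come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and arguments are illustrative assumptions).
# Prints the exit code the harness should finish with.
#   $1 = exit code the script was about to exit with
#   $2 = number of awaiting_agent transitions actually proven
#   $3 = value of E2E_REQUIRE_LIVE ("1" enables the guard)
require_live_exit_code() {
  local original_rc="$1" proven="$2" require_live="${3:-0}"
  local required=4  # the four contracted awaiting_agent transitions

  # A real failure keeps its own exit code; it is never rewritten to 5.
  if [ "$original_rc" -ne 0 ]; then
    echo "$original_rc"
    return 0
  fi

  # Clean exit without all transitions proven: fail closed under REQUIRE_LIVE.
  if [ "$require_live" = "1" ] && [ "$proven" -lt "$required" ]; then
    echo 5
    return 0
  fi

  echo 0
}
```

In a harness, a function like this could be called from an EXIT trap that captures `$?` and the proven-transition counter, so that an early return or dropped assertion cannot end in a green exit.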
`bash -n`/shellcheck/YAML clean. PROMOTION-READINESS notes: still need an infra-vs-code signal split (a CP outage currently looks like a real failure) + the echo round-trip should assert the runtime actually received the A2A request.

Both lanes stay continue-on-error (CTO's irreversible call) but are now fail-closed so they can become required gates. No "flaky" dispositions: each flake mechanism is named + fixed deterministically (internal#828).

e2e-staging-external + test_staging_external_runtime.sh:
- REQUIRE_LIVE guard (E2E_REQUIRE_LIVE=1 in CI): exit 5 if the harness reaches a clean exit without proving all four awaiting_agent transitions. A silent skip / early-return / dropped assertion can no longer show green. Mirrors CP serving-e2e SERVING_E2E_REQUIRE_LIVE.
- Sweep-cadence flake (step 6): replaced fixed `sleep $STALE_WAIT_SECS` + one-shot assert with a bounded readiness-poll up to STALE_POLL_DEADLINE_SECS. A slow-but-working sweep tick was being misread as a stuck 'online'.
- Cold-boot transient flake (register / re-register): single-shot POST /registry/register failed on Caddy 502/503/504 during cold TLS/agent boot. Added register_with_retry mirroring the full-saas bounded retry-on-transient loop: retries ONLY the transport class (5xx + body match), fails closed on 4xx (real contract bug) and on exhausted budget.
- Token redaction (sanitize_http_body) on all transient-error logs.

e2e-chat + Playwright:
- passWithNoTests:false + forbidOnly(CI) in playwright.config.ts: a renamed/moved spec or stray test.only can no longer green the lane with zero executed tests.
- REQUIRE-LIVE guard in the run step: chat==true must execute >=1 test.
- chat-desktop "activity log" test no longer swallows its assertion with `.catch(() => {})` (always-passed before); it is now a presence-gated skip or a real visibility assertion.
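The run-step guard for the chat lane (chat==true must execute >=1 test) can be illustrated with a small shell check. This is a sketch under assumptions: the function name `assert_tests_executed` is hypothetical, and how the executed-test count is obtained from the runner is not shown here because the source does not specify it.

```shell
#!/usr/bin/env bash
# Hypothetical run-step guard (name and arguments are illustrative assumptions).
# Fails when the chat lane was selected but the runner executed nothing.
#   $1 = "true" when the chat lane should run (the chat==true condition)
#   $2 = number of tests the runner reported as executed
assert_tests_executed() {
  local chat_selected="$1" executed="${2-}"

  # Lane not selected: nothing to prove.
  if [ "$chat_selected" != "true" ]; then
    return 0
  fi

  # A missing or non-numeric count is rejected rather than treated as success.
  case "$executed" in
    ''|*[!0-9]*)
      echo "REQUIRE-LIVE: executed-test count is not a number: '$executed'" >&2
      return 1
      ;;
  esac

  if [ "$executed" -lt 1 ]; then
    echo "REQUIRE-LIVE: chat lane selected but 0 tests executed" >&2
    return 1
  fi
  return 0
}
```

Rejecting an unparseable count is the fail-closed choice: if the step that extracts the count breaks, the lane goes red instead of quietly passing.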
PROMOTION-READINESS comments added to each workflow listing what's now fail-closed and what still blocks promotion-to-required (infra-vs-code signal split for external; server-received A2A assertion for chat).

Verified without live infra: bash -n + shellcheck clean on the harness (only a pre-existing SC2015 info on untouched teardown line); both workflow YAMLs parse; embedded run-step bash -n clean; pure-logic unit tests for REQUIRE_LIVE fail-closed, sweep-deadline guard, and transient retry classification all pass. Live staging suite NOT run (no infra).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reviewed staging-external+chat fail-closed: REQUIRE_LIVE guard, bounded cold-boot register retry (transient-only, fail-closed on 4xx), Playwright passWithNoTests:false, swallowed .catch fixed. CI green. Approve.
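For readers following the review, the bounded, transient-only register retry can be sketched as below. This is an illustration under assumptions, not the PR's `register_with_retry` or `sanitize_http_body`: the names `is_transient_failure`, `redact_bearer`, and `retry_on_transient` are hypothetical, the body pattern is a guess at what a proxy error page contains, and the wrapped command is assumed to print the HTTP status on its first line and the body after it.

```shell
#!/usr/bin/env bash
# Hypothetical classifier: returns 0 only for the transport class, here taken
# to be 502/503/504 with a body that looks like a proxy error.
# The body pattern is an illustrative assumption.
is_transient_failure() {
  local status="$1" body="$2"
  case "$status" in
    502|503|504) ;;
    *) return 1 ;;
  esac
  printf '%s' "$body" | grep -Eqi 'bad gateway|service unavailable|gateway timeout'
}

# Hypothetical redaction filter: masks whatever token follows "Bearer ".
redact_bearer() {
  sed -E 's#(Bearer )[A-Za-z0-9._~+/=-]+#\1[REDACTED]#g'
}

# Hypothetical bounded retry.
#   $1 = max attempts, $2 = delay in seconds, remaining args = command to run.
# The command must print the HTTP status on line 1 and the body after it.
retry_on_transient() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 out status body
  while [ "$attempt" -le "$max_attempts" ]; do
    # A command that fails outright yields an empty status, which is
    # classified as non-transient below (fail closed).
    out="$("$@")" || out=""
    status="$(printf '%s\n' "$out" | head -n 1)"
    body="$(printf '%s\n' "$out" | tail -n +2)"

    case "$status" in
      2??) return 0 ;;
    esac

    # 4xx and anything unrecognised: a real failure, so do not retry.
    if ! is_transient_failure "$status" "$body"; then
      echo "non-transient failure (HTTP $status), not retrying" >&2
      return 1
    fi

    echo "transient HTTP $status on attempt $attempt: $(printf '%s' "$body" | redact_bearer)" >&2
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "retry budget exhausted after $max_attempts attempts" >&2
  return 1
}
```

The wrapped command could, for example, be a small function around `curl` that uses `-w '%{http_code}'` to emit the status. Keeping classification separate from the loop is what allows the transient/non-transient decision to be tested as pure logic.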
REQUEST_CHANGES: direct Gitea verification does not support approval at head 10b7f8a99a.
Source-of-truth combined CI status is failure across 30 contexts at the current head. I cannot post a counting approval while the PR is red/pending, even with an existing CEO Assistant approval. Please re-request CR2 review after CI is success on the current head; I will re-run the normal 5-axis review then.
APPROVED after re-review using branch-protection required contexts rather than combined status.
Required-context check: the required contexts present at head 10b7f8a99af9 are green; the remaining required contexts are absent because path filters exclude them for this PR. The 5-axis review found no blocking issue.
Summary: Staging external/chat hardening prevents zero-test greens and tightens live validation paths.
Correctness/robustness: change adds targeted regression coverage or fail-closed behavior for the reported bug class. Security: no new secret exposure or auth broadening found. Performance: no concerning runtime cost. Readability: comments/tests are explicit about the incident class and gate semantics.