fix(e2e): #76 staging LLM preflight treats any HTTP response as UP #2866
Fixes #76
Approach: Option 1 (semantics fix) — preferred by driver
The preflight only needs to prove that the staging LLM proxy is REACHABLE. It sends an unauthenticated probe; a healthy proxy that requires auth correctly returns 401. Previously every non-200 status (including 401) was classified as DEP-DOWN:staging-llm, which has caused fleet-wide false staging-down incidents since 2026-06-13.
Changes
DEP-DOWN.
Why not Option 2/3
Test plan
bash tests/e2e/test_llm_proxy_preflight_unit.sh: all 5 tests pass.
🤖 Generated with Claude Code
Title changed from "fix(e2e): staging LLM preflight uses correct model slug + optional auth (#76)" to "fix(e2e): #76 staging LLM preflight treats any HTTP response as UP"
APPROVED on head 2234b4ac.
Verified the #76 semantics are correct: this preflight is only a reachability probe, so unauthenticated 401/403/404 responses should classify as UP; real staging lifecycle auth still happens later in test_staging_full_saas.sh. Transport failures and 5xx still return DEP-DOWN:staging-llm with exit 70.
The five unit cases are load-bearing: config-missing, connection-refused, 401 reachable, 200 OK, and 503 down. I also ran tests/e2e/test_llm_proxy_preflight_unit.sh locally from this head and all five passed. Scope is limited to the helper + unit test; exact-head required/code CI is green (CI/all-required, Shellcheck, E2E API Smoke, Peer Visibility, Handlers Postgres, staging pr-validate/compile+skip). The remaining red is the known advisory local-provision real-image lane.
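For readers who want to see how cases like these can run without a network, here is a minimal sketch. It is not the unit test from this PR: `probe_llm_proxy`, `check`, `FAKE_CODE`, and the placeholder URL are hypothetical names, and `curl` is replaced by a shell function that simulates each outcome. The config-missing case is left out because its exit code is not stated here.

```shell
#!/usr/bin/env bash
# Illustrative only: probe_llm_proxy, check and FAKE_CODE are hypothetical
# names, not the helper or unit test from this PR.

# Stub curl as a shell function so no network is needed. FAKE_CODE selects
# the simulated status; 000 mimics a transport failure (non-zero exit).
curl() {
  printf '%s' "${FAKE_CODE:-000}"
  [ "${FAKE_CODE:-000}" != "000" ]
}

# Minimal probe: UP for any 1xx-4xx response, down for 000, 5xx, or garbage.
probe_llm_proxy() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || true
  case "$code" in
    [1-4][0-9][0-9]) return 0 ;;
    *) echo "DEP-DOWN:staging-llm (http_code=$code)"; return 70 ;;
  esac
}

# Run one named case and compare the exit status with the expected value.
check() {
  local name="$1" want="$3" got=0
  FAKE_CODE="$2"
  probe_llm_proxy "http://llm-proxy.invalid" >/dev/null || got=$?
  if [ "$got" -eq "$want" ]; then
    echo "PASS $name"
  else
    echo "FAIL $name (got $got, want $want)"
    return 1
  fi
}

check "connection-refused" 000 70
check "401 reachable"      401 0
check "200 OK"             200 0
check "503 down"           503 70
```

Stubbing `curl` as a function keeps the cases deterministic; a real test might instead start a local listener, which this sketch does not attempt.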
APPROVED on head 2234b4ac.
Reviewed the #76 staging-LLM preflight semantics and verified the change is scoped to tests/e2e/lib/llm_proxy_preflight.sh plus its unit test. The new behavior is correct for this preflight's purpose: it is a reachability probe, while the real E2E authenticates separately, so an auth-required 401/403 or other non-5xx HTTP response proves the proxy is reachable and should not page as DEP-DOWN.
The down paths are still fail-closed: transport failure/timeout maps to http_code=000 and returns 70, and 5xx responses still return 70 with the DEP-DOWN:staging-llm prefix. This is not an always-pass preflight.
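The fail-closed mapping described above can be sketched as a small classifier. This is an illustration under stated assumptions, not the code in tests/e2e/lib/llm_proxy_preflight.sh: the function name `classify_http_code` and the message wording are hypothetical, and only the `DEP-DOWN:staging-llm` prefix, the `000` transport code, and exit status 70 come from the review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name and messages are illustrative, not the PR's code.

# Classify a curl-reported HTTP status for a reachability probe.
# Prints a one-line verdict; returns 0 (reachable) or 70 (dependency down).
classify_http_code() {
  local code="$1"
  case "$code" in
    000)
      # curl's %{http_code} is 000 when no HTTP response arrived
      # (connection refused, timeout, DNS failure).
      echo "DEP-DOWN:staging-llm transport failure"
      return 70
      ;;
    5[0-9][0-9])
      # The proxy answered, but with a server-side error.
      echo "DEP-DOWN:staging-llm HTTP $code"
      return 70
      ;;
    [1-4][0-9][0-9])
      # Any other HTTP response, including 401/403/404, proves reachability.
      echo "UP HTTP $code"
      return 0
      ;;
    *)
      # Empty or malformed input is treated as down so the probe stays
      # fail-closed.
      echo "DEP-DOWN:staging-llm unexpected status '$code'"
      return 70
      ;;
  esac
}
```

A caller would typically obtain the code with something like `curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$URL" || true` and pass the result in; the URL variable and the timeout value are assumptions. Matching on explicit digit patterns, with a catch-all that returns 70, is what keeps an empty or garbled status from being read as UP.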
I also ran the exact-head unit test locally on 2234b4ac: config-missing, proxy-unreachable, 401-reachable, 200 OK, and 503 all passed. Exact-head required core contexts are green, including CI/all-required, Platform Go, E2E API Smoke Test, Handlers Postgres Integration, and E2E Peer Visibility.