The synth-E2E (#2342) provisions a langgraph tenant whose default
model `openai:gpt-4.1-mini` requires OPENAI_API_KEY for the first LLM
call. Sibling workflows already wire this:
- e2e-staging-saas.yml:89
- canary-staging.yml:63
continuous-synth-e2e.yml just forgot. Result: tenant boots, accepts
a2a messages, then returns:
Agent error: "Could not resolve authentication method. Expected
either api_key or auth_token to be set."
This was masked since 2026-04-29 (workflow creation) by a2a-sdk v0→v1
contract violations — PR #2558 (Task-enqueue) and #2563
(TaskUpdater.complete/failed terminal events) cleared those, exposing
the underlying auth gap on the synth-E2E firing at 11:39 UTC today.
The script tests/e2e/test_staging_full_saas.sh:325 already reads
E2E_OPENAI_API_KEY and persists it as a workspace_secret on tenant
create — only the workflow wiring was missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hard gate Tier 2 item 2 of 4. Cron-driven full-lifecycle E2E that
catches regressions visible only at runtime — schema drift,
deployment-pipeline gaps, vendor outages, env-var rotations,
DNS / CF / Railway side-effects.
Empirical motivation from today:
- #2345 (A2A v0.2 silent drop) — passed unit tests, broke at JSON-RPC
parse layer between sender + receiver. Visible only when a sender
exercises the full path. Now-fixed by PR #2349, but a continuous
E2E would have surfaced it within 20 min of the regression.
- RFC #2312 chat upload — landed staging-branch but never reached
staging tenants because publish-workspace-server-image was main-
only. Caught by manual dogfooding hours after deploy. Same pattern.
Both classes are invisible to PR-time CI. The continuous gate fires
every 20 min against a real staging tenant and surfaces regressions
within minutes.
Cadence: cron `0,20,40 * * * *` (3x/hour). Offsets the existing
sweep-cf-orphans (:15) and sweep-cf-tunnels (:45) so the three ops
don't burst CF/AWS APIs at the same minute. Concurrency group
prevents overlapping runs if one hangs.
Cost: ~$0.50-1/day GHA + pennies of staging tenant lifecycle.
Reuses existing tests/e2e/test_staging_full_saas.sh — no new harness
to maintain. Bounded at 10 min wall-clock (vs 15 min default) so
stuck runs fail fast rather than holding up the next firing.
Defaults to E2E_RUNTIME=langgraph (fastest cold start; the regression
classes this gate catches don't need hermes-specific paths). Operators
can dispatch with runtime=hermes when they want SDK-native coverage.
Schedule-vs-dispatch hardening: hard-fail on missing
CP_STAGING_ADMIN_API_TOKEN for cron firing (silent-skip would mask
real outages); soft-skip for operator dispatch.
Refs:
- #2342 hard-gates Tier 2 item 2
- #2345 (A2A v0.2 fix that this gate would have caught earlier)
- #2335 / #2337 (deployment-pipeline gaps that this gate also catches)