test(e2e): staging coverage for every runtime + resume/hibernate lifecycle #2296
What

Closes the "e2e covers every runtime, no regressions" gap from the coverage audit. Adds the missing provision → online → A2A arms so the staging suite exercises every supported runtime, plus the resume/hibernate lifecycle transitions that previously had only handler unit tests.

staging-saas (tests/e2e/test_staging_full_saas.sh)

- seo-agent arm (E2E_RUNTIME=seo-agent): provisioned via template:"seo-agent", not runtime (seo-agent is a claude-code-adapter template variant, absent from manifest.json / runtime_registry.go knownRuntimes; its config.yaml resolves runtime: claude-code). Reuses the same MiniMax/claude-code key path (providers.yaml:21, "the same block is copy-pasted into the seo-agent template"). Full provision → online → A2A → activity matrix, identical to the other runtime arms.
- google-adk AI-Studio arm (E2E_RUNTIME=google-adk, E2E_GOOGLE_API_KEY): BYOK GOOGLE_API_KEY/GEMINI_API_KEY → bare gemini-2.5-pro (providers.yaml runtimes.google-adk.google arm). Exercises google-adk being provisioned at all; the keyless-Vertex PROD path (E2E_LLM_PATH=platform + platform:gemini-2.5-pro) needs WIF, flagged for CTO below.
- Lifecycle step 10b: pause → paused → resume → provisioning → online and hibernate → hibernated → (auto-wake A2A) → online, each asserted against the live DB-backed status (workspace_restart.go Pause/Resume/Hibernate). Gated to full MODE + E2E_LIFECYCLE!=off. Job timeout 45 → 75 min for the two reprovisions.
- Create payload built in Python (conditional template/runtime); create errors fail loud with a named message instead of a Python KeyError.

staging-external (tests/e2e/test_staging_external_runtime.sh)

- kimi + kimi-cli BYO meta-runtime arms (step 7c): create(external:true, runtime=<rt>) → awaiting_agent + runtime label preserved (not coerced to generic external, workspace.go normalizeExternalRuntime) → register(poll) → online → A2A → assert the poll-mode {status:"queued", delivery_mode:"poll"} envelope (a2a_proxy.go:462-477). Proves the a2a proxy routes a BYO meta-runtime to the poll queue rather than 404/500.
- Idioms preserved: skip-if-absent stays loud; fail-closed under REQUIRE_LIVE=1.

Runtime/model evidence (registry + providers.yaml)

- template:"seo-agent" → claude-code; minimax:MiniMax-M2.7 (or anthropic / oauth); runtimes.claude-code (template reuses the claude-code block)
- runtime:"google-adk"; google AI-Studio, bare gemini-2.5-pro; runtimes.google-adk.google
- runtime:"kimi[-cli]", external:true; isExternalLikeRuntime

Verification (no live staging; needs the staging tenant + keys)

- bash -n + shellcheck -x clean on all changed scripts (the one SC2015 info in the external harness is pre-existing, not in this diff).
- tests/e2e/test_model_slug.sh: 21/21 pass, with new pins for the seo-agent + google-adk branches.
- E2E_RUNTIME selection + model verified against runtime_registry.go + providers.yaml; payload shapes verified by isolated runs of build_create_payload.

⚠️ Flagged for CTO: needs extra provisioning

- google-adk is missing from manifest.json workspace_templates even though it's in providers.yaml + provisioner/registry.go + registry_gen.go. The Create-handler runtime allowlist is manifest-derived, so runtime:"google-adk" 422s with RUNTIME_UNSUPPORTED until manifest.json gains it (+ template-cache of molecule-ai-workspace-template-google-adk, which already exists per RFC internal#730). I did not make this provisioning/architecture change unilaterally; the arm is wired and REDs clearly until the manifest is converged. This is an SSOT-drift fix that needs sign-off.

Gate note

These staging arms remain continue-on-error (non-gating). Promoting e2e-staging-saas.yml + e2e-staging-external.yml to REQUIRED (after a de-flake window of consecutive green main runs for both jobs) is the CTO gate-flip that actually makes runtime provisioning regression-blocking.

🤖 Generated with Claude Code
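
For readers who want to picture the "create payload built in Python" change, here is a minimal sketch, not the code in this diff. It assumes a create body with `name`, `template`, `runtime` and `external` keys and a response carrying an `id` field; those field names and the `extract_workspace_id` helper are hypothetical. The point is only that `template` and `runtime` are emitted conditionally and that a missing id fails with a named message that carries the response body.

```python
import json


def build_create_payload(name, runtime=None, template=None, external=False):
    """Build a workspace-create body (field names are assumptions).

    `template` and `runtime` are only emitted when set, so a template
    variant such as seo-agent never sends a `runtime` key at all.
    """
    if not runtime and not template:
        raise ValueError("create payload needs a runtime or a template")
    payload = {"name": name}
    if template:
        payload["template"] = template
    if runtime:
        payload["runtime"] = runtime
    if external:
        payload["external"] = True
    return payload


def extract_workspace_id(response_body):
    """Return the workspace id, or fail with a named error that quotes the body."""
    try:
        data = json.loads(response_body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"workspace create returned a non-JSON body: {response_body!r}"
        ) from exc
    ws_id = data.get("id") if isinstance(data, dict) else None
    if not ws_id:
        # A named failure instead of a bare KeyError on data["id"].
        raise RuntimeError(
            f"workspace create failed, no id in response: {response_body}"
        )
    return ws_id
```

Under these assumptions, `build_create_payload("e2e-seo", template="seo-agent")` yields a body with no `runtime` key, and a 422 body such as `{"error": "RUNTIME_UNSUPPORTED"}` surfaces verbatim in the raised message.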
Closes the "e2e covers every runtime, no regressions" gap (coverage audit). Adds the missing provision→online→A2A arms so the staging suite exercises every supported runtime, plus the resume/hibernate lifecycle transitions.

staging-saas (test_staging_full_saas.sh):
- seo-agent arm (E2E_RUNTIME=seo-agent): provisioned via template="seo-agent" (NOT runtime; seo-agent is a claude-code-adapter template VARIANT absent from manifest.json/runtime_registry knownRuntimes; its config.yaml resolves runtime=claude-code). Reuses the same MiniMax/claude-code key path. Full provision→online→A2A→activity matrix, identical to the other runtime arms.
- google-adk AI-Studio arm (E2E_RUNTIME=google-adk, E2E_GOOGLE_API_KEY): BYOK GOOGLE_API_KEY/GEMINI_API_KEY → bare gemini-2.5-pro (providers.yaml runtimes.google-adk `google` arm). Exercises google-adk being provisioned at all; the keyless-Vertex PROD path (E2E_LLM_PATH=platform + platform: model) needs WIF, FLAGGED for the CTO (see below).
- Lifecycle step 10b: pause→paused→resume→provisioning→online and hibernate→hibernated→(auto-wake A2A)→online, each asserted against the live DB-backed status (workspace_restart.go Pause/Resume/Hibernate). Gated to full MODE + E2E_LIFECYCLE!=off. Job timeout 45→75 for the 2 reprovisions.
- Create payload built in Python so template/runtime are emitted conditionally; create errors now fail loud (named) instead of a KeyError.

staging-external (test_staging_external_runtime.sh):
- kimi + kimi-cli BYO meta-runtime arms (step 7c): create(external:true, runtime=<rt>) → awaiting_agent + runtime-label-PRESERVED (not coerced to generic external, workspace.go normalizeExternalRuntime) → register(poll) → online → A2A → assert the poll-mode {status:"queued",delivery_mode:"poll"} envelope (a2a_proxy.go). Proves the a2a proxy routes a BYO meta-runtime to the poll queue rather than 404/500.

Idioms preserved: skip-if-absent stays LOUD; REQUIRE_LIVE fail-closed intact; every new arm REDs on a real provision/A2A/transition break, never silently skips. model_slug dispatch pins added for seo-agent + google-adk (test passes 21/21). bash -n + shellcheck clean on all changed scripts.

NOT changed (flagged for CTO, needs extra provisioning):
- google-adk is in providers.yaml + provisioner/registry.go + registry_gen but MISSING from manifest.json workspace_templates → the Create-handler runtime allowlist (manifest-derived) rejects runtime="google-adk" with RUNTIME_UNSUPPORTED. Adding it (+ template-cache of molecule-ai-workspace-template-google-adk) is the provisioning change that makes the google-adk arm actually green. The arm is wired and REDs clearly until then.
- Vertex WIF path for google-adk (server-side mint, no on-box cred) and a standing kimi BYO compute cell (for a REAL kimi completion vs the queued envelope) both need standing infra not present in staging.

These staging arms remain continue-on-error (non-gating). Promoting e2e-staging-saas.yml + e2e-staging-external.yml to REQUIRED (after a de-flake window of consecutive green main runs) is the CTO gate-flip that makes runtime provisioning regression-blocking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

APPROVED (CTO review). Verified diff: tests/e2e + 1 workflow ONLY, zero production code. Every new arm (seo-agent→claude-code/MiniMax, google-adk AI-Studio/gemini-2.5-pro, kimi/kimi-cli BYO-poll, pause/resume/hibernate lifecycle) hard-fails on a broken provision/transition/A2A assertion, with no silent skip. Workflow change is timeout 45→75 + E2E_LIFECYCLE env only; continue-on-error correctly NOT flipped (gate-promotion is a separate CTO decision). 3 CTO-flagged provisioning gaps (google-adk manifest-allowlist, Vertex WIF, kimi compute cell) correctly NOT done unilaterally; arms stay red until then. Approving.
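
As an illustration of the lifecycle arm discussed above (status transitions asserted against the live status, with hard failure rather than a silent skip), here is a hedged Python sketch. The real harness is a shell script; `wait_for_status`, `assert_transitions`, and the injected `get_status` / `trigger` callables are hypothetical names, and the timeout values are placeholders rather than the ones used in staging.

```python
import time


def wait_for_status(get_status, expected, timeout_s=600.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `expected`, bounded by timeout_s.

    Raises TimeoutError naming the last observed status, so a stuck
    transition fails loudly instead of being skipped.
    """
    deadline = clock() + timeout_s
    while True:
        last = get_status()
        if last == expected:
            return last
        if clock() >= deadline:
            raise TimeoutError(
                f"status stuck at {last!r}, expected {expected!r} "
                f"after {timeout_s}s"
            )
        sleep(interval_s)


def assert_transitions(get_status, trigger, steps, **wait_kwargs):
    """Run (action, expected_status) steps in order.

    `action` may be None when the step only waits for a status change
    that something else causes (for example an auto-wake).
    """
    for action, expected in steps:
        if action is not None:
            trigger(action)
        wait_for_status(get_status, expected, **wait_kwargs)


# Example step list for the pause/resume arm (status names from the PR text).
PAUSE_RESUME_STEPS = [("pause", "paused"), ("resume", "online")]
```

This sketch waits only for the settled status after each action; asserting the intermediate `provisioning` status would reuse the same helper with an extra step.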
5-axis review: APPROVED.
Correctness: Expands staging E2E coverage for the requested runtime arms and lifecycle paths without production-code changes. The model slug helper now covers seo-agent as the claude-code template variant and google-adk AI-Studio BYOK, with unit coverage for those selections. The external-runtime harness adds kimi/kimi-cli BYO meta-runtime create/register/A2A poll-queue checks, and the full SaaS harness adds template-based seo-agent provisioning plus pause/resume/hibernate wake lifecycle checks.
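
The poll-queue check mentioned under Correctness can be pictured with a short sketch. The `status` and `delivery_mode` keys and their values come from the PR text; the function names and the `runtime` field on the workspace record are assumptions made for illustration, not the harness's actual code.

```python
def assert_poll_envelope(envelope):
    """Fail unless an A2A response is the poll-mode queued envelope."""
    if not isinstance(envelope, dict):
        raise AssertionError(f"A2A response is not a JSON object: {envelope!r}")
    expected = {"status": "queued", "delivery_mode": "poll"}
    for key, want in expected.items():
        got = envelope.get(key)
        if got != want:
            raise AssertionError(
                f"A2A envelope {key}={got!r}, expected {want!r}; "
                f"full body: {envelope!r}"
            )


def assert_runtime_preserved(workspace, requested_runtime):
    """Fail if the runtime label was coerced (e.g. to generic 'external').

    Assumes the workspace record exposes the label under a `runtime` key.
    """
    got = workspace.get("runtime")
    if got != requested_runtime:
        raise AssertionError(
            f"runtime label changed: requested {requested_runtime!r}, got {got!r}"
        )
```

Extra keys in the envelope are tolerated here; only the two fields named in the PR are pinned.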
Robustness: New create payload building avoids shell JSON escaping hazards and reports missing IDs with actionable response bodies. The lifecycle checks verify DB-backed status transitions and fail hard on broken transitions rather than silently skipping.

Security: No secrets are committed; existing secret injection paths are used and Secret scan is green.

Performance: The staging timeout increase is documented and sized for the added reprovision lifecycle checks; no unbounded polling was introduced.

Readability: Comments are lengthy but make the runtime/template distinctions and fail-closed expectations clear.
Required-context review: head 2e31f27304 is mergeable; CI/all-required, E2E API Smoke, Handlers PG, Platform Go, and Secret scan are green. Combined red is ignored per corrected core gate.