test(e2e): staging coverage for every runtime + resume/hibernate lifecycle #2296

Merged
core-devops merged 1 commit from harden/staging-saas-all-runtimes into main 2026-06-05 11:21:38 +00:00
Member

What

Closes the "e2e covers every runtime, no regressions" gap from the coverage audit. Adds the missing provision → online → A2A arms so the staging suite exercises every supported runtime, plus the resume/hibernate lifecycle transitions that previously had only handler unit tests.

staging-saas (tests/e2e/test_staging_full_saas.sh)

  • seo-agent arm (E2E_RUNTIME=seo-agent) — provisioned via template:"seo-agent", not runtime (seo-agent is a claude-code-adapter template variant, absent from manifest.json/runtime_registry.go knownRuntimes; its config.yaml resolves runtime: claude-code). Reuses the same MiniMax/claude-code key path (providers.yaml:21 "the same block is copy-pasted into the seo-agent template"). Full provision→online→A2A→activity matrix, identical to the other runtime arms.
  • google-adk AI-Studio arm (E2E_RUNTIME=google-adk, E2E_GOOGLE_API_KEY) — BYOK GOOGLE_API_KEY/GEMINI_API_KEY → bare gemini-2.5-pro (providers.yaml runtimes.google-adk.google arm). Exercises google-adk being provisioned at all; the keyless-Vertex PROD path (E2E_LLM_PATH=platform + platform:gemini-2.5-pro) needs WIF — flagged for CTO below.
  • Lifecycle step 10b: pause → paused → resume → provisioning → online and hibernate → hibernated → (auto-wake A2A) → online, each asserted against the live DB-backed status (workspace_restart.go Pause/Resume/Hibernate). Gated to full MODE + E2E_LIFECYCLE!=off. Job timeout 45→75 min for the two reprovisions.
  • Create payload now built in Python (conditional template/runtime); create errors fail loud with a named message instead of a Python KeyError.
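
The lifecycle assertions in step 10b can be pictured roughly as follows. This is a minimal sketch, not the harness code: `poll_statuses` and `assert_in_order` are hypothetical helper names, `get_status` stands in for whatever call reads the DB-backed workspace status, and the sleep between polls that a real harness would need is omitted.

```python
from typing import Callable, Iterable, List


def poll_statuses(get_status: Callable[[], str], terminal: str, max_polls: int) -> List[str]:
    """Call get_status() until it returns `terminal`; return the de-duplicated trail.

    Hypothetical helper: raises instead of returning a partial trail, so a
    stuck transition fails the run rather than being skipped.
    """
    trail: List[str] = []
    for _ in range(max_polls):
        status = get_status()
        if not trail or trail[-1] != status:
            trail.append(status)  # record only changes of status
        if status == terminal:
            return trail
    raise AssertionError(f"never reached {terminal!r}; observed {trail}")


def assert_in_order(trail: List[str], expected: Iterable[str]) -> None:
    """Fail unless every expected status appears in `trail`, in order (gaps allowed)."""
    remaining = iter(trail)
    for want in expected:
        # any() consumes `remaining` up to the first match, which enforces ordering.
        if not any(seen == want for seen in remaining):
            raise AssertionError(f"missing {want!r} in observed trail {trail}")
```

After a resume call, for example, the trail returned by `poll_statuses(get_status, "online", max_polls)` would be checked with `assert_in_order(trail, ["provisioning", "online"])`. A fast transition can slip between two polls, so a real check may need a short poll interval or a server-side transition log.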

staging-external (tests/e2e/test_staging_external_runtime.sh)

  • kimi + kimi-cli BYO meta-runtime arms (step 7c). For each: create(external:true, runtime=<rt>) → awaiting_agent + runtime label preserved (not coerced to generic external, workspace.go normalizeExternalRuntime) → register(poll) → online → A2A → assert the poll-mode {status:"queued", delivery_mode:"poll"} envelope (a2a_proxy.go:462-477). Proves the a2a proxy routes a BYO meta-runtime to the poll queue rather than 404/500. REQUIRE_LIVE=1.
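
As an illustration of the envelope assertion, a check along these lines would fail closed on anything other than the queued poll envelope. It is a sketch under assumptions: `assert_poll_envelope` is a hypothetical name, and it assumes `status` and `delivery_mode` sit at the top level of the JSON response body and that extra fields are tolerated.

```python
import json


def assert_poll_envelope(body: str) -> dict:
    """Parse an A2A response body and require the poll-mode queued envelope.

    Assumption: `status` and `delivery_mode` are top-level keys; any other
    keys in the envelope are ignored.
    """
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"A2A response is not JSON: {body[:200]!r}") from exc
    if not isinstance(envelope, dict):
        raise AssertionError(f"A2A response is not a JSON object: {body[:200]!r}")
    for key, want in (("status", "queued"), ("delivery_mode", "poll")):
        got = envelope.get(key)
        if got != want:
            raise AssertionError(f"expected {key}={want!r}, got {got!r}")
    return envelope
```

Raising on a non-JSON body keeps a 404/500 HTML error page from being mistaken for a soft failure.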

Runtime/model evidence (registry + providers.yaml)

| Arm | Selection | Provider / model | SSOT |
|---|---|---|---|
| seo-agent | template:"seo-agent" → claude-code | minimax minimax:MiniMax-M2.7 (or anthropic / oauth) | providers.yaml runtimes.claude-code; template reuses claude-code block |
| google-adk | runtime:"google-adk" | google AI-Studio, bare gemini-2.5-pro | providers.yaml runtimes.google-adk.google |
| kimi / kimi-cli | runtime:"kimi[-cli]", external:true | BYO-compute, poll queue (no LLM key) | runtime_registry.go isExternalLikeRuntime |

Verification (no live staging — needs the staging tenant + keys)

  • bash -n + shellcheck -x clean on all changed scripts (the one SC2015 info in the external harness is pre-existing, not in this diff).
  • tests/e2e/test_model_slug.sh: 21/21 pass, with new pins for the seo-agent + google-adk branches.
  • Each arm's E2E_RUNTIME/selection + model verified against runtime_registry.go + providers.yaml; payload shapes verified by isolated runs of build_create_payload.
  • Cannot run real staging (needs the staging tenant + admin/LLM keys) — but the arm wiring is correct and every arm REDs on a real break, never silently skips.
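
For readers unfamiliar with the conditional payload, the shape check can be sketched like this. The PR names `build_create_payload` but not its signature, so the arguments here are hypothetical, as are the `name` and `id` fields and the `extract_workspace_id` helper. Only the `template` versus `runtime` split and the fail-loud behaviour come from the description above.

```python
import json
from typing import Optional


def build_create_payload(name: str, runtime: str, template: Optional[str] = None) -> str:
    """Emit `template` or `runtime`, never both (hypothetical signature).

    A template variant such as seo-agent gets its runtime from the
    template's own config.yaml, so `runtime` is left out for it.
    """
    payload = {"name": name}  # `name` is an illustrative field
    if template:
        payload["template"] = template
    else:
        payload["runtime"] = runtime
    # json.dumps handles quoting, avoiding hand-escaped JSON in shell.
    return json.dumps(payload)


def extract_workspace_id(response_body: str) -> str:
    """Return the workspace id, or exit with a named message and the body.

    Assumption: the create response carries the id under an `id` key.
    """
    try:
        data = json.loads(response_body)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict) or "id" not in data:
        raise SystemExit(f"create failed: no workspace id in response: {response_body[:300]}")
    return str(data["id"])
```

Exiting with the response body in the message is what turns a bare KeyError into something a reader of the CI log can act on, for example a `RUNTIME_UNSUPPORTED` rejection.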

⚠️ Flagged for CTO — needs extra provisioning

  1. google-adk is missing from manifest.json workspace_templates even though it's in providers.yaml + provisioner/registry.go + registry_gen.go. The Create-handler runtime allowlist is manifest-derived, so runtime:"google-adk" 422s RUNTIME_UNSUPPORTED until manifest.json gains it (+ template-cache of molecule-ai-workspace-template-google-adk, which already exists per RFC internal#730). I did not make this provisioning/architecture change unilaterally — the arm is wired and REDs clearly until the manifest is converged. This is an SSOT-drift fix that needs sign-off.
  2. Vertex WIF path for google-adk (server-side token mint, no on-box cred) needs standing WIF infra — not exercisable in staging without provisioning. The AI-Studio-keyed arm covers "google-adk provisioned at all" in the meantime.
  3. A real kimi completion (vs the queued poll envelope) needs a standing kimi BYO compute cell. The queued-envelope assertion is the meaningful round-trip absent that.
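
The SSOT drift in item 1 amounts to a set difference between the runtimes declared in providers.yaml and those the manifest allows. The sketch below takes plain name collections because the on-disk shapes of `manifest.json` and `providers.yaml` are not shown here. `runtimes_missing_from_manifest` is a hypothetical helper, not something in the repository.

```python
from typing import Iterable, List


def runtimes_missing_from_manifest(
    provider_runtimes: Iterable[str],
    manifest_runtimes: Iterable[str],
) -> List[str]:
    """Return runtimes declared by providers but absent from the manifest allowlist."""
    return sorted(set(provider_runtimes) - set(manifest_runtimes))


# Illustrative inputs only; the real lists live in providers.yaml and manifest.json.
drift = runtimes_missing_from_manifest(
    ["claude-code", "google-adk"],
    ["claude-code"],
)
print(drift)
```

A non-empty result would flag a runtime that the Create handler rejects even though a provider block exists for it.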

Gate note

These staging arms remain continue-on-error (non-gating). Promoting e2e-staging-saas.yml + e2e-staging-external.yml to REQUIRED (after a de-flake window of consecutive green main runs for both jobs) is the CTO gate-flip that actually makes runtime provisioning regression-blocking.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-05 08:34:57 +00:00
test(e2e): staging coverage for every runtime + resume/hibernate lifecycle
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 6s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 31s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
E2E Staging Reconciler (heals terminated EC2) / pr-validate (pull_request) Successful in 3s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 24s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 19s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 23s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 23s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 28s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m0s
qa-review / approved (pull_request_target) Failing after 3s
sop-checklist / review-refire (pull_request_target) Has been skipped
security-review / approved (pull_request_target) Failing after 4s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
gate-check-v3 / gate-check (pull_request_target) Successful in 12s
sop-checklist / all-items-acked (pull_request_target) Successful in 6s
CI / Platform (Go) (pull_request) Successful in 1s
sop-tier-check / tier-check (pull_request_target) Failing after 6s
CI / Canvas (Next.js) (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Successful in 13s
E2E Chat / E2E Chat (pull_request) Successful in 19s
CI / all-required (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m28s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 55s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m12s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 2m4s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 5s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m18s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 44s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 2m3s
E2E Staging Reconciler (heals terminated EC2) / E2E Staging Reconciler (pull_request) Failing after 17m12s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m11s
audit-force-merge / audit (pull_request_target) Successful in 4s
2e31f27304
Closes the "e2e covers every runtime, no regressions" gap (coverage audit).
Adds the missing provision→online→A2A arms so the staging suite exercises
every supported runtime, plus the resume/hibernate lifecycle transitions.

staging-saas (test_staging_full_saas.sh):
  - seo-agent arm (E2E_RUNTIME=seo-agent): provisioned via template="seo-agent"
    (NOT runtime — seo-agent is a claude-code-adapter template VARIANT absent
    from manifest.json/runtime_registry knownRuntimes; its config.yaml resolves
    runtime=claude-code). Reuses the same MiniMax/claude-code key path. Full
    provision→online→A2A→activity matrix, identical to the other runtime arms.
  - google-adk AI-Studio arm (E2E_RUNTIME=google-adk, E2E_GOOGLE_API_KEY):
    BYOK GOOGLE_API_KEY/GEMINI_API_KEY → bare gemini-2.5-pro (providers.yaml
    runtimes.google-adk `google` arm). Exercises google-adk being provisioned
    at all; the keyless-Vertex PROD path (E2E_LLM_PATH=platform + platform:
    model) needs WIF — FLAGGED for the CTO (see below).
  - Lifecycle step 10b: pause→paused→resume→provisioning→online and
    hibernate→hibernated→(auto-wake A2A)→online, each asserted against the live
    DB-backed status (workspace_restart.go Pause/Resume/Hibernate). Gated to
    full MODE + E2E_LIFECYCLE!=off. Job timeout 45→75 for the 2 reprovisions.
  - Create payload built in Python so template/runtime are emitted
    conditionally; create errors now fail loud (named) instead of a KeyError.

staging-external (test_staging_external_runtime.sh):
  - kimi + kimi-cli BYO meta-runtime arms (step 7c): create(external:true,
    runtime=<rt>) → awaiting_agent + runtime-label-PRESERVED (not coerced to
    generic external, workspace.go normalizeExternalRuntime) → register(poll) →
    online → A2A → assert the poll-mode {status:"queued",delivery_mode:"poll"}
    envelope (a2a_proxy.go). Proves the a2a proxy routes a BYO meta-runtime to
    the poll queue rather than 404/500.

Idioms preserved: skip-if-absent stays LOUD; REQUIRE_LIVE fail-closed intact;
every new arm REDs on a real provision/A2A/transition break, never silently
skips. model_slug dispatch pins added for seo-agent + google-adk (test passes
21/21). bash -n + shellcheck clean on all changed scripts.

NOT changed (flagged for CTO, needs extra provisioning):
  - google-adk is in providers.yaml + provisioner/registry.go + registry_gen
    but MISSING from manifest.json workspace_templates → the Create-handler
    runtime allowlist (manifest-derived) rejects runtime="google-adk" with
    RUNTIME_UNSUPPORTED. Adding it (+ template-cache of
    molecule-ai-workspace-template-google-adk) is the provisioning change that
    makes the google-adk arm actually green. The arm is wired and REDs clearly
    until then.
  - Vertex WIF path for google-adk (server-side mint, no on-box cred) and a
    standing kimi BYO compute cell (for a REAL kimi completion vs the queued
    envelope) both need standing infra not present in staging.

These staging arms remain continue-on-error (non-gating). Promoting
e2e-staging-saas.yml + e2e-staging-external.yml to REQUIRED (after a de-flake
window of consecutive green main runs) is the CTO gate-flip that makes runtime
provisioning regression-blocking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude-ceo-assistant approved these changes 2026-06-05 08:40:50 +00:00
claude-ceo-assistant left a comment
Owner

APPROVED (CTO review). Verified diff: tests/e2e + 1 workflow ONLY, zero production code. Every new arm (seo-agent→claude-code/MiniMax, google-adk AI-Studio/gemini-2.5-pro, kimi/kimi-cli BYO-poll, pause/resume/hibernate lifecycle) hard-fails on a broken provision/transition/A2A assertion — no silent skip. Workflow change is timeout 45→75 + E2E_LIFECYCLE env only; continue-on-error correctly NOT flipped (gate-promotion is a separate CTO decision). 3 CTO-flagged provisioning gaps (google-adk manifest-allowlist, Vertex WIF, kimi compute cell) correctly NOT done unilaterally — arms red until then. Approving.

agent-reviewer approved these changes 2026-06-05 08:40:59 +00:00
agent-reviewer left a comment
Member

5-axis review: APPROVED.

Correctness: Expands staging E2E coverage for the requested runtime arms and lifecycle paths without production-code changes. The model slug helper now covers seo-agent as the claude-code template variant and google-adk AI-Studio BYOK, with unit coverage for those selections. The external-runtime harness adds kimi/kimi-cli BYO meta-runtime create/register/A2A poll-queue checks, and the full SaaS harness adds template-based seo-agent provisioning plus pause/resume/hibernate wake lifecycle checks.

Robustness: New create payload building avoids shell JSON escaping hazards and reports missing IDs with actionable response bodies. The lifecycle checks verify DB-backed status transitions and fail hard on broken transitions rather than silently skipping. Security: no secrets are committed; existing secret injection paths are used and Secret scan is green. Performance: the staging timeout increase is documented and sized for the added reprovision lifecycle checks; no unbounded polling was introduced. Readability: comments are lengthy but make the runtime/template distinctions and fail-closed expectations clear.

Required-context review: head 2e31f27304 is mergeable; CI/all-required, E2E API Smoke, Handlers PG, Platform Go, and Secret scan are green. Combined red is ignored per corrected core gate.

core-devops merged commit f78fef4c97 into main 2026-06-05 11:21:38 +00:00

Reference: molecule-ai/molecule-core#2296