[no-regression] e2e coverage matrix + gaps: runtimes × providers × features #2332

Open
opened 2026-06-06 04:24:13 +00:00 by claude-ceo-assistant · 0 comments
Owner

e2e coverage matrix — runtimes × providers × features (no-regression audit, 2026-06-05)

Evidence-based audit (read from origin/main of core+CP+templates; local checkouts were stale). Goal: every supported runtime + provider + feature has a gating e2e, so no silent regressions.

Supported set (SSOT = controlplane providers.yaml)

  • Runtimes (5): claude-code, codex, hermes, openclaw, google-adk. (crewai/deepagents/gemini-cli/autogen/langgraph removed via migrations 031/034/035.)
  • Providers×auth gated live by ci / serving-e2e (REQUIRED, CP): anthropic-api, anthropic-oauth, openai-api, kimi-coding, minimax, google, vertex(WIF, conditional), platform, byok-{anthropic,openai,minimax,gemini}.

CONFIRMED COVERAGE GAPS (prioritized)

P0 — gated-cell illusions / shipped-but-uncovered:

  1. google-adk has NO live e2e arm. Core test_priority_runtimes_e2e.sh default set = mock claude-code codex hermes openclaw minimax — google-adk absent. Also not asserted in CP runtimes_test.go SSOT map. We just registered its models (cp#568/#2327) with zero serving/runtime e2e. → add google-adk arm + SSOT assertion.
  2. openai-subscription (Codex OAuth) serving is gated NOWHERE. CP serving-e2e arm #4 hard-skips it (serving_e2e_test.go:233); core priority-runtimes codex arm skips without E2E_OPENAI_API_KEY. Looks covered, isn't. → real Codex-backend serving e2e.
  3. groq: zero serving coverage (no arm; key now provisioned in Infisical /shared/serving-e2e-keys) → add arm.
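
The google-adk gap in item 1 is a plain set difference between the supported runtimes and the arms the script runs by default. A minimal sketch of that check follows, using only the two lists quoted in this issue. The function name `missing_runtime_arms` is hypothetical and does not exist in core or CP; a real check would read the runtime set from providers.yaml rather than hard-coding it.

```python
# Supported runtimes as listed in this issue (SSOT = controlplane providers.yaml).
SUPPORTED_RUNTIMES = {"claude-code", "codex", "hermes", "openclaw", "google-adk"}

# Default arm set quoted in this issue from test_priority_runtimes_e2e.sh.
DEFAULT_E2E_ARMS = "mock claude-code codex hermes openclaw minimax"


def missing_runtime_arms(supported, arms_line):
    """Return supported runtimes that have no arm in the e2e default set.

    Hypothetical helper for illustration; not part of the repo.
    """
    arms = set(arms_line.split())
    return sorted(supported - arms)


if __name__ == "__main__":
    # Non-runtime arms (mock, minimax) are ignored; only uncovered runtimes are reported.
    print(missing_runtime_arms(SUPPORTED_RUNTIMES, DEFAULT_E2E_ARMS))
```

Run as an assertion in CI, a non-empty result would fail the build whenever a runtime is added to the SSOT without a matching arm.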

P0 — feature gaps that map to ACTUAL recent incidents:

  4. Context-overflow autoheal / session-reset: ZERO e2e. This is exactly what wedged Kimi tonight (262k overflow → manual restart). Must ship an e2e WITH fix/context-overflow-autoheal.
  5. Workspace data-volume survives-recreate + snapshot-before-swap (/home/agent wipe): no e2e. Past incident feedback_workspace_container_swap_wipes_home_agent.
  6. SM/secrets.d real round-trip (two-sided IAM): no live e2e (only userdata-string unit). Past incident cp#358.
  7. Desktop reconnect + 300s lease renewal (core#2216): no e2e.

P1 — registry/SSOT drift (real regressions):

  8. Core providers registry mirror STALE on vertex — fingerprint 9d129c96(CP) vs e457249e(core); core lacks the keyless-WIF vertex (auth_mode wif_adc + endpoint_vars). Re-byte-sync providers.yaml + regen. (This is the #561 vertex-SSOT work.)
  9. catalog-vs-runtime over-offer (billing-only model ids) + haiku id mismatch (claude-haiku-4-5 vs -4-5-20251001) + no template_registry row for codex/google-adk (not List-API-discoverable).
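
The mirror drift in item 8 can be caught by comparing a fingerprint of the two providers.yaml copies byte for byte. The sketch below assumes the fingerprint is a truncated SHA-256 of the raw file bytes; this issue does not say how the real 8-character fingerprints (9d129c96, e457249e) are computed, so the hash choice and both function names are assumptions for illustration.

```python
import hashlib


def fingerprint(data: bytes, length: int = 8) -> str:
    """Short hex fingerprint of raw file bytes.

    ASSUMPTION: truncated SHA-256; the real fingerprint scheme is not
    stated in the issue.
    """
    return hashlib.sha256(data).hexdigest()[:length]


def mirrors_in_sync(cp_bytes: bytes, core_bytes: bytes) -> bool:
    """True only when the CP copy and the core mirror are byte-identical
    under the fingerprint above."""
    return fingerprint(cp_bytes) == fingerprint(core_bytes)


if __name__ == "__main__":
    cp = b"providers:\n  vertex:\n    auth_mode: wif_adc\n"
    core = b"providers:\n  vertex:\n    auth_mode: api_key\n"  # made-up stale copy
    print(mirrors_in_sync(cp, cp), mirrors_in_sync(cp, core))
```

Because the comparison is on bytes, even a trailing-newline difference counts as drift, which matches the "re-byte-sync" intent of item 8.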

P1 — operational surface (impl exists, unit-only):

  10. Soft-restart/pause/resume/hibernate against a real container; live channel send (telegram/slack/discord/lark/email — email has no adapter test at all); channel discover; workspace data-prune (RFC#734).

Structural finding

Core's 3 REQUIRED gates (CI/all-required, E2E API Smoke, Handlers Postgres) include no live-SaaS/staging e2e — E2E Chat / Staging SaaS / Peer Visibility / Canvas / External / Reconciler are all advisory. The only live-runtime assurance in the required set is the mock backbone (real-LLM arms skip without secrets). Promote-to-required candidates exist (e2e-peer-visibility is already continue-on-error:false).
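
The "looks covered, isn't" pattern comes from a gate that treats a skipped arm the same as a passing one. One way to close it, sketched below, is to evaluate required arms strictly so that a skip or a missing result fails the gate. The `Status` enum, `gate_ok` function, and the arm names in the example are hypothetical and not taken from the existing workflows.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"


def gate_ok(results, required):
    """Strict gate: every required arm must have run AND passed.

    results:  dict of arm name -> Status as reported by the e2e run.
    required: set of arm names that must gate.
    Returns (ok, problems); a skipped or absent arm is a problem.
    """
    problems = []
    for arm in sorted(required):
        status = results.get(arm)
        if status is None:
            problems.append(f"{arm}: no result (arm missing)")
        elif status is not Status.PASS:
            problems.append(f"{arm}: {status.value}")
    return (not problems, problems)


if __name__ == "__main__":
    # Illustrative run: codex skipped for lack of a secret, google-adk has no arm.
    run = {"mock": Status.PASS, "codex": Status.SKIP}
    print(gate_ok(run, {"mock", "codex", "google-adk"}))
```

Under this rule a missing secret surfaces as a red required check rather than a silent green one; arms that are legitimately conditional (for example vertex WIF) would need to be left out of `required` explicitly.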

Owners: P0.1/P0.3/P1.8 → in-flight PRs (this issue). P0.2/P0.4-7/P1.10 → scoped follow-up PRs, each adding a GATING test. Tracking umbrella for the no-regression e2e expansion.

Reference: molecule-ai/molecule-core#2332