test(provision): SSOT-parametrized + real-boot regression for moonshot/kimi NOT_CONFIGURED #2197
Why
The moonshot/kimi incident: a canvas-created claude-code workspace with provider=Platform + model=moonshot/kimi-k2.6 booted NOT_CONFIGURED in production. ensureDefaultConfig generated a config.yaml that lacked the manifest-derived provider: key, so the cp#329 config-bundle the adapter actually reads left molecule-runtime to slash-split moonshot/... → unregistered provider. Fixed by #2187 (Fix A: ensureDefaultConfig stamps DeriveProvider → provider: platform) + #2188 (Fix C: canvas). Unit tests passed; the real boot path was the gap. This PR adds comprehensive regression coverage so the class cannot reship; it does not change production code.
1. Current coverage map (what existed, what was missing)
… moonshot/kimi-k2.6? · … main?
TestEnsureDefaultConfig_StampsDerivedProvider (#2187) · … (CI / Platform (Go), blocking)
model_registry_validation_test.go (DeriveProvider)
tests/e2e/test_staging_full_saas.sh via e2e-staging-saas.yml · … (wait_workspaces_online_routable + completion) · pick_model_slug only picks BYOK ids (MiniMax-M2 / claude-sonnet-4-6 / sonnet); never the platform arm · … (continue-on-error: true, mc#1982 mask)
e2e-staging-canvas.yml · … (continue-on-error: true)
internal/servinge2e (controlplane repo) · ci/serving-e2e in CP, not core
internal/staginge2e (controlplane repo)
The gap, precisely: No suite provisioned a real workspace and asserted ONLINE for the platform-managed path. The single deterministic pin covered exactly one hardcoded combo and was not SSOT-driven, so a newly-offered platform model that failed to derive provider: platform would ship uncaught: the same offered-but-not-stamped divergence the original bug rode in on.
2. The regression test added
Deterministic (no live infra — runs in the normal unit suite, mutation-verified)
workspace-server/internal/handlers/workspace_provision_platform_boot_test.go
TestEnsureDefaultConfig_StampsProviderForEverySSOTPlatformModel: enumerates the claude-code platform arm directly from the providers SSOT (providers.LoadManifest, the same manifest the config generator derives against) and asserts ensureDefaultConfig stamps provider: platform at both top-level and runtime_config for every offered platform model (today 7: anthropic/claude-opus-4-7, anthropic/claude-sonnet-4-6, moonshot/kimi-k2.6, moonshot/kimi-k2.5, minimax/MiniMax-M2.7, minimax/MiniMax-M2.7-highspeed, minimax/MiniMax-M3). Add a platform model → it gets a case for free and only passes if actually stamped. A headline sentinel asserts moonshot/kimi-k2.6 stays in the set.
TestPlatformModelDeriveProvider_SSOTConsistency: the upstream half. DeriveProvider maps every SSOT platform model to provider Name == "platform", so a derive-layer regression fails closer to root cause.
… workspace_provision.go makes the suite FAIL (proven locally, then reverted), so this is not a vacuous green.
Real-boot staging variant (I will run it against staging)
Extends the existing staging harness rather than adding a new one:
tests/e2e/lib/model_slug.sh: new E2E_LLM_PATH=platform path selects the platform model (default moonshot/kimi-k2.6), with precedence over BYOK key branches, still overridable by E2E_MODEL_SLUG.
tests/e2e/test_staging_full_saas.sh: platform branch sends empty secrets (platform-managed needs no tenant key); the workspace must boot purely on the CP-proxy env + Fix A's stamped provider. Reuses the harness's existing wait_workspaces_online_routable (status=online, NOT not_configured) + completion assertions, so it keys off the real artifact.
tests/e2e/test_model_slug.sh: 4 new pinned cases (16/16 green locally).
.gitea/workflows/e2e-staging-saas.yml: new E2E Staging Platform Boot job: E2E_RUNTIME=claude-code E2E_LLM_PATH=platform E2E_MODE=smoke, no LLM key, own teardown safety-net; added providers.yaml + model_slug.sh to the path triggers.
Run it (operator host / CI with staging creds):
3. Gate-making plan (make the comprehensive suites merge-blocking)
Already blocking (no action): the deterministic suite rides CI / Platform (Go) (continue-on-error: false), so the two new tests block on merge immediately.
To make the real-boot staging gate blocking, de-flake FIRST, then flip:
e2e-staging-saas.yml · E2E Staging Platform Boot (the new job) · continue-on-error: true, bp-required: pending #2187 · … main: remove continue-on-error: true, add CI/... E2E Staging Platform Boot (push) (and (pull_request)) to branch protection, flip directive to bp-required: yes.
e2e-staging-saas.yml · E2E Staging SaaS (existing BYOK) · continue-on-error: true (mc#1982 mask)
De-flake prerequisites (do NOT gate on flake): this path shares the known cp#245 boot-timeout flake surface (stale-ECR-digest 30-min boot-death misread as flake: project_runtime_image_pin_stale_digest_root_cause) and the E2E Staging SaaS flake noted in reference_flaky_e2e_staging_saas. Confirm a fresh runtime image pin + 3 clean runs before flipping. The continue-on-error: true masks here are tracked under mc#1982 ("root-fix and remove, do not renew silently"); this PR adds one new mask with a tracked flip plan rather than leaving the gate silent.
No new flake introduced: the deterministic suite is pure/offline/deterministic and can gate today.
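The de-flake prerequisite above (a fresh runtime image pin plus 3 clean runs before flipping) could be checked mechanically. The sketch below is one possible reading, assuming the 3 runs must be the most recent consecutive ones; runResult and readyToFlip are hypothetical names, and nothing here reflects an existing tool in the repo.

```go
package main

import "fmt"

// runResult is a hypothetical record of one staging job run.
type runResult struct {
	ImagePinFresh bool // run used a freshly pinned runtime image
	Passed        bool // workspace reached online and completion succeeded
}

// readyToFlip reports whether the most recent `need` runs all passed
// on a fresh image pin. History is ordered oldest to newest.
func readyToFlip(history []runResult, need int) bool {
	if need <= 0 || len(history) < need {
		return false
	}
	for _, r := range history[len(history)-need:] {
		if !r.Passed || !r.ImagePinFresh {
			return false
		}
	}
	return true
}

func main() {
	history := []runResult{
		{ImagePinFresh: false, Passed: false}, // older run on a stale pin
		{ImagePinFresh: true, Passed: true},
		{ImagePinFresh: true, Passed: true},
		{ImagePinFresh: true, Passed: true},
	}
	fmt.Println("ready to flip:", readyToFlip(history, 3))
}
```

Counting only the trailing window means a single red run resets the clock, which matches the intent of not gating on a flaky job.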
Self-test (no live infra)
go build ./... ✓ · go vet ./internal/handlers ./internal/providers ✓
gofmt -l clean · shellcheck clean (all 3 bash files) · workflow YAML parses
pick_model_slug bash unit tests: 16/16 PASS
lint_required_context_exists_in_bp.find_directive_for_job returns ('required-pending','2187') for the new job ✓
Needs your staging run: the E2E Staging Platform Boot job (real EC2 + online + completion 200).
🤖 Generated with Claude Code
Owner force-merged (honest bypass). RFC#340 coverage: SSOT-driven provider-matrix boot regression — deterministic test (all 7 claude-code platform models stamp provider:platform, mutation-verified, gates via CI/Platform Go) + real-boot staging job. All 3 REQUIRED contexts green. The E2E Staging SaaS (full lifecycle) red is NON-required + pre-existing (a staging-only A2A empty-content issue on reasoning models, identical on main scheduled run, NOT #2197-caused; prod LLM serving verified healthy — kimi+minimax return real content). Token revoked.