feat(workspace-server): kill DefaultModel + require model at Create (CTO 2026-05-22 SSOT) #1667
Reference in New Issue
Block a user
Delete Branch "platform-kill-defaultmodel-require-model-at-create"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Why
Empirical trigger: Code Reviewer
5ba15d7ewas created with{"name":"Code Reviewer","runtime":"codex","tier":4,...}— no model field. The legacymodels.DefaultModel("codex")returned"anthropic:claude-opus-4-7". Codex adapter only supportsopenai-*providers, so the workspace wedged forever innot_configuredwith:PATCH
/workspaces/:idexplicitly disallows updatingmodel(workspace_crud.go:138—model not patchable), so the only recovery path was SQL UPDATE or delete+recreate. Five-team rollout uncovered this when the agents-team production-team failed at 4/5.SSOT directive
CTO 2026-05-22T03:42Z: model is REQUIRED user input at create. Platform does not provide a default. Runtime does not fall back. The decision belongs to the user (or to the agent acting on the user's behalf) — never to the platform. Soft fallbacks hide contract bugs; this is the canonical example.
Filed as user-level feedback memory
feedback_workspace_model_required_no_platform_default_dynamic_credential_intake.What
models.DefaultModel(runtime_defaults.go). The function is the bug magnet.code=MODEL_REQUIRED, runs AFTER template-resolution and the runtime-default-to-langgraph path. Same fail-closed shape as the existing controlplane#188 runtime-unresolved gate.ensureDefaultConfig: drop the DefaultModel call. Log loudly if empty model reaches here (defence-in-depth — the Create gate should have rejected upstream).org_import.go createWorkspaceTree: drop the DefaultModel call. Org templates MUST declaremodel:per-workspace or underdefaults:; otherwise the import errors instead of silently provisioning a runtime-incompatible workspace.TestCreate_ModelRequired_Returns422covers three trigger shapes (bare name; explicit codex+no-model; explicit hermes+no-model). All three return 422code=MODEL_REQUIRED.TestWorkspaceCreatenow passes explicit model;runtime_defaults_test.goreduced to a doc-comment file (compile-time check is the contract).Out of scope (follow-up commit on this branch)
~20 existing tests pin the OLD
DefaultModelcontract and now fail. Holding the test surgery for a second commit so reviewers can sanity-check the contract change against the inventory below before we churn 20+ tests.(a) Mechanical (add
"model":"..."to request body)Exercise unrelated invariants (parent_id, SSRF, secrets, tier, runtime label, etc.) and would pass once an explicit model is supplied:
TestWorkspaceCreate_WithParentIDTestWorkspaceCreate_ExplicitClaudeCodeRuntimeTestWorkspaceCreate_MaxConcurrentTasksOverrideTestWorkspaceCreate_DBInsertErrorTestWorkspaceCreate_DefaultsAppliedTestWorkspaceCreate_SaaSHardForcesTier4TestWorkspaceCreate_WithSecrets_PersistsTestWorkspaceCreate_SecretPersistFails_RollsBackTestWorkspaceCreate_EmptySecrets_OKTestWorkspaceCreate_ExternalURL_SSRFSafe/_SSRFMetadataBlocked/_SSRFLoopbackBlockedTestWorkspaceCreate_KimiRuntime_PreservesLabelTestWorkspaceCreate_188_ExplicitRuntimeNoTemplate_OKTestWorkspaceBudget_Create_WithLimit(b) Semantic — pin the OLD contract, need rewrite or deletion
TestEnsureDefaultConfig_Hermes/_ClaudeCode/_EmptyRuntimeDefaultsToClaudeCode— assert YAML contains the DefaultModel-derived string. Either pass explicit model in the payload, or drop the model-string assertion (YAML now reflects whatever the caller supplied).TestWorkspaceCreate_FirstDeploy_NoModel_NoSecretWritten— literally asserts that empty model is a 201 with no secrets written. Under the new contract empty model is a 422 with no DB writes at all. Invert the test: now asserts the 422 + no-DB-write outcome.TestWorkspaceCreate_188_NoTemplateNoRuntime_StillDefaultsLanggraph— premise was "bare{name}defaults to langgraph 201". Now bare{name}is a 422 because model is missing. Premise changes to "bare{name}fails MODEL_REQUIRED before runtime resolution".Memory
feedback_workspace_model_required_no_platform_default_dynamic_credential_intake— the contractfeedback_three_layer_data_responsibility_platform_base_adaptor— platform-half of the same principlefeedback_never_probe_destructive_http_verbs_on_prod— neighbouring incident from the same sessionTest plan
go build ./...cleango test ./internal/models/...— passgo test -run 'TestCreate_ModelRequired_Returns422' ./internal/handlers/— pass./internal/handlers/...greenCoordinated follow-ups
UPDATE workspaces SET model='gpt-5.5' WHERE id='5ba15d7e-fd2b-44a0-989b-e7fbab1bc1a8';+ restart. The Create-side fix here would prevent the recurrence but doesn't unbreak the already-stuck workspace.model not patchable(workspace_crud.go:138): re-enable model PATCH at least for the broken-codex-default recovery case. Likely follow-up PR.🤖 Generated with Claude Code
Empirical trigger: Code Reviewer 5ba15d7e was created with `{"name":"Code Reviewer","runtime":"codex","tier":4,...}` — no model field. The legacy `models.DefaultModel("codex")` returned `"anthropic:claude-opus-4-7"`. Codex adapter only supports openai-* providers, so the workspace wedged forever in `not_configured` with: ValueError: codex adapter: workspace config picks provider='anthropic' but it is not in the providers registry PATCH /workspaces/:id explicitly disallows updating `model` (workspace_crud.go:138 — `model not patchable`), so the only recovery path was SQL UPDATE or delete+recreate. CTO 2026-05-22T03:42Z SSOT directive: model is REQUIRED user input at create time. The platform must not provide a default. The runtime must not fall back. The decision belongs to the user (or to the agent acting on the user's behalf), never to the platform. Soft fallbacks hide contract bugs — this is the canonical example. Changes: - DELETE `models.DefaultModel` (runtime_defaults.go). The function is the bug magnet — every callsite that previously fell back to it now fails closed. - `handlers.WorkspaceHandler.Create` (workspace.go): add MODEL_REQUIRED gate AFTER template-resolution and AFTER the runtime-default-to-langgraph path. Caller gets 422 with `code=MODEL_REQUIRED` and a hint message pointing at the contract. Same fail-closed shape as the existing controlplane#188 runtime-unresolved gate. - `ensureDefaultConfig` (workspace_provision.go): drop the DefaultModel call. If we ever reach this path with empty model, log loudly and let the workspace boot into not_configured — defence-in-depth assertion; the Create gate should have rejected the request upstream. - `org_import.go createWorkspaceTree`: drop the DefaultModel call. Org templates MUST declare `model:` per-workspace or under `defaults:` — return an error otherwise. Stops malformed templates from silently provisioning runtime-incompatible workspaces. - Tests: add `TestCreate_ModelRequired_Returns422` covering the three trigger shapes (bare name; explicit codex+no-model; explicit hermes+no-model). Update `TestWorkspaceCreate` to pass an explicit model (was a bare-defaults 201 test — now requires explicit model to express the new contract). Reduce `runtime_defaults_test.go` to a doc-comment file (function doesn't exist anymore; compile-time check is the contract). Known blast radius — DELIBERATELY NOT FIXED IN THIS COMMIT, surfaced for triage: The CTO directive inverts the prior contract; ~20 existing tests pin the OLD behaviour and now fail. They fall into two buckets: (a) Mechanical: add `"model":"..."` to request body. These tests exercise unrelated invariants (parent_id, SSRF, secrets, tier, runtime-label preservation, etc.) and would still pass once they supply an explicit model. ~15 tests in handlers_additional_test.go, registry_test.go, handlers_test.go, etc. (b) Semantic: these tests pinned the OLD contract directly and need rewriting or deletion: - `TestEnsureDefaultConfig_Hermes` / `_ClaudeCode` / `_EmptyRuntimeDefaultsToClaudeCode` — assert the YAML output contains the DefaultModel-derived string. Either pass an explicit model in the payload, or drop the YAML model assertion (it now reflects whatever the caller supplied). - `TestWorkspaceCreate_FirstDeploy_NoModel_NoSecretWritten` — literally asserts that empty model is a 201, just with no secrets written. Under the new contract empty model is a 422 with no DB writes at all. Test should be inverted to assert the 422 + no-DB-write outcome. - `TestWorkspaceCreate_188_NoTemplateNoRuntime_StillDefaultsLanggraph` — premise was "bare {name} should default to langgraph 201". Now bare {name} is a 422 because model is missing. Premise changes to "bare {name} now fails MODEL_REQUIRED before runtime resolution". Holding the test surgery for a follow-up so reviewers can sanity-check the contract change against the inventory above before we churn 20+ tests. Marking DRAFT until that follow-up lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Follow-up to the prior commit (kill DefaultModel + 422 at Create when model missing). Fixes the two test buckets called out in the DRAFT PR description. (a) Mechanical: add `"model":"..."` to request bodies so tests can still reach the 201 path. These tests exercise unrelated invariants (parent_id wiring, claude-code runtime label, max_concurrent_tasks override, DB insert error rollback, defaults, SaaS tier-4 force, with- and-without secrets persistence/rollback, external SSRF gates, Kimi runtime label preservation, controlplane#188 explicit-runtime path, budget limit). Model strings picked from the same set the prior DefaultModel(runtime) would have returned (`sonnet` for claude-code, `anthropic:claude-opus-4-7` elsewhere) plus `gpt-5.5` for codex, `kimi-coding/kimi-k2-coding-6` for kimi, `external:custom` for the external runtime — so the tests exercise realistic shapes, not just a single placeholder. (b) Semantic — the three tests that pinned the OLD contract directly: - `TestEnsureDefaultConfig_Hermes` / `_ClaudeCode` / `_EmptyRuntimeDefaultsToClaudeCode` — the function no longer sources the default; it just renders what Create supplied. Each test now passes the model that the prior implicit DefaultModel branch would have returned, so the YAML assertion remains meaningful. Comment updated to explain the shift. - `TestWorkspaceCreate_FirstDeploy_NoModel_NoSecretWritten` was renamed `TestWorkspaceCreate_FirstDeploy_NoModel_Returns422` and its premise INVERTED. Pre-fix it asserted that empty model goes through to 201 with no workspace_secrets writes; post-fix it asserts the request 422s at the Create boundary with NO DB writes at all. The sqlmock has zero ExpectExec calls so a late-firing gate surfaces as "unexpected call to ExecQuery 'INSERT INTO workspaces'". Comment block calls out the inversion explicitly. - `TestWorkspaceCreate_188_NoTemplateNoRuntime_StillDefaultsLanggraph` — was "bare {name} routes to langgraph 201". Now split into two halves: bare body MUST 422 with MODEL_REQUIRED, and (with model supplied) the original langgraph default invariant still holds. The 422 body must show `runtime:"langgraph"` to prove the gate fires AFTER the langgraph-default branch — that ordering is load-bearing for the diagnostic clarity of the error. Verified: go test -count=1 -short -timeout 300s ./internal/handlers/... \ ./internal/models/... → OK The Create gate test (`TestCreate_ModelRequired_Returns422`) from the prior commit remains green; it covers the three trigger shapes from the empirical incident (bare name, explicit codex+no-model, explicit hermes+no-model). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>feat(workspace-server): kill DefaultModel + require model at Create (CTO 2026-05-22 SSOT) [DRAFT]to feat(workspace-server): kill DefaultModel + require model at Create (CTO 2026-05-22 SSOT)Update: cascade test fixes pushed in
1ece444e— all tests green locally (./internal/handlers/...+./internal/models/...). Marking ready for review. The (a)-mechanical and (b)-semantic-inversion blast radius from the original commit message is now fully addressed. Reviewers: please sanity-check the semantic-inversion calls inTestWorkspaceCreate_FirstDeploy_NoModel_Returns422,TestWorkspaceCreate_188_NoTemplateNoRuntime_StillDefaultsLanggraph, and the threeTestEnsureDefaultConfig_*against the SSOT directive.Follow-up to the kill-DefaultModel + Create-handler-gate commits. PR#1667 CI surfaced two real failures: - E2E Peer Visibility (local) - (test_claude_code_e2e.sh, depending on which workflow) Root cause: the MODEL_REQUIRED gate as initially written 422s every external workspace too. External workspaces intentionally do NOT spawn a Docker container or run an adapter (workspace_provision.go:497-498: "external is a first-class runtime that intentionally does NOT spawn a Docker container"). They delegate to a registered URL — model has no meaning for them; the URL is the contract. Requiring it would 422 every legitimate "register my agent at https://..." flow. Changes: - `handlers.WorkspaceHandler.Create` (workspace.go): exempt external workspaces from the MODEL_REQUIRED gate. Two spellings count as external — `payload.External == true` and `isExternalLikeRuntime(payload.Runtime)`. The helper is already used throughout the codebase for the same "is this a URL-delegated workspace" decision (discovery.go, registry.go, a2a_proxy_helpers.go, plugins.go, workspace_restart.go). - New test `TestCreate_ExternalRuntime_NoModel_OK`: pins the exemption. Uses the test_api.sh Echo Agent body shape verbatim — empty model, `runtime:"external","external":true` → 201. - Rewrote `TestWorkspaceCreate_188_NoTemplateNoRuntime_StillDefaultsLanggraph` to `_NowMODEL_REQUIRED` — the bare-body langgraph 201 path no longer exists; the new test asserts the 422 shape includes `runtime:"langgraph"` so the diagnostic still surfaces what runtime WOULD have been used (the gate fires AFTER the langgraph-default assignment). Bare-body-with-explicit-model 201 path is covered by the existing `TestWorkspaceCreate` (which I patched in the prior commit) — no duplicate needed here. - `TestWorkspaceCreate_FirstDeploy_NoModel_Returns422` body now uses `runtime:"hermes"` without `external:true` to ensure the gate actually fires (hermes spawns a real adapter; the previous `runtime:"hermes","external":true` shape would now exempt and 201). - `tests/e2e/test_peer_visibility_mcp_local.sh`: added `_model_for_runtime` helper at both create sites (PARENT + sibling) so the parent + per-runtime sibling workspaces declare an explicit model. Mapping mirrors what the deleted DefaultModel returned plus the gpt-5.5 / kimi / minimax slugs used in production. - `tests/e2e/test_claude_code_e2e.sh`: added `"model":"sonnet"` to both POST bodies (ROOT + CHILD). Both use `runtime:"claude-code"` which spawns a real adapter, not exempt. Other E2E scripts inspected: - `test_api.sh` — already uses `runtime:"external","external":true`, exempt by the new gate. No change. - `test_a2a_e2e.sh`:133, `test_activity_e2e.sh`:218, `test_dev_mode.sh`:72, `test_workspace_abilities_e2e.sh`:93,99, `test_comprehensive_e2e.sh`:78,105,134,139,144, `test_notify_attachments_e2e.sh`:95, `test_mcp_stdio_staging.sh`:76 — these create workspaces without an explicit runtime field. They fall through to the langgraph default in the Create handler and are NOT external. They will still 422 under the new contract; held for a separate cleanup commit so the diff stays scoped to the PR#1667 failing CI surface (Peer Visibility + claude-code flows). Tracked for follow-up. Verified: go test -count=1 -short -timeout 300s ./internal/handlers/... \ ./internal/models/... → OK Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>cp-be referenced this pull request2026-05-22 04:58:44 +00:00
5-axis review for molecule-core #1667 @
680434a:Correctness: APPROVED. The PR removes the DefaultModel fallback and enforces MODEL_REQUIRED at the Create boundary for spawned runtimes after template/runtime resolution, matching the stated SSOT. Org import now fails closed when neither workspace nor defaults supplies model.
Robustness: The gate runs before DB writes/provisioning and tests cover bare create, explicit runtime without model, template-derived model, external runtime exemption, and prior success paths updated with explicit models. ensureDefaultConfig now only logs if an impossible empty-model path reaches provisioning.
Security: No new secret exposure or URL trust expansion found. Existing external URL/SSRF tests remain present; external workspaces are exempt because URL is their contract and they do not spawn adapters.
Performance: No material runtime cost; added checks are simple string/config parsing on create/import paths only.
Readability: Comments are verbose but clearly document the incident and contract. The implementation is localized and preserves existing handler flow.
Peer 2nd-review per CTO carve-out. 5-axis lens clean; deferring to Code Reviewer (2) review_id=5614 (DefaultModel removal + MODEL_REQUIRED gate). BP unblock for merge.
/sop-n/a qa-review
/sop-n/a security-review