P4 PR-2 internal#718: flip only-registered (runtime, model) gate from WARN to HARD-REJECT 422 (BEHAVIOR-AFFECTING) #1981
Reference in New Issue
Block a user
Delete Branch "feat/internal-718-p4-pr2-hard-reject-unregistered"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
What
WorkspaceHandler.Createnow returns 422UNREGISTERED_MODEL_FOR_RUNTIMEwhen the provider registry knows the runtime but the (runtime, model) pair is not in its native model set. Was the P2-B WARN-mode signal (X-Molecule-Model-Unregisteredheader + log line; create proceeds); now a hard rejection at the boundary with no DB rows touched.Behavior delta — exhaustively tested
X-Molecule-Model-Unregistered: trueheaderUNREGISTERED_MODEL_FOR_RUNTIME+ body{code, error, runtime, model}; no DB writesanthropic:claude-opus-4-7onclaude-codeWhy P4 vs P2-B
P2-B kept WARN-mode because the legacy colon-namespaced BYOK vocabulary (
anthropic:claude-opus-4-7etc.) was live across the create/import/template corpus and not in the registry. P4 PR-1 reconciled that vocab into the per-runtime native sets (each runtime now lists bare + slash + colon forms for the BYOK ids in the live corpus). With the reconcile landed, an unregistered pair is a real misconfiguration — the codexanthropic:claude-opus-4-7wedge class (whichMODEL_REQUIREDtargets at a different layer) now fails AT THE BOUNDARY instead of provisioning a workspace that will wedge at adapter init.Tests (workspace_test.go)
TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422— asserts 422 + body code + body runtime + body model + no legacy header + zero DB ops (mock.ExpectationsWereMet()).TestWorkspaceCreate_718_P4_RegisteredModelProceeds(was_RegisteredModelNoWarnHeader) — 201 + no legacy WARN header.TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted—anthropic:claude-opus-4-7onclaude-codeproceeds with 201 (the central regression guard for the PR-1 reconcile + PR-2 flip combo).TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen— federation path still proceeds.Test-fixture updates (other gates blocked by the flip)
Four pre-existing tests used an unregistered model as a fixture for OTHER gate paths (compute validation 400; template-default-model resolution). Updated to a registry-valid pair so the gate they actually exercise can fire:
TestWorkspaceCreate_WithInvalidCompute_ReturnsBadRequest—gpt-4(no runtime owns it) →claude-opus-4-7(so the compute-validation 400 path tests what it should)TestWorkspaceCreate_TemplateDefaultsMissingRuntimeAndModel—hermes/nousresearch/hermes-4-70b→hermes/moonshot/kimi-k2.6(CTO-narrowed hermes native set)TestWorkspaceCreate_TemplateDefaultsLegacyTopLevelModel—hermes/anthropic:claude-sonnet-4-5→hermes/moonshot/kimi-k2.5TestWorkspaceCreate_CallerModelOverridesTemplateDefault—hermes/minimax/MiniMax-M2.7override →hermes/moonshot/kimi-k2.5override (still tests the caller-wins-over-template-default mechanic)CI
go test ./internal/handlers/...green.go test -short ./...green across the full workspace-server.go test -tags=integration -short ./internal/handlers/... ./internal/providers/...green.Phase 4 — Five-Axis review pending
This PR is BEHAVIOR-AFFECTING (422 is a new error path). Per the SOP, awaiting independent Five-Axis review + CTO merge-go.
Not regressed
ResolveUpstream(registry namespace-token resolution, unaffected).Stack
Stacked on
feat/internal-718-p4-pr1-reconcile-colon-vocab-sync(#1980). Re-target tomainafter #1980 merges. Hard precondition: this PR is unmergeable tomainwithout PR-1 first (the new colon-vocab regression test would 422 otherwise).Refs internal#718.
🤖 Generated with Claude Code
WorkspaceHandler.Create now returns 422 UNREGISTERED_MODEL_FOR_RUNTIME when the provider registry knows the runtime but the (runtime, model) pair is not in its native model set. Was the P2-B WARN-mode signal (X-Molecule-Model-Unregistered header + log; create proceeds); now a hard rejection at the boundary with no DB rows touched. Behavior delta (under test): * Workspace with a REGISTERED (runtime, model) → 201, unchanged. * Workspace with an UNREGISTERED (runtime, model) → 422 with body {code:UNREGISTERED_MODEL_FOR_RUNTIME, error, runtime, model}, no DB writes (mock ExpectationsWereMet asserts zero unexpected DB calls). * Workspace with the legacy colon-form anthropic:claude-opus-4-7 for runtime=claude-code → 201 (P4 PR-1 reconciled the colon-vocab into the registry, making this a first-class registered model alongside the slash form). * Workspace with runtime NOT in the registry (langgraph/external/kimi/mock/federated) → unchanged (fails OPEN — federation-ready, the registry can not speak to non-first-party runtimes). * External workspaces (external=true or external-like runtime) → unchanged (URL is the contract, not the model). Why P4 vs P2-B: P2-B kept WARN-mode because the legacy colon-namespaced BYOK vocabulary (anthropic:claude-opus-4-7 etc.) was live across the create/import/template corpus and not yet in the registry. P4 PR-1 reconciled that vocab into the per-runtime native sets (each runtime now lists bare + slash + colon forms for the BYOK ids in the live corpus). With the reconcile landed, an unregistered pair is a real misconfiguration and the gate flips loud — the codex anthropic:claude-opus-4-7 wedge class (the MODEL_REQUIRED gate targets the same failure mode) now fails AT THE BOUNDARY instead of provisioning a workspace that will wedge at adapter init. Test surface (workspace_test.go): * TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422 (NEW) — explicit 422 + body code + no DB writes * TestWorkspaceCreate_718_P4_RegisteredModelProceeds (renamed from _RegisteredModelNoWarnHeader) — 201 + no legacy WARN header * TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted (NEW) — anthropic:claude-opus-4-7 on claude-code proceeds (the central regression guard for the PR-1 reconcile + PR-2 flip combo) * TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen — unchanged (federation path) Fixture updates for the flip (tests that previously used an unregistered model as a fixture for OTHER gate paths; updated to a valid model so those gates can actually fire): * TestWorkspaceCreate_WithInvalidCompute_ReturnsBadRequest — gpt-4 (no runtime owns it) → claude-opus-4-7 (so the compute-validation 400 path tests what it should) * TestWorkspaceCreate_TemplateDefaultsMissingRuntimeAndModel — hermes/nousresearch/hermes-4-70b → hermes/moonshot/kimi-k2.6 (hermes native set per the CTO matrix) * TestWorkspaceCreate_TemplateDefaultsLegacyTopLevelModel — hermes/anthropic:claude-sonnet-4-5 → hermes/moonshot/kimi-k2.5 * TestWorkspaceCreate_CallerModelOverridesTemplateDefault — hermes override minimax/MiniMax-M2.7 → moonshot/kimi-k2.5 (still tests the caller-overrides-template-default mechanic, just with a hermes-valid pair) Phase-1 falsification + Phase-2 design were established in PR-1. Phase-3 TDD: each new behavior assertion mapped to a discriminating test (422 vs 201 vs unchanged WARN-header absence). Phase-4 Five-Axis to follow in PR review. NOT regressed (verified via -short + -tags=integration -short for handlers + providers): * cp#362 anthropic passthrough (proxy layer; unaffected). * P1 proxy ResolveUpstream (registry resolution by namespace token; unaffected). * P2-B billing-derive (DeriveProvider semantics unchanged by the reconcile). * P3 templates-from-registry (GET /templates still serves ModelsForRuntime; PR-1 enlarges the set, this PR rejects calls outside it). Stacked on feat/internal-718-p4-pr1-reconcile-colon-vocab-sync (PR-1 must merge first; this PR's tests would 422 the legacy colon vocab otherwise). Refs internal#718. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Independent Five-Axis Review — internal#718 P4 PR-2 (WARN → 422 flip)
Reviewer: agent-reviewer (author = hongming ≠ reviewer; gate satisfied).
Scope: behavior-affecting flip at the
WorkspaceHandler.Createboundary; deliberate hostile-review pass per dev-SOP Phase 4.Per-axis findings
Correctness:
workspace-server/internal/handlers/workspace.go:466-477(c.JSON(422, …); return) short-circuits BEFOREctx := c.Request.Context()and every DB/secret/canvas/event write._UnregisteredModelHardReject422proves it by deleting everymock.Expect*and assertingmock.ExpectationsWereMet()— any DB I/O on the reject path turns the test red. External / non-registry runtime / empty-model cases are correctly delegated (external bypass viaisExternal; fail-open viaModelsForRuntimeerror inmodel_registry_validation.go:43-46; empty-model owned upstream byMODEL_REQUIREDatworkspace.go:420-429). No double-reject, no new race.Readability:
workspace.go:431-465is long but earns it — it traces the P2-B → P4 PR-1 → P4 PR-2 history a future reader needs. Test names switch to a_P4_prefix and now read as the behaviors they assert (_HardReject422,_RegisteredModelProceeds,_LegacyColonVocabAccepted). One minor: "P2-A artifact" in the comment refers to the codegen scaffolding while the colon-vocab itself was added in P4 PR-1 — non-blocking, FYI.Architecture:
feat/internal-718-p4-pr1-reconcile-colon-vocab-sync, notmain— mechanically enforces the documented merge ordering (re-targeting to main pre-PR-1 would 422 the new colon-vocab regression test). PR body's "Stack" + "Hard precondition" paragraph is explicit.Security:
runtimeandmodel, both already passed throughvalidateWorkspaceFields(length + injection-char gate atworkspace.go:251-254) and JSON-encoded by gin — no new injection surface. No new secret writes; no auth-touching changes.Performance:
Behavior-delta confirmation (3 cases + 1 federation)
claude-opus-4-7TestWorkspaceCreate_718_P4_RegisteredModelProceedsw.Code == 201ANDw.Header().Get("X-Molecule-Model-Unregistered") == ""totally-made-up-xyz{code, error, runtime, model}+ zero DB opsTestWorkspaceCreate_718_P4_UnregisteredModelHardReject422w.Code == 422AND body contains"code":"UNREGISTERED_MODEL_FOR_RUNTIME"AND"runtime":"claude-code"AND"model":"totally-made-up-xyz"AND ABSENCE of legacy header ANDmock.ExpectationsWereMet()with NOExpect*callsanthropic:claude-opus-4-7onclaude-code(live corpus shape)TestWorkspaceCreate_718_P4_LegacyColonVocabAcceptedworkspace-server/internal/providers/providers.yaml:637(anthropic-api.modelsincludesanthropic:claude-opus-4-7); deleting that registry entry would turn the test red. The companionverify-providers-genCI gate (green) is the independent SSOT enforcement.TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen(unchanged)model_registry_validation.go:43-46.Mutation-proof reasoning
c.JSON(422,…); returnto the WARN-header branch →_UnregisteredModelHardReject422fails on THREE independent axes (status code, body code field, header presence) AND failsExpectationsWereMet()because the DB INSERT chain it no longer expects would fire. Not a single-assertion test.providers.yamlclaude-code'santhropic-api.models→_LegacyColonVocabAccepted422s instead of 201. Theverify-providers-genCI gate would catch the mutation independently — defense in depth."code":"UNREGISTERED_MODEL_FOR_RUNTIME"), not "non-empty" or substring-of-anywhere — perfeedback_assert_exact_not_substring. Discriminating.Scope check
ResolveUpstream: UNTOUCHED.DeriveProvider): UNTOUCHED.GET /templates,ModelsForRuntime): UNTOUCHED at this PR (the underlying set was enlarged by PR-1; this PR adds no new readers).workspace_compute_test.go, three template-defaults tests inworkspace_test.go) are required collateral — the original models (gpt-4,nousresearch/hermes-4-70b,anthropic:claude-sonnet-4-5,minimax/MiniMax-M2.7) would now 422 and block tests of OTHER gates. Each swap documented in the PR body with rationale and continues to test what its name claims.CI evidence
Latest-per-context (HEAD
eacb8183): 12/13 success, 1 non-blockingsop-checklist / na-declarationspending.verify-providers-genGREEN (SSOT drift gate).gate-check-v3,security-review,qa-review, all 3 path/token/secret lint gates,lint-required-no-paths,sop-tier-check,sop-checklist / all-items-acked / review-refireall green. Mergeable: true.Verdict
APPROVED. The flip is correct, minimal, and tested with discriminating assertions across the three behavior cases the PR claims to cover. PR-1 dependency is mechanically enforced via the stack base (not just documented). No fleet-break risk identified.
Do NOT merge — this is BEHAVIOR-AFFECTING; merge-go is the CTO's call per the dev SOP. Awaiting CTO greenlight + (per the PR's stack discipline) PR-1 (#1980) landing first.
approve on current head after retarget — identical hard-reject delta
approve on current head after retarget