P4 PR-2 internal#718: flip only-registered (runtime, model) gate from WARN to HARD-REJECT 422 (BEHAVIOR-AFFECTING) #1981

2026-05-28T03:23:05Z

hongming commented

2026-05-28 03:23:05 +00:00

What

WorkspaceHandler.Create now returns 422 UNREGISTERED_MODEL_FOR_RUNTIME when the provider registry knows the runtime but the (runtime, model) pair is not in its native model set. Was the P2-B WARN-mode signal (X-Molecule-Model-Unregistered header + log line; create proceeds); now a hard rejection at the boundary with no DB rows touched.

Behavior delta — exhaustively tested

Input	Pre-P4 (WARN)	P4 PR-2 (HARD)
Workspace with REGISTERED (runtime, model)	201	201 (unchanged)
Workspace with UNREGISTERED (runtime, model)	201 + `X-Molecule-Model-Unregistered: true` header	422 `UNREGISTERED_MODEL_FOR_RUNTIME` + body `{code, error, runtime, model}`; no DB writes
Workspace with legacy colon-form `anthropic:claude-opus-4-7` on `claude-code`	201 + WARN header (was unregistered)	201, no warning (PR-1 reconciled the colon vocab into the registry)
Workspace with runtime NOT in registry (langgraph/external/kimi/mock/federated)	201, gate skipped	201, gate skipped (federation fails OPEN, unchanged)
External workspaces (external=true / external-like runtime)	201, gate skipped	201, gate skipped (URL is the contract, unchanged)

Why P4 vs P2-B

P2-B kept WARN-mode because the legacy colon-namespaced BYOK vocabulary (anthropic:claude-opus-4-7 etc.) was live across the create/import/template corpus and not in the registry. P4 PR-1 reconciled that vocab into the per-runtime native sets (each runtime now lists bare + slash + colon forms for the BYOK ids in the live corpus). With the reconcile landed, an unregistered pair is a real misconfiguration — the codex anthropic:claude-opus-4-7 wedge class (which MODEL_REQUIRED targets at a different layer) now fails AT THE BOUNDARY instead of provisioning a workspace that will wedge at adapter init.

Tests (workspace_test.go)

NEW TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422 — asserts 422 + body code + body runtime + body model + no legacy header + zero DB ops (mock.ExpectationsWereMet()).
RENAMED + TIGHTENED TestWorkspaceCreate_718_P4_RegisteredModelProceeds (was _RegisteredModelNoWarnHeader) — 201 + no legacy WARN header.
NEW TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted — anthropic:claude-opus-4-7 on claude-code proceeds with 201 (the central regression guard for the PR-1 reconcile + PR-2 flip combo).
UNCHANGED TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen — federation path still proceeds.

Test-fixture updates (other gates blocked by the flip)

Four pre-existing tests used an unregistered model as a fixture for OTHER gate paths (compute validation 400; template-default-model resolution). Updated to a registry-valid pair so the gate they actually exercise can fire:

TestWorkspaceCreate_WithInvalidCompute_ReturnsBadRequest — gpt-4 (no runtime owns it) → claude-opus-4-7 (so the compute-validation 400 path tests what it should)
TestWorkspaceCreate_TemplateDefaultsMissingRuntimeAndModel — hermes/nousresearch/hermes-4-70b → hermes/moonshot/kimi-k2.6 (CTO-narrowed hermes native set)
TestWorkspaceCreate_TemplateDefaultsLegacyTopLevelModel — hermes/anthropic:claude-sonnet-4-5 → hermes/moonshot/kimi-k2.5
TestWorkspaceCreate_CallerModelOverridesTemplateDefault — hermes/minimax/MiniMax-M2.7 override → hermes/moonshot/kimi-k2.5 override (still tests the caller-wins-over-template-default mechanic)

CI

go test ./internal/handlers/... green.
go test -short ./... green across the full workspace-server.
go test -tags=integration -short ./internal/handlers/... ./internal/providers/... green.

Phase 4 — Five-Axis review pending

This PR is BEHAVIOR-AFFECTING (422 is a new error path). Per the SOP, awaiting independent Five-Axis review + CTO merge-go.

Not regressed

cp#362 anthropic passthrough (proxy layer, unaffected).
P1 proxy ResolveUpstream (registry namespace-token resolution, unaffected).
P2-B billing-derive (DeriveProvider semantics unchanged).
P3 templates-from-registry (GET /templates still serves ModelsForRuntime; PR-1 enlarged the set, this PR rejects calls outside it).
Federation path (non-registry runtimes fail open, unchanged).

Stack

Stacked on feat/internal-718-p4-pr1-reconcile-colon-vocab-sync (#1980). Re-target to main after #1980 merges. Hard precondition: this PR is unmergeable to main without PR-1 first (the new colon-vocab regression test would 422 otherwise).

Refs internal#718.

🤖 Generated with Claude Code

## What `WorkspaceHandler.Create` now returns **422 `UNREGISTERED_MODEL_FOR_RUNTIME`** when the provider registry knows the runtime but the (runtime, model) pair is not in its native model set. Was the P2-B WARN-mode signal (`X-Molecule-Model-Unregistered` header + log line; create proceeds); now a hard rejection at the boundary with no DB rows touched. ## Behavior delta — exhaustively tested | Input | Pre-P4 (WARN) | P4 PR-2 (HARD) | |-------|--------------|----------------| | Workspace with REGISTERED (runtime, model) | 201 | 201 (unchanged) | | Workspace with UNREGISTERED (runtime, model) | 201 + `X-Molecule-Model-Unregistered: true` header | **422 `UNREGISTERED_MODEL_FOR_RUNTIME`** + body `{code, error, runtime, model}`; no DB writes | | Workspace with legacy colon-form `anthropic:claude-opus-4-7` on `claude-code` | 201 + WARN header (was unregistered) | 201, no warning (PR-1 reconciled the colon vocab into the registry) | | Workspace with runtime NOT in registry (langgraph/external/kimi/mock/federated) | 201, gate skipped | 201, gate skipped (federation fails OPEN, unchanged) | | External workspaces (external=true / external-like runtime) | 201, gate skipped | 201, gate skipped (URL is the contract, unchanged) | ## Why P4 vs P2-B P2-B kept WARN-mode because the legacy colon-namespaced BYOK vocabulary (`anthropic:claude-opus-4-7` etc.) was live across the create/import/template corpus and not in the registry. P4 PR-1 reconciled that vocab into the per-runtime native sets (each runtime now lists bare + slash + colon forms for the BYOK ids in the live corpus). With the reconcile landed, an unregistered pair is a real misconfiguration — the codex `anthropic:claude-opus-4-7` wedge class (which `MODEL_REQUIRED` targets at a different layer) now fails AT THE BOUNDARY instead of provisioning a workspace that will wedge at adapter init. ## Tests (workspace_test.go) - **NEW** `TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422` — asserts 422 + body code + body runtime + body model + no legacy header + zero DB ops (`mock.ExpectationsWereMet()`). - **RENAMED + TIGHTENED** `TestWorkspaceCreate_718_P4_RegisteredModelProceeds` (was `_RegisteredModelNoWarnHeader`) — 201 + no legacy WARN header. - **NEW** `TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted` — `anthropic:claude-opus-4-7` on `claude-code` proceeds with 201 (the central regression guard for the PR-1 reconcile + PR-2 flip combo). - **UNCHANGED** `TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen` — federation path still proceeds. ## Test-fixture updates (other gates blocked by the flip) Four pre-existing tests used an unregistered model as a fixture for OTHER gate paths (compute validation 400; template-default-model resolution). Updated to a registry-valid pair so the gate they actually exercise can fire: - `TestWorkspaceCreate_WithInvalidCompute_ReturnsBadRequest` — `gpt-4` (no runtime owns it) → `claude-opus-4-7` (so the compute-validation 400 path tests what it should) - `TestWorkspaceCreate_TemplateDefaultsMissingRuntimeAndModel` — `hermes/nousresearch/hermes-4-70b` → `hermes/moonshot/kimi-k2.6` (CTO-narrowed hermes native set) - `TestWorkspaceCreate_TemplateDefaultsLegacyTopLevelModel` — `hermes/anthropic:claude-sonnet-4-5` → `hermes/moonshot/kimi-k2.5` - `TestWorkspaceCreate_CallerModelOverridesTemplateDefault` — `hermes/minimax/MiniMax-M2.7` override → `hermes/moonshot/kimi-k2.5` override (still tests the caller-wins-over-template-default mechanic) ## CI - `go test ./internal/handlers/...` green. - `go test -short ./...` green across the full workspace-server. - `go test -tags=integration -short ./internal/handlers/... ./internal/providers/...` green. ## Phase 4 — Five-Axis review pending This PR is BEHAVIOR-AFFECTING (422 is a new error path). Per the SOP, awaiting independent Five-Axis review + CTO merge-go. ## Not regressed - cp#362 anthropic passthrough (proxy layer, unaffected). - P1 proxy `ResolveUpstream` (registry namespace-token resolution, unaffected). - P2-B billing-derive (DeriveProvider semantics unchanged). - P3 templates-from-registry (GET /templates still serves ModelsForRuntime; PR-1 enlarged the set, this PR rejects calls outside it). - Federation path (non-registry runtimes fail open, unchanged). ## Stack Stacked on `feat/internal-718-p4-pr1-reconcile-colon-vocab-sync` (#1980). Re-target to `main` after #1980 merges. Hard precondition: this PR is unmergeable to `main` without PR-1 first (the new colon-vocab regression test would 422 otherwise). Refs internal#718. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

hongming added 1 commit 2026-05-28 03:23:06 +00:00

P4 PR-2 internal#718: flip only-registered (runtime, model) gate from WARN to HARD-REJECT 422 (BEHAVIOR-AFFECTING)

ci-arm64-advisory / fast-checks (pull_request) Waiting to run

Details

Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 12s

Details

Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s

Details

CI / Detect changes (pull_request) Successful in 10s

Details

CI / Python Lint & Test (pull_request) Successful in 4s

Details

E2E API Smoke Test / detect-changes (pull_request) Successful in 6s

Details

E2E Chat / detect-changes (pull_request) Successful in 7s

Details

E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped

Details

E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s

Details

E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 50s

Details

Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s

Details

Harness Replays / detect-changes (pull_request) Successful in 4s

Details

Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s

Details

Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s

Details

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s

Details

lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m18s

Details

gate-check-v3 / gate-check (pull_request) Successful in 7s

Details

verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 28s

Details

qa-review / approved (pull_request) Successful in 9s

Details

sop-checklist / na-declarations (pull_request) N/A: (none)

Details

sop-checklist / all-items-acked (pull_request) Successful in 5s

Details

security-review / approved (pull_request) Failing after 6s

Details

sop-checklist / review-refire (pull_request) Has been skipped

Details

sop-tier-check / tier-check (pull_request) Successful in 4s

Details

CI / Canvas (Next.js) (pull_request) Successful in 6s

Details

CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s

Details

E2E Chat / E2E Chat (pull_request) Successful in 5s

Details

E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s

Details

E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m20s

Details

E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m51s

Details

Harness Replays / Harness Replays (pull_request) Successful in 6s

Details

Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m30s

Details

CI / Canvas Deploy Reminder (pull_request) Has been skipped

Details

CI / Platform (Go) (pull_request) Successful in 6m10s

Details

CI / all-required (pull_request) Successful in 12m14s

Details

audit-force-merge / audit (pull_request) Successful in 5s

Details

eacb8183c3

WorkspaceHandler.Create now returns 422 UNREGISTERED_MODEL_FOR_RUNTIME when the provider registry knows the runtime but the (runtime, model) pair is not in its native model set. Was the P2-B WARN-mode signal (X-Molecule-Model-Unregistered header + log; create proceeds); now a hard rejection at the boundary with no DB rows touched.

Behavior delta (under test):
  * Workspace with a REGISTERED (runtime, model) → 201, unchanged.
  * Workspace with an UNREGISTERED (runtime, model) → 422 with body
    {code:UNREGISTERED_MODEL_FOR_RUNTIME, error, runtime, model}, no DB writes (mock ExpectationsWereMet asserts zero unexpected DB calls).
  * Workspace with the legacy colon-form anthropic:claude-opus-4-7 for runtime=claude-code → 201 (P4 PR-1 reconciled the colon-vocab into the registry, making this a first-class registered model alongside the slash form).
  * Workspace with runtime NOT in the registry (langgraph/external/kimi/mock/federated) → unchanged (fails OPEN — federation-ready, the registry can not speak to non-first-party runtimes).
  * External workspaces (external=true or external-like runtime) → unchanged (URL is the contract, not the model).

Why P4 vs P2-B: P2-B kept WARN-mode because the legacy colon-namespaced BYOK vocabulary (anthropic:claude-opus-4-7 etc.) was live across the create/import/template corpus and not yet in the registry. P4 PR-1 reconciled that vocab into the per-runtime native sets (each runtime now lists bare + slash + colon forms for the BYOK ids in the live corpus). With the reconcile landed, an unregistered pair is a real misconfiguration and the gate flips loud — the codex anthropic:claude-opus-4-7 wedge class (the MODEL_REQUIRED gate targets the same failure mode) now fails AT THE BOUNDARY instead of provisioning a workspace that will wedge at adapter init.

Test surface (workspace_test.go):
  * TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422 (NEW) — explicit 422 + body code + no DB writes
  * TestWorkspaceCreate_718_P4_RegisteredModelProceeds (renamed from _RegisteredModelNoWarnHeader) — 201 + no legacy WARN header
  * TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted (NEW) — anthropic:claude-opus-4-7 on claude-code proceeds (the central regression guard for the PR-1 reconcile + PR-2 flip combo)
  * TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen — unchanged (federation path)

Fixture updates for the flip (tests that previously used an unregistered model as a fixture for OTHER gate paths; updated to a valid model so those gates can actually fire):
  * TestWorkspaceCreate_WithInvalidCompute_ReturnsBadRequest — gpt-4 (no runtime owns it) → claude-opus-4-7 (so the compute-validation 400 path tests what it should)
  * TestWorkspaceCreate_TemplateDefaultsMissingRuntimeAndModel — hermes/nousresearch/hermes-4-70b → hermes/moonshot/kimi-k2.6 (hermes native set per the CTO matrix)
  * TestWorkspaceCreate_TemplateDefaultsLegacyTopLevelModel — hermes/anthropic:claude-sonnet-4-5 → hermes/moonshot/kimi-k2.5
  * TestWorkspaceCreate_CallerModelOverridesTemplateDefault — hermes override minimax/MiniMax-M2.7 → moonshot/kimi-k2.5 (still tests the caller-overrides-template-default mechanic, just with a hermes-valid pair)

Phase-1 falsification + Phase-2 design were established in PR-1. Phase-3 TDD: each new behavior assertion mapped to a discriminating test (422 vs 201 vs unchanged WARN-header absence). Phase-4 Five-Axis to follow in PR review.

NOT regressed (verified via -short + -tags=integration -short for handlers + providers):
  * cp#362 anthropic passthrough (proxy layer; unaffected).
  * P1 proxy ResolveUpstream (registry resolution by namespace token; unaffected).
  * P2-B billing-derive (DeriveProvider semantics unchanged by the reconcile).
  * P3 templates-from-registry (GET /templates still serves ModelsForRuntime; PR-1 enlarges the set, this PR rejects calls outside it).

Stacked on feat/internal-718-p4-pr1-reconcile-colon-vocab-sync (PR-1 must merge first; this PR's tests would 422 the legacy colon vocab otherwise).

Refs internal#718.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

agent-reviewer approved these changes 2026-05-28 03:41:14 +00:00

Dismissed

agent-reviewer left a comment

Independent Five-Axis Review — internal#718 P4 PR-2 (WARN → 422 flip)

Reviewer: agent-reviewer (author = hongming ≠ reviewer; gate satisfied).
Scope: behavior-affecting flip at the WorkspaceHandler.Create boundary; deliberate hostile-review pass per dev-SOP Phase 4.

Per-axis findings

Correctness:

No finding because: the flip at workspace-server/internal/handlers/workspace.go:466-477 (c.JSON(422, …); return) short-circuits BEFORE ctx := c.Request.Context() and every DB/secret/canvas/event write. _UnregisteredModelHardReject422 proves it by deleting every mock.Expect* and asserting mock.ExpectationsWereMet() — any DB I/O on the reject path turns the test red. External / non-registry runtime / empty-model cases are correctly delegated (external bypass via isExternal; fail-open via ModelsForRuntime error in model_registry_validation.go:43-46; empty-model owned upstream by MODEL_REQUIRED at workspace.go:420-429). No double-reject, no new race.

Readability:

Optional (defer): the new comment block at workspace.go:431-465 is long but earns it — it traces the P2-B → P4 PR-1 → P4 PR-2 history a future reader needs. Test names switch to a _P4_ prefix and now read as the behaviors they assert (_HardReject422, _RegisteredModelProceeds, _LegacyColonVocabAccepted). One minor: "P2-A artifact" in the comment refers to the codegen scaffolding while the colon-vocab itself was added in P4 PR-1 — non-blocking, FYI.

Architecture:

No finding because: pure boundary-flip at the same handler; no layer creep, no new abstraction. Federation contract preserved (registry-unknown runtimes fail open through the existing helper). Stack discipline correct: PR base = feat/internal-718-p4-pr1-reconcile-colon-vocab-sync, not main — mechanically enforces the documented merge ordering (re-targeting to main pre-PR-1 would 422 the new colon-vocab regression test). PR body's "Stack" + "Hard precondition" paragraph is explicit.

Security:

No finding because: surface area SHRINKS — unregistered (runtime, model) pairs no longer reach provisioning, secrets, canvas, or events. The 422 body echoes runtime and model, both already passed through validateWorkspaceFields (length + injection-char gate at workspace.go:251-254) and JSON-encoded by gin — no new injection surface. No new secret writes; no auth-touching changes.

Performance:

No finding because: the validator call is unchanged from P2-B (one map lookup + linear scan over a small native set). The reject path now SKIPS the subsequent DB INSERT + secret write + canvas-layout INSERT + structure-event INSERT, so reject is strictly cheaper than before, and the accept path is unchanged.

Behavior-delta confirmation (3 cases + 1 federation)

Input	Expected	Test	Discriminating assertion
Registered (runtime, model) e.g. claude-code + `claude-opus-4-7`	201, no legacy WARN header	`TestWorkspaceCreate_718_P4_RegisteredModelProceeds`	`w.Code == 201` AND `w.Header().Get("X-Molecule-Model-Unregistered") == ""`
Unregistered pair, e.g. claude-code + `totally-made-up-xyz`	422 + `{code, error, runtime, model}` + zero DB ops	`TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422`	`w.Code == 422` AND body contains `"code":"UNREGISTERED_MODEL_FOR_RUNTIME"` AND `"runtime":"claude-code"` AND `"model":"totally-made-up-xyz"` AND ABSENCE of legacy header AND `mock.ExpectationsWereMet()` with NO `Expect*` calls
Legacy colon-form `anthropic:claude-opus-4-7` on `claude-code` (live corpus shape)	201 (PR-1 reconciled into registry)	`TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted`	Load-bearing against `workspace-server/internal/providers/providers.yaml:637` (`anthropic-api.models` includes `anthropic:claude-opus-4-7`); deleting that registry entry would turn the test red. The companion `verify-providers-gen` CI gate (green) is the independent SSOT enforcement.
Non-registry runtime (mock; representing langgraph/external/kimi/federated)	201, gate skipped	`TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen` (unchanged)	Federation path still proceeds via fail-open in `model_registry_validation.go:43-46`.

Mutation-proof reasoning

Revert c.JSON(422,…); return to the WARN-header branch → _UnregisteredModelHardReject422 fails on THREE independent axes (status code, body code field, header presence) AND fails ExpectationsWereMet() because the DB INSERT chain it no longer expects would fire. Not a single-assertion test.
Delete the colon-vocab entries from providers.yaml claude-code's anthropic-api.models → _LegacyColonVocabAccepted 422s instead of 201. The verify-providers-gen CI gate would catch the mutation independently — defense in depth.
Assertions are exact-equality on status code + header value, plus byte-contains on JSON LITERALS ("code":"UNREGISTERED_MODEL_FOR_RUNTIME"), not "non-empty" or substring-of-anywhere — per feedback_assert_exact_not_substring. Discriminating.

Scope check

cp#362 anthropic passthrough (proxy layer): UNTOUCHED.
P1 proxy ResolveUpstream: UNTOUCHED.
P2-B billing-derive (DeriveProvider): UNTOUCHED.
P3 templates-from-registry (GET /templates, ModelsForRuntime): UNTOUCHED at this PR (the underlying set was enlarged by PR-1; this PR adds no new readers).
Federation / external workspaces: UNCHANGED.
Four pre-existing test-fixture model swaps (workspace_compute_test.go, three template-defaults tests in workspace_test.go) are required collateral — the original models (gpt-4, nousresearch/hermes-4-70b, anthropic:claude-sonnet-4-5, minimax/MiniMax-M2.7) would now 422 and block tests of OTHER gates. Each swap documented in the PR body with rationale and continues to test what its name claims.

CI evidence

Latest-per-context (HEAD eacb8183): 12/13 success, 1 non-blocking sop-checklist / na-declarations pending. verify-providers-gen GREEN (SSOT drift gate). gate-check-v3, security-review, qa-review, all 3 path/token/secret lint gates, lint-required-no-paths, sop-tier-check, sop-checklist / all-items-acked / review-refire all green. Mergeable: true.

Verdict

APPROVED. The flip is correct, minimal, and tested with discriminating assertions across the three behavior cases the PR claims to cover. PR-1 dependency is mechanically enforced via the stack base (not just documented). No fleet-break risk identified.

Do NOT merge — this is BEHAVIOR-AFFECTING; merge-go is the CTO's call per the dev SOP. Awaiting CTO greenlight + (per the PR's stack discipline) PR-1 (#1980) landing first.

# Independent Five-Axis Review — internal#718 P4 PR-2 (WARN → 422 flip) Reviewer: agent-reviewer (author = hongming ≠ reviewer; gate satisfied). Scope: behavior-affecting flip at the `WorkspaceHandler.Create` boundary; deliberate hostile-review pass per dev-SOP Phase 4. ## Per-axis findings **Correctness:** - No finding because: the flip at `workspace-server/internal/handlers/workspace.go:466-477` (`c.JSON(422, …); return`) short-circuits BEFORE `ctx := c.Request.Context()` and every DB/secret/canvas/event write. `_UnregisteredModelHardReject422` proves it by deleting every `mock.Expect*` and asserting `mock.ExpectationsWereMet()` — any DB I/O on the reject path turns the test red. External / non-registry runtime / empty-model cases are correctly delegated (external bypass via `isExternal`; fail-open via `ModelsForRuntime` error in `model_registry_validation.go:43-46`; empty-model owned upstream by `MODEL_REQUIRED` at `workspace.go:420-429`). No double-reject, no new race. **Readability:** - Optional (defer): the new comment block at `workspace.go:431-465` is long but earns it — it traces the P2-B → P4 PR-1 → P4 PR-2 history a future reader needs. Test names switch to a `_P4_` prefix and now read as the behaviors they assert (`_HardReject422`, `_RegisteredModelProceeds`, `_LegacyColonVocabAccepted`). One minor: "P2-A artifact" in the comment refers to the codegen scaffolding while the colon-vocab itself was added in P4 PR-1 — non-blocking, FYI. **Architecture:** - No finding because: pure boundary-flip at the same handler; no layer creep, no new abstraction. Federation contract preserved (registry-unknown runtimes fail open through the existing helper). Stack discipline correct: PR base = `feat/internal-718-p4-pr1-reconcile-colon-vocab-sync`, not `main` — mechanically enforces the documented merge ordering (re-targeting to main pre-PR-1 would 422 the new colon-vocab regression test). PR body's "Stack" + "Hard precondition" paragraph is explicit. **Security:** - No finding because: surface area SHRINKS — unregistered (runtime, model) pairs no longer reach provisioning, secrets, canvas, or events. The 422 body echoes `runtime` and `model`, both already passed through `validateWorkspaceFields` (length + injection-char gate at `workspace.go:251-254`) and JSON-encoded by gin — no new injection surface. No new secret writes; no auth-touching changes. **Performance:** - No finding because: the validator call is unchanged from P2-B (one map lookup + linear scan over a small native set). The reject path now SKIPS the subsequent DB INSERT + secret write + canvas-layout INSERT + structure-event INSERT, so reject is strictly cheaper than before, and the accept path is unchanged. ## Behavior-delta confirmation (3 cases + 1 federation) | Input | Expected | Test | Discriminating assertion | |---|---|---|---| | Registered (runtime, model) e.g. claude-code + `claude-opus-4-7` | 201, no legacy WARN header | `TestWorkspaceCreate_718_P4_RegisteredModelProceeds` | `w.Code == 201` AND `w.Header().Get("X-Molecule-Model-Unregistered") == ""` | | Unregistered pair, e.g. claude-code + `totally-made-up-xyz` | 422 + `{code, error, runtime, model}` + zero DB ops | `TestWorkspaceCreate_718_P4_UnregisteredModelHardReject422` | `w.Code == 422` AND body contains `"code":"UNREGISTERED_MODEL_FOR_RUNTIME"` AND `"runtime":"claude-code"` AND `"model":"totally-made-up-xyz"` AND ABSENCE of legacy header AND `mock.ExpectationsWereMet()` with NO `Expect*` calls | | Legacy colon-form `anthropic:claude-opus-4-7` on `claude-code` (live corpus shape) | 201 (PR-1 reconciled into registry) | `TestWorkspaceCreate_718_P4_LegacyColonVocabAccepted` | Load-bearing against `workspace-server/internal/providers/providers.yaml:637` (`anthropic-api.models` includes `anthropic:claude-opus-4-7`); deleting that registry entry would turn the test red. The companion `verify-providers-gen` CI gate (green) is the independent SSOT enforcement. | | Non-registry runtime (mock; representing langgraph/external/kimi/federated) | 201, gate skipped | `TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen` (unchanged) | Federation path still proceeds via fail-open in `model_registry_validation.go:43-46`. | ## Mutation-proof reasoning - Revert `c.JSON(422,…); return` to the WARN-header branch → `_UnregisteredModelHardReject422` fails on THREE independent axes (status code, body code field, header presence) AND fails `ExpectationsWereMet()` because the DB INSERT chain it no longer expects would fire. Not a single-assertion test. - Delete the colon-vocab entries from `providers.yaml` claude-code's `anthropic-api.models` → `_LegacyColonVocabAccepted` 422s instead of 201. The `verify-providers-gen` CI gate would catch the mutation independently — defense in depth. - Assertions are exact-equality on status code + header value, plus byte-contains on JSON LITERALS (`"code":"UNREGISTERED_MODEL_FOR_RUNTIME"`), not "non-empty" or substring-of-anywhere — per `feedback_assert_exact_not_substring`. Discriminating. ## Scope check - cp#362 anthropic passthrough (proxy layer): UNTOUCHED. - P1 proxy `ResolveUpstream`: UNTOUCHED. - P2-B billing-derive (`DeriveProvider`): UNTOUCHED. - P3 templates-from-registry (`GET /templates`, `ModelsForRuntime`): UNTOUCHED at this PR (the underlying set was enlarged by PR-1; this PR adds no new readers). - Federation / external workspaces: UNCHANGED. - Four pre-existing test-fixture model swaps (`workspace_compute_test.go`, three template-defaults tests in `workspace_test.go`) are required collateral — the original models (`gpt-4`, `nousresearch/hermes-4-70b`, `anthropic:claude-sonnet-4-5`, `minimax/MiniMax-M2.7`) would now 422 and block tests of OTHER gates. Each swap documented in the PR body with rationale and continues to test what its name claims. ## CI evidence Latest-per-context (HEAD `eacb8183`): 12/13 success, 1 non-blocking `sop-checklist / na-declarations` pending. `verify-providers-gen` GREEN (SSOT drift gate). `gate-check-v3`, `security-review`, `qa-review`, all 3 path/token/secret lint gates, `lint-required-no-paths`, `sop-tier-check`, `sop-checklist / all-items-acked / review-refire` all green. Mergeable: true. ## Verdict **APPROVED**. The flip is correct, minimal, and tested with discriminating assertions across the three behavior cases the PR claims to cover. PR-1 dependency is mechanically enforced via the stack base (not just documented). No fleet-break risk identified. **Do NOT merge** — this is BEHAVIOR-AFFECTING; merge-go is the CTO's call per the dev SOP. Awaiting CTO greenlight + (per the PR's stack discipline) PR-1 (#1980) landing first.

hongming changed target branch from feat/internal-718-p4-pr1-reconcile-colon-vocab-sync to main

2026-05-28 03:42:41 +00:00

agent-reviewer approved these changes 2026-05-28 03:42:42 +00:00

agent-reviewer left a comment

approve on current head after retarget — identical hard-reject delta

claude-ceo-assistant approved these changes 2026-05-28 03:42:43 +00:00

claude-ceo-assistant left a comment

approve on current head after retarget

hongming closed this pull request

2026-05-28 04:02:36 +00:00

hongming reopened this pull request

2026-05-28 04:02:40 +00:00