P2-B internal#718: billing/credential derives from provider + only-registered validation (BEHAVIOR-AFFECTING; supersedes #1966) #1971

Owner

internal#718 P2-B — billing/credential decision DERIVES the provider (BEHAVIOR-AFFECTING)

Stacked on #1970 (P2-A) — base is the P2-A branch (the registry/loader it consumes). Re-target to main after #1970 merges.

Re-points the platform-vs-BYOK billing/credential decision to DERIVE the provider from (runtime, model) via the registry SSOT, per the CTO directive (internal#718 comment 2026-05-27): "the billing read must DERIVE the provider, not read a stored LLM_PROVIDER", "remove LLM_PROVIDER entirely as a billing source", "retire organizations.llm_billing_mode as a billing source".

⚠ BEHAVIOR DELTA (tested explicitly)

workspace before P2 after P2
provider derives platform (or unset → platform default) platform_managed platform_managed — UNCHANGED
provider derives a non-platform vendor platform_managed (via the never-written org rung) → ran on platform creds byok#1963 strips platform scope:global LLM creds + fail-closes if no own cred — THE FIX (Reno billing-leak class: SEO 352e3c2b / Marketing 6b66de8d)

What changed

  • ResolveLLMBillingModeDerived(ctx, ws, runtime, model, authEnv) — new SSOT resolver: explicit workspaces.llm_billing_mode override (precedence 1, the only stored billing signal that survives — operator escape hatch) → else DeriveProvider + IsPlatform → else default-closed platform_managed.
  • ResolveLLMBillingMode(ctx, ws, orgMode) legacy signature kept for callers without (runtime, model) (admin route, secrets remote-pull): reads stored runtime + MODEL + auth-env from DB, delegates to the derived resolver. orgMode RETIRED/ignored — org rung gone.
  • applyPlatformManagedLLMEnv derives directly (has runtime+model+env). Feeds the merged #1963 strip + fail-closed the correct DERIVED signal.
  • Supersedes core#1966 (close it): #1966 read a stored LLM_PROVIDER first; this derives instead.
  • Only-registered validation at create (validateRegisteredModelForRuntime + WorkspaceHandler.Create): an unregistered (runtime, model) on a registry-known runtime is flagged (X-Molecule-Model-Unregistered header + warning log). P2 = WARN mode, not hard-422 — the legacy colon-namespaced model vocab ("anthropic:claude-opus-4-7") is still live across the create/import/template corpus and isn't yet reconciled into the registry; hard-reject is a one-line flip gated on P3/P4 vocabulary convergence. Fails OPEN for non-registry runtimes (langgraph/external/kimi/mock/federated).

Tests (TDD)

  • llm_billing_mode_derived_test.go — platform/non-platform/unset/override/unregistered/auth-env table + DB-error default-closed + empty-id.
  • workspace_provision_shared_test.go — DERIVED platform→unchanged, non-platform→byok+strip+fail-closed (the FIX), unset→platform default, through the real applyPlatformManagedLLMEnv path. Existing #1963 override-byok tests unchanged.
  • model_registry_validation_test.go + workspace_test.go — warn / no-warn / fail-open.
  • Reworked legacy resolver/admin/secrets tests off the retired org rung.

Build/CI

go build ./... (+ -tags=integration) green; full go test ./... (43 pkgs) green incl. -race on handlers; vet clean; changed files gofmt-clean.

BEHAVIOR-AFFECTING. Supersedes #1966. Cross-link internal#718 (P2-B). Do NOT merge.

## internal#718 P2-B — billing/credential decision DERIVES the provider (BEHAVIOR-AFFECTING) **Stacked on #1970 (P2-A)** — base is the P2-A branch (the registry/loader it consumes). Re-target to `main` after #1970 merges. Re-points the platform-vs-BYOK billing/credential decision to **DERIVE** the provider from `(runtime, model)` via the registry SSOT, per the CTO directive (internal#718 comment 2026-05-27): "the billing read must DERIVE the provider, not read a stored LLM_PROVIDER", "remove LLM_PROVIDER entirely as a billing source", "retire organizations.llm_billing_mode as a billing source". ### ⚠ BEHAVIOR DELTA (tested explicitly) | workspace | before P2 | after P2 | |---|---|---| | provider derives **platform** (or **unset** → platform default) | platform_managed | **platform_managed — UNCHANGED** | | provider derives a **non-platform** vendor | platform_managed (via the never-written org rung) → ran on platform creds | **byok** → #1963 strips platform scope:global LLM creds + **fail-closes** if no own cred — **THE FIX (Reno billing-leak class: SEO 352e3c2b / Marketing 6b66de8d)** | ### What changed - `ResolveLLMBillingModeDerived(ctx, ws, runtime, model, authEnv)` — new SSOT resolver: explicit `workspaces.llm_billing_mode` override (precedence 1, the only stored billing signal that survives — operator escape hatch) → else `DeriveProvider` + `IsPlatform` → else default-closed `platform_managed`. - `ResolveLLMBillingMode(ctx, ws, orgMode)` legacy signature kept for callers without (runtime, model) (admin route, secrets remote-pull): reads stored runtime + MODEL + auth-env from DB, delegates to the derived resolver. **`orgMode` RETIRED/ignored — org rung gone.** - `applyPlatformManagedLLMEnv` derives directly (has runtime+model+env). Feeds the merged **#1963** strip + fail-closed the correct DERIVED signal. - **Supersedes core#1966** (close it): #1966 read a stored LLM_PROVIDER first; this derives instead. - **Only-registered validation** at create (`validateRegisteredModelForRuntime` + `WorkspaceHandler.Create`): an unregistered `(runtime, model)` on a **registry-known** runtime is flagged (`X-Molecule-Model-Unregistered` header + warning log). **P2 = WARN mode, not hard-422** — the legacy colon-namespaced model vocab ("anthropic:claude-opus-4-7") is still live across the create/import/template corpus and isn't yet reconciled into the registry; hard-reject is a one-line flip gated on P3/P4 vocabulary convergence. **Fails OPEN** for non-registry runtimes (langgraph/external/kimi/mock/federated). ### Tests (TDD) - `llm_billing_mode_derived_test.go` — platform/non-platform/unset/override/unregistered/auth-env table + DB-error default-closed + empty-id. - `workspace_provision_shared_test.go` — DERIVED platform→unchanged, non-platform→byok+strip+fail-closed (the FIX), unset→platform default, through the real `applyPlatformManagedLLMEnv` path. Existing #1963 override-byok tests unchanged. - `model_registry_validation_test.go` + `workspace_test.go` — warn / no-warn / fail-open. - Reworked legacy resolver/admin/secrets tests off the retired org rung. ### Build/CI `go build ./...` (+ `-tags=integration`) green; full `go test ./...` (43 pkgs) green incl. `-race` on handlers; vet clean; changed files gofmt-clean. **BEHAVIOR-AFFECTING.** Supersedes #1966. Cross-link internal#718 (P2-B). **Do NOT merge.**
hongming added 1 commit 2026-05-28 00:43:06 +00:00
fix(workspace-server): P2-B internal#718 — billing/credential decision DERIVES the provider; supersede #1966 stored-read; retire org rung; only-registered validation (BEHAVIOR-AFFECTING)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m11s
gate-check-v3 / gate-check (pull_request) Successful in 11s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 43s
qa-review / approved (pull_request) Successful in 10s
security-review / approved (pull_request) Successful in 8s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
audit-force-merge / audit (pull_request) Successful in 4s
3165b98cc8
Re-points the platform-vs-BYOK billing/credential decision to DERIVE the provider
from (runtime, model) via the registry SSOT, per the CTO directive (internal#718
comment, 2026-05-27): "the billing read must DERIVE the provider, not read a
stored LLM_PROVIDER", "remove LLM_PROVIDER entirely as a billing source", "retire
organizations.llm_billing_mode as a billing source".

## BEHAVIOR DELTA (this PR changes behavior — tested explicitly)
- platform-derived (or unset → platform default) → platform_managed → platform
  creds. UNCHANGED.
- non-platform-derived → byok → the already-merged #1963 strips platform
  scope:global LLM creds + FAIL-CLOSES if the workspace has no own cred. THIS IS
  THE INTENDED FIX (the Reno billing-leak class: Reno Stars SEO 352e3c2b /
  Marketing 6b66de8d ran on the platform's Anthropic credits because the never-
  written org rung always resolved platform_managed).
- unset model → platform default (CTO-confirmed).

## What changed
- `ResolveLLMBillingModeDerived(ctx, ws, runtime, model, authEnv)` — the new SSOT
  resolver: explicit `workspaces.llm_billing_mode` override (precedence 1, the
  only stored billing signal that survives — operator escape hatch) → else
  DeriveProvider + IsPlatform → else default-closed platform_managed.
- `ResolveLLMBillingMode(ctx, ws, orgMode)` legacy signature retained for callers
  without (runtime, model) (admin route, secrets remote-pull): reads the stored
  runtime + MODEL + auth-env names from DB and delegates to the derived resolver.
  `orgMode` is RETIRED/ignored; the org rung is gone.
- `applyPlatformManagedLLMEnv` calls the derived resolver directly (it has
  runtime + model + the workspace env) — no stored LLM_PROVIDER read. Feeds
  #1963's strip + fail-closed the correct DERIVED signal.
- SUPERSEDES core#1966: that PR made the billing read consult a stored
  LLM_PROVIDER first; this reworks the decision onto derive-from-provider. #1966
  should be closed in favor of this.
- Removed the now-dead org-default normalization (normalizeOrgDefault).
- ONLY-REGISTERED validation at create (model_registry_validation.go +
  WorkspaceHandler.Create): a (runtime, model) not in the registry's
  ModelsForRuntime for a REGISTRY-known runtime is flagged
  (X-Molecule-Model-Unregistered header + warning log). P2 = WARN mode (NOT hard
  422) because the legacy colon-namespaced model vocabulary ("anthropic:claude-
  opus-4-7") is still live across the create/import/template corpus and is not
  yet reconciled into the registry — hard-reject is a one-line flip gated on
  P3/P4 vocabulary convergence. Fails OPEN for non-registry runtimes
  (langgraph/external/kimi/mock/federated) so those flows are unchanged.

## Tests (TDD; behavior delta explicit)
- llm_billing_mode_derived_test.go — platform/non-platform/unset/override/
  unregistered/auth-env-disambiguation table + DB-error default-closed + empty-id.
- workspace_provision_shared_test.go — DERIVED platform→unchanged,
  non-platform→byok+strip+fail-closed (the FIX), unset→platform default, through
  the real applyPlatformManagedLLMEnv path. Existing #1963 override-byok strip +
  fail-closed tests unchanged (still pass).
- model_registry_validation_test.go + workspace_test.go — only-registered warn +
  registered-no-warn + non-registry-fail-open.
- Reworked the legacy resolver/admin/secrets tests off the retired org rung.

## Build/CI
go build ./... (+ -tags=integration) green; full `go test ./...` (43 pkgs) green
incl. -race on handlers; vet clean; changed files gofmt-clean. cp#362 anthropic
passthrough untouched (CP repo); merged #1963 strip+fail-closed reused unchanged.

internal#718 P2-B. BEHAVIOR-AFFECTING. Supersedes #1966. Not merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming added the tier:medium label 2026-05-28 00:43:39 +00:00
agent-reviewer approved these changes 2026-05-28 01:09:08 +00:00
agent-reviewer left a comment
Member

Five-Axis review — agent-reviewer (independent; author hongming ≠ reviewer)

Reviewed against the #1971-branch diff vs its base (#1970 @71c68e4); head 3165b98. Built + ran the full internal/handlers + internal/providers suites with -race, plus integration-tag build, vet, gofmt — all green. Confirmed the behavior-delta tests are LOAD-BEARING via targeted source mutations (each mutant → RED).

VERDICT: APPROVED. Behavior delta is correct, default-platform-on-unset holds (no fleet fail-close), the #1966 stored-LLM_PROVIDER billing-read is gone, only-registered ships in WARN mode, and #1963/cp#362 are intact. One NON-BLOCKING coordination follow-up below.

Axis 1 — Behavior delta (load-bearing). PASS

All three cases verified through the REAL applyPlatformManagedLLMEnv path (not a mock), registry data confirmed via a throwaway DeriveProvider probe:

  • (a) platform-derived (anthropic/claude-opus-4-7platform) AND unset→default ⇒ platform_managed ⇒ CP proxy creds injected — UNCHANGED. (workspace_provision_shared_test.go DERIVED_PlatformModelKeepsPlatformCreds / DERIVED_UnsetModelPlatformDefault)
  • (b) non-platform-derived (kimi-for-codingkimi-coding, gpt-5.5openai) ⇒ byok#1963 strips the platform scope:global OAuth token + HasUsableLLMCred=false ⇒ caller fail-closes (workspace_provision_shared.go:217). This is the Reno-leak fix. (DERIVED_NonPlatformModelStripsAndFailsClosed)
  • (c) the path runs the real resolver. MUTATION PROOF: forcing non-platform→platform_managed turned the FIX tests RED; flipping derive-default→byok turned the unset/unregistered tests RED.

Axis 2 — No silent breakage of working workspaces (fleet-safety). PASS

Derive-failure (no model / unknown runtime / unregistered / ambiguous / registry-unavailable / DB error) ALWAYS defaults closed to platform_managed (llm_billing_mode.go ResolveLLMBillingModeDerived precedence-3 + the manifest-nil + override-DB-error branches). A wrong default here would fail-close the whole platform fleet; the mutation flipping this to byok went RED across unset/unregistered/unknown-runtime, so the guard is real. DeriveProvider errors (never silently returns a non-platform provider) on empty model — verified in the registry source.

Axis 3 — #1966 stored-read fully removed. PASS

Billing decision derives from (runtime, model, authEnv) only. No production path reads stored LLM_PROVIDER to decide billing. normalizeOrgDefault deleted (no callers); BillingModeSourceOrgDefault kept as a wire-compat const but NEVER returned as a Source in prod; organizations.llm_billing_mode never queried. workspaces.llm_billing_mode survives only as precedence-1 explicit operator override. The remaining LLM_PROVIDER reads (GetProvider/SetProvider/setProviderSecret/deriveProviderFromModelSlug) are the orthogonal CP user-data provider-HINT path (controlplane #364), not billing — internal#718 schedules their demote-to-write-through-cache for a later phase, correctly out of P2-B scope.

Axis 4 — only-registered is WARN, not hard-reject. PASS

workspace.go Create sets X-Molecule-Model-Unregistered: true + a warning log and PROCEEDS (201) for an unregistered (runtime, model) on a registry-known runtime; the hard-422 is commented/gated on P3/P4 vocab convergence. Fails OPEN for non-registry runtimes (langgraph/external/mock/kimi/federated). MUTATION PROOF: flipping WARN→422 turned the warns-but-proceeds test RED (expected 201). Does not 422 the legacy colon vocab.

Axis 5 — #1963 reuse + cp#362/P1 untouched. PASS

stripGlobalOriginLLMCreds / hasAnyPlatformManagedLLMKey bodies are byte-identical vs base (only the call-site reference appears in the diff). #1971 touches ONLY workspace-server/internal/handlers — no proxy ResolveUpstream, no controlplane, no cp#362. Existing #1963 byok override tests pass unchanged.

Axis 6 — Registry usage. PASS

Uses the #1970 core registry DeriveProvider/IsPlatform/ModelsForRuntime correctly. IsPlatform is the closed Name=="platform" predicate. The authEnv fed to DeriveProvider is the REAL workspace env: on the provision path via availableAuthEnvNames(envVars), and on the legacy-shim path via readWorkspaceDeriveInputs scanning recognized auth-env secret KEYS (never values) — verified opus+CLAUDE_CODE_OAUTH_TOKENanthropic-oauth (byok), so oauth-vs-api disambiguation fires identically to the canvas.


FOLLOW-UP (non-blocking, coordination — does not affect #1971 correctness)

Three open PRs concurrently retire the org rung from the SAME resolver (llm_billing_mode.go):

  • #1930 refactor/drop-org-tier-llm-billing-mode (base=main) makes resolution workspaces.llm_billing_mode ?? platform_managed — i.e. the stored column stays the PRIMARY signal.
  • #1931 is its canvas sibling.
  • #1966 (this PR supersedes; PR body says close it).

#1971 instead makes the DERIVED provider primary and demotes the column to an override. #1930 and #1971 are on a collision course: if #1930 lands AFTER #1971 it would revert derive-from-provider back to a stored-read. Recommend, before #1971 re-targets/merges to main: close #1966 and reconcile #1930/#1931 (close, or rebase the canvas drop on #1971's override-only model). Flagging per the SOP duplicate-fix-revert hazard — not a defect in #1971.

Static review only. Nothing merged/pushed/touched-live; build+test were on a throwaway worktree at the PR head.

## Five-Axis review — agent-reviewer (independent; author hongming ≠ reviewer) Reviewed against the #1971-branch diff vs its base (#1970 @71c68e4); head 3165b98. Built + ran the full `internal/handlers` + `internal/providers` suites with `-race`, plus integration-tag build, vet, gofmt — all green. Confirmed the behavior-delta tests are LOAD-BEARING via targeted source mutations (each mutant → RED). **VERDICT: APPROVED.** Behavior delta is correct, default-platform-on-unset holds (no fleet fail-close), the #1966 stored-LLM_PROVIDER billing-read is gone, only-registered ships in WARN mode, and #1963/cp#362 are intact. One NON-BLOCKING coordination follow-up below. ### Axis 1 — Behavior delta (load-bearing). PASS All three cases verified through the REAL `applyPlatformManagedLLMEnv` path (not a mock), registry data confirmed via a throwaway DeriveProvider probe: - (a) platform-derived (`anthropic/claude-opus-4-7`→`platform`) AND unset→default ⇒ `platform_managed` ⇒ CP proxy creds injected — UNCHANGED. (`workspace_provision_shared_test.go` DERIVED_PlatformModelKeepsPlatformCreds / DERIVED_UnsetModelPlatformDefault) - (b) non-platform-derived (`kimi-for-coding`→`kimi-coding`, `gpt-5.5`→`openai`) ⇒ `byok` ⇒ #1963 strips the platform scope:global OAuth token + `HasUsableLLMCred=false` ⇒ caller fail-closes (`workspace_provision_shared.go:217`). This is the Reno-leak fix. (DERIVED_NonPlatformModelStripsAndFailsClosed) - (c) the path runs the real resolver. MUTATION PROOF: forcing non-platform→platform_managed turned the FIX tests RED; flipping derive-default→byok turned the unset/unregistered tests RED. ### Axis 2 — No silent breakage of working workspaces (fleet-safety). PASS Derive-failure (no model / unknown runtime / unregistered / ambiguous / registry-unavailable / DB error) ALWAYS defaults closed to `platform_managed` (`llm_billing_mode.go` ResolveLLMBillingModeDerived precedence-3 + the manifest-nil + override-DB-error branches). A wrong default here would fail-close the whole platform fleet; the mutation flipping this to byok went RED across unset/unregistered/unknown-runtime, so the guard is real. `DeriveProvider` errors (never silently returns a non-platform provider) on empty model — verified in the registry source. ### Axis 3 — #1966 stored-read fully removed. PASS Billing decision derives from `(runtime, model, authEnv)` only. No production path reads stored `LLM_PROVIDER` to decide billing. `normalizeOrgDefault` deleted (no callers); `BillingModeSourceOrgDefault` kept as a wire-compat const but NEVER returned as a Source in prod; `organizations.llm_billing_mode` never queried. `workspaces.llm_billing_mode` survives only as precedence-1 explicit operator override. The remaining `LLM_PROVIDER` reads (GetProvider/SetProvider/setProviderSecret/deriveProviderFromModelSlug) are the orthogonal CP user-data provider-HINT path (controlplane #364), not billing — internal#718 schedules their demote-to-write-through-cache for a later phase, correctly out of P2-B scope. ### Axis 4 — only-registered is WARN, not hard-reject. PASS `workspace.go` Create sets `X-Molecule-Model-Unregistered: true` + a warning log and PROCEEDS (201) for an unregistered (runtime, model) on a registry-known runtime; the hard-422 is commented/gated on P3/P4 vocab convergence. Fails OPEN for non-registry runtimes (langgraph/external/mock/kimi/federated). MUTATION PROOF: flipping WARN→422 turned the warns-but-proceeds test RED (expected 201). Does not 422 the legacy colon vocab. ### Axis 5 — #1963 reuse + cp#362/P1 untouched. PASS `stripGlobalOriginLLMCreds` / `hasAnyPlatformManagedLLMKey` bodies are byte-identical vs base (only the call-site reference appears in the diff). #1971 touches ONLY `workspace-server/internal/handlers` — no proxy `ResolveUpstream`, no controlplane, no cp#362. Existing #1963 byok override tests pass unchanged. ### Axis 6 — Registry usage. PASS Uses the #1970 core registry `DeriveProvider`/`IsPlatform`/`ModelsForRuntime` correctly. `IsPlatform` is the closed `Name=="platform"` predicate. The authEnv fed to DeriveProvider is the REAL workspace env: on the provision path via `availableAuthEnvNames(envVars)`, and on the legacy-shim path via `readWorkspaceDeriveInputs` scanning recognized auth-env secret KEYS (never values) — verified `opus`+`CLAUDE_CODE_OAUTH_TOKEN`→`anthropic-oauth` (byok), so oauth-vs-api disambiguation fires identically to the canvas. --- ### FOLLOW-UP (non-blocking, coordination — does not affect #1971 correctness) Three open PRs concurrently retire the org rung from the SAME resolver (`llm_billing_mode.go`): - **#1930** `refactor/drop-org-tier-llm-billing-mode` (base=main) makes resolution `workspaces.llm_billing_mode ?? platform_managed` — i.e. the stored column stays the PRIMARY signal. - **#1931** is its canvas sibling. - **#1966** (this PR supersedes; PR body says close it). #1971 instead makes the DERIVED provider primary and demotes the column to an override. #1930 and #1971 are on a collision course: if #1930 lands AFTER #1971 it would revert derive-from-provider back to a stored-read. Recommend, before #1971 re-targets/merges to main: close #1966 and reconcile #1930/#1931 (close, or rebase the canvas drop on #1971's override-only model). Flagging per the SOP duplicate-fix-revert hazard — not a defect in #1971. _Static review only. Nothing merged/pushed/touched-live; build+test were on a throwaway worktree at the PR head._
claude-ceo-assistant approved these changes 2026-05-28 01:22:19 +00:00
claude-ceo-assistant left a comment
Owner

2nd approval (claude-ceo-assistant). Concur with agent-reviewer Five-Axis verdict (CTO-approved batch). Merge once required checks green.

2nd approval (claude-ceo-assistant). Concur with agent-reviewer Five-Axis verdict (CTO-approved batch). Merge once required checks green.
hongming merged commit 3ab690c273 into feat/internal-718-p2a-registry-codegen-distribution 2026-05-28 01:22:21 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1971