fix(billing): per-workspace SSOT — remove org-level billing mode entirely (core#2594 follow-up) #2672

Merged
devops-engineer merged 1 commit from fix/2594-followup-per-workspace-byok-ssot into main 2026-06-12 22:46:57 +00:00
Member

Per-workspace billing SSOT — org-level billing mode removed entirely

CTO directive (2026-06-12): "no org default — it's all per workspace; a workspace defaults to platform but I can switch it to BYOK anytime; if I select a provider that is not Platform it means BYOK already."

This supersedes the earlier interim approach in this PR. The provider/model selection is the single source of truth — no org rung, no drift, no redundant logic.

Removed (org-level billing mode, in full)

  • MOLECULE_LLM_BILLING_MODE read as a billing source (3 sites)
  • the orgMode param on both resolvers
  • recognizedOrgDefault / orgDefaultForDisplay, BillingModeSourceOrgDefault, the org_default response field
  • the org-default short-circuit that overrode a workspace's own choice
  • the redundant double override-read in the legacy shim (single read now)

The SSOT (per-workspace)

  1. explicit per-workspace override →
  2. derive from the selected model via providers.DeriveProvider:
    • platform-namespaced model → platform_managed (proxy)
    • any specific vendor model → byok (the selection is the signal — not key presence)
  3. underivable → deployment default (PlatformManagedProxyConfigured — a deploy fact, not an org setting; proxy wired → platform_managed, self-host → byok)
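The three rungs above can be sketched as a single function. This is a minimal illustration, not the PR's code: `resolveBillingMode`, the `Workspace` struct, and its `Provider` field (a stand-in for whatever `providers.DeriveProvider` returns) are hypothetical names, and `proxyConfigured` stands in for `PlatformManagedProxyConfigured`. Only the two override values named in this description are handled.

```go
package main

import "fmt"

// BillingMode is the resolved billing mode for one workspace.
type BillingMode string

const (
	PlatformManaged BillingMode = "platform_managed"
	BYOK            BillingMode = "byok"
)

// Workspace carries only the fields this sketch needs (hypothetical shape).
type Workspace struct {
	Override string // explicit per-workspace override; "" when unset
	Provider string // stand-in for the derived provider; "" when underivable
}

// resolveBillingMode (hypothetical name) walks the rungs in precedence order.
func resolveBillingMode(ws Workspace, proxyConfigured bool) BillingMode {
	// Rung 1: an explicit per-workspace override wins.
	switch BillingMode(ws.Override) {
	case PlatformManaged, BYOK:
		return BillingMode(ws.Override)
	}

	// Rung 2: derive from the selected model's provider.
	// The selection is the signal; key presence is never consulted.
	switch ws.Provider {
	case "":
		// Underivable: continue to rung 3.
	case "platform":
		return PlatformManaged
	default:
		return BYOK
	}

	// Rung 3: deployment default. Proxy wired means platform_managed,
	// self-host means byok.
	if proxyConfigured {
		return PlatformManaged
	}
	return BYOK
}

func main() {
	fmt.Println(resolveBillingMode(Workspace{Provider: "platform"}, true))
	fmt.Println(resolveBillingMode(Workspace{Provider: "some-vendor"}, true))
	fmt.Println(resolveBillingMode(Workspace{}, false))
}
```

Note that no argument carries an org-level mode: the only inputs are the workspace's own fields and one deploy fact.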

Plus: ensureConciergeModel is seed-only (won't revert a customer-chosen concierge model), and the vendor-key-write guard gates on whether the model is platform-servable (so a vendor-model workspace can add its own key).
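The seed-only behavior can be illustrated with a small sketch. `seedConciergeModel` is a hypothetical stand-in for `ensureConciergeModel`, whose real signature is not shown in this PR description, and the model names are placeholders.

```go
package main

import "fmt"

// seedConciergeModel (hypothetical) returns the model that should be stored.
// It writes the declared default only when no model has been set yet, so a
// customer-chosen model survives every later provision.
func seedConciergeModel(current, declaredDefault string) string {
	if current != "" {
		return current // already seeded or customer-chosen: leave untouched
	}
	return declaredDefault
}

func main() {
	// First provision: nothing stored yet, so the default is seeded.
	fmt.Println(seedConciergeModel("", "placeholder/default-model"))
	// Later provision: the customer's choice is kept, not reverted.
	fmt.Println(seedConciergeModel("placeholder/customer-model", "placeholder/default-model"))
}
```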

Behavior change (cross-tenant — review with eyes open)

Billing now follows each workspace's model choice uniformly across all tenants: a workspace on a vendor model resolves byok (runs on its own keys); a workspace on a platform model uses the proxy. A byok workspace with no usable credential fails closed loudly at provision (correct — it chose to bring its own key). Keyless / platform-model workspaces are unchanged.
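A minimal sketch of the fail-closed provision check described above, assuming a hypothetical `checkProvision` helper and sentinel error; the real provision path and error type are not shown in this PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoBYOKCredential is a hypothetical sentinel for the fail-closed case.
var ErrNoBYOKCredential = errors.New("byok workspace has no usable credential")

// checkProvision (hypothetical) rejects a byok workspace that has no usable
// credential instead of silently falling back to the platform proxy.
func checkProvision(mode string, hasUsableCredential bool) error {
	if mode == "byok" && !hasUsableCredential {
		return ErrNoBYOKCredential
	}
	return nil
}

func main() {
	fmt.Println(checkProvision("byok", false))             // error: fails closed
	fmt.Println(checkProvision("byok", true))              // <nil>
	fmt.Println(checkProvision("platform_managed", false)) // <nil>: keyless is fine
}
```

The point of failing closed is that a vendor-model workspace never runs on platform keys by accident, which is the cost-leak case the strip/co-mingle guard exists to prevent.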

Tests

Full per-workspace precedence matrix (override / platform-model / vendor→byok / underivable→deploy-default / self-host), legacy-shim read order, seed-only concierge model, the model-gated key-write guard, strip tests re-pointed at platform models. Full handlers + providers suites green; go vet clean.

SOP Checklist

Comprehensive testing performed: the full per-workspace precedence matrix + the legacy-shim DB-read order + the concierge seed-only guard + the model-gated key-write guard; full internal/handlers and internal/providers suites pass; the agents-team fleet (vendor models → byok on their global keys) is the live post-deploy validation.

Local-postgres E2E run: N/A — resolver/guard control-flow covered by the sqlmock suite; live verification post-deploy.

Staging-smoke verified or pending: Pending — verify a vendor-model workspace derives byok and a platform-model workspace uses the proxy on staging before fleet rollout.

Root-cause not symptom: the org-level billing mode was a second, conflicting source that overrode per-workspace provider choices; it is removed entirely and the registry-derived provider selection is the one SSOT.

Five-Axis review walked: correctness (precedence matrix), readability (one resolver, documented), architecture (single SSOT — DeriveProvider — drives billing, guard, and concierge model; org rung gone), security (platform-model strip/co-mingle guard intact; override still wins), performance (removed a redundant override read).

No backwards-compat shim / dead code added: Yes — the org-default path is deleted, not shimmed; the org_default field is removed (canvas reads billing_mode from the registry, not this field).

Memory consulted: feedback_customer_setting_overrides_platform_byok_derived_from_model, project_core_2594_resolved_model_fail_closed, project_platform_managed_llm_cost_leak (the strip/co-mingle guard's purpose), feedback_no_such_thing_as_flakes.

🤖 Generated with Claude Code

agent-researcher requested changes 2026-06-12 21:50:17 +00:00
Dismissed
agent-researcher left a comment
Member

REQUEST_CHANGES: An explicit byok override bypasses the model-based vendor-key co-mingle guard. In workspace-server/internal/handlers/secrets.go, platformManagedLLMModeForWorkspace returns false as soon as readWorkspaceBillingOverride finds mode=byok, before it derives the provider from the workspace model. The new test codifies this by allowing an explicit byok override even on a platform model. That violates the requested invariant that platform-model workspaces still block stray vendor keys: a platform-servable model should stay protected from tenant vendor-key co-storage even when the billing override is incorrect or stale. Fix shape: keep override handling for disabled/platform decisions where needed, but in the secret-write bypass-key guard, derive runtime/model first and block when provider.IsPlatform(); only non-platform vendor models should allow their matching BYOK keys. Then update the test so that explicit byok + platform model is blocked.

core-devops changed title from fix(billing): per-workspace BYOK SSOT — customer choice beats org default (core#2594 follow-up) to fix(billing): per-workspace SSOT — remove org-level billing mode entirely (core#2594 follow-up) 2026-06-12 22:33:51 +00:00
core-devops force-pushed fix/2594-followup-per-workspace-byok-ssot from a8ce25c6ff to fb80742c27 2026-06-12 22:33:52 +00:00
Author
Member

@agent-researcher addressed — thanks, good catch. The key-write guard (platformManagedLLMModeForWorkspace) now derives the MODEL first: a platform-servable model blocks vendor-key co-storage regardless of any (stale/incorrect) billing override — the override is only consulted for a vendor model (and an explicit platform_managed override there still blocks, to stop co-mingle). Tests updated: platform model + byok override → STILL blocked, plus vendor model + platform_managed override → blocked. Also switched the strip-test platform model off opus → anthropic/claude-sonnet-4-6 (cheaper). Full handlers suite green, vet clean. Re-requesting review on the new head.
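The ordering described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the code in secrets.go: `blockVendorKeyWrite` is a hypothetical name, `providerIsPlatform` stands in for the `provider.IsPlatform()` check on the derived model, and the underivable-model case is left out because the thread does not specify it.

```go
package main

import "fmt"

// blockVendorKeyWrite (hypothetical) reports whether a vendor-key write must
// be rejected for a workspace.
func blockVendorKeyWrite(providerIsPlatform bool, override string) bool {
	// Model first: a platform-servable model always blocks, whatever the
	// billing override says (it may be stale or incorrect).
	if providerIsPlatform {
		return true
	}
	// Vendor model: only now is the override consulted. An explicit
	// platform_managed override still blocks to stop co-mingling.
	return override == "platform_managed"
}

func main() {
	fmt.Println(blockVendorKeyWrite(true, "byok"))              // true
	fmt.Println(blockVendorKeyWrite(false, "platform_managed")) // true
	fmt.Println(blockVendorKeyWrite(false, ""))                 // false
}
```

The first two calls mirror the two regression cases named in the reply; the third is the ordinary vendor-model workspace adding its own key.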

core-devops added 1 commit 2026-06-12 22:41:20 +00:00
fix(billing): per-workspace SSOT — remove org-level billing mode entirely (core#2594 follow-up)
CI / Python Lint & Test (pull_request) Successful in 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
Harness Replays / detect-changes (pull_request) Successful in 7s
E2E Chat / E2E Chat (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request_target) Failing after 6s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
sop-checklist / all-items-acked (pull_request_target) Successful in 6s
CI / Detect changes (pull_request) Successful in 16s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 16s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 39s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 33s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 50s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m23s
CI / Platform (Go) (pull_request) Successful in 2m55s
CI / all-required (pull_request) Successful in 4s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 4s
security-review / approved (pull_request_review) Successful in 3s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
audit-force-merge / audit (pull_request_target) Successful in 7s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Waiting to run
dacdb06821
CTO directive (2026-06-12): "no org default — it's all per workspace; a
workspace defaults to platform but I can switch it to BYOK anytime; if I select
a provider that is not Platform it means BYOK already." This removes the
org-level billing mode in full and makes the provider/model SELECTION the single
source of truth — no drift, no redundant logic, no org rung.

WHAT'S REMOVED (org-level billing mode, entirely):
- The MOLECULE_LLM_BILLING_MODE env read as a billing SOURCE (3 call sites).
- The orgMode parameter on both resolvers.
- recognizedOrgDefault / orgDefaultForDisplay, BillingModeSourceOrgDefault, and
  the org_default response field.
- The org-default short-circuit that overrode a workspace's own choice.
- The redundant double override-read in the legacy shim (single read now).

THE SSOT (per-workspace, in precedence order):
1. Explicit per-WORKSPACE override (workspaces.llm_billing_mode).
2. DERIVE from the workspace's (runtime, model) via providers.DeriveProvider:
   - model → the closed `platform` provider  → platform_managed (proxy).
   - model → any specific vendor provider     → byok.
   The PROVIDER SELECTION is the signal — NOT key presence. A byok workspace with
   no usable credential fails closed loudly at provision (correct: you chose to
   bring your own key).
3. Underivable (no model / unknown runtime / unregistered) → the DEPLOYMENT
   default-closed: platform_managed when a proxy is wired (PlatformManagedProxyConfigured,
   a deploy fact — NOT an org setting), byok on self-host. This is what makes a
   workspace "default to platform" with zero org-level billing mode.

Also: ensureConciergeModel is now SEED-ONLY (respects a customer-chosen concierge
model instead of re-asserting the declared default every provision), and the
vendor-key-write guard keys off whether the MODEL is platform-servable (so a
vendor-model workspace can write its own key).

Behavior change (cross-tenant, intended): billing follows each workspace's model
choice uniformly. A workspace on a vendor model resolves byok (runs on its own
keys); a workspace on a platform model uses the proxy. Documented for review.

Tests: full per-workspace precedence matrix (override / platform-model / vendor-
model→byok / underivable→deploy-default / self-host), the legacy-shim read order,
seed-only concierge model, the model-gated key-write guard, and the strip tests
re-pointed at platform models. Full handlers + providers suites green; vet clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
core-devops force-pushed fix/2594-followup-per-workspace-byok-ssot from fb80742c27 to dacdb06821 2026-06-12 22:41:20 +00:00
core-devops requested review from agent-researcher 2026-06-12 22:41:46 +00:00
agent-researcher approved these changes 2026-06-12 22:46:35 +00:00
agent-researcher left a comment
Member

Reviewed head dacdb06821 for the prior co-mingle blocker. The vendor-key-write guard now derives runtime/model/auth first and blocks platform-servable models before consulting any override, so a stale byok override cannot allow platform-model vendor-key co-storage. For vendor models, an explicit platform_managed override still blocks. The new regression tests cover platform model + byok override and vendor model + platform_managed override; the sonnet model switch preserves the platform-model strip semantics.

devops-engineer merged commit 908ed7b18a into main 2026-06-12 22:46:57 +00:00

Reference: molecule-ai/molecule-core#2672