fix(workspace-server): provider-matched byok credential injection (internal#728 Bug 1) [BEHAVIOR-AFFECTING — CTO merge-go] #2000
Fixes one of two SSOT-provider↔runtime-adapter reconciliation bugs in internal#728. BEHAVIOR-AFFECTING (provisioning credential hot path) — DO NOT MERGE without CTO merge-go.
Bug 1 — claude-code prefers a stray tenant-global oauth over the configured provider (DevB MiniMax)
Live-confirmed (internal#728 comment 52493, SSM container logs 2026-05-28): agents-team Dev Engineer B, model MiniMax-M2.7. config.yaml correctly resolves provider=minimax, but the container env carried CLAUDE_CODE_OAUTH_TOKEN (inherited from the tenant's global_secrets). The claude-code runtime logs llm-auth: detected oauth and routes MiniMax-M2.7 → api.anthropic.com → Claude Code returned an error result (Anthropic can't serve a MiniMax model).
Phase 1: evidence
- applyPlatformManagedLLMEnv byok branch (workspace_provision.go) returns WITHOUT stripping any cred. Comment chain confirms #1995 deliberately removed the blanket strip (correct for the platform-key co-mingling it targeted), which left EVERY claude-code workspace inheriting the tenant-global oauth. CONFIRMED.
- internal/providers/providers.yaml: minimax auth_env: [MINIMAX_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_API_KEY], NOT CLAUDE_CODE_OAUTH_TOKEN. anthropic-oauth auth_env: [CLAUDE_CODE_OAUTH_TOKEN]. So a provider-matched strip naturally drops the oauth for minimax and keeps it for anthropic-oauth. CONFIRMED: this is the precise discriminator.
- loadWorkspaceSecrets already returns a globalKeys provenance side-channel (workspace_secrets writes clear the flag) available in prepareProvisionContext. CONFIRMED: provenance is threadable.
Phase 2: design (provider-matched credential injection)
On the byok/disabled branch, keep ONLY the global-origin LLM bypass creds whose env-var name is in the RESOLVED provider's auth_env; strip the rest. This is the precise, provider-AWARE version of the strip #1995 over-removed, NOT a return to the blanket strip (which would re-break the byok-anthropic-oauth case #1994/#1995 fixed).
Threads globalKeys into applyPlatformManagedLLMEnv (signature + map[string]struct{}).
Phase 3: TDD + mutation
llm_billing_mode_provision_parity_test.go:
- MinimaxStripsStrayGlobalOAuth: DevB repro. A minimax-resolving ws strips the stray global oauth + keeps MINIMAX_API_KEY routing.
- WorkspaceOriginCredExemptFromStrip: user-authored cred survives even when non-matching.
- ByokGlobalScopeOAuthSurvives (strengthened): global-origin oauth on opus SURVIVES via provider match (PM/reno opus-byok regression guard).
Mutation evidence (all verified RED): (1) remove the strip call → blanket-keep regresses DevB; (2) empty keep set (provider-UNAWARE) → minimax routing key + reno oauth stripped; (3) iterate all bypass keys (provenance-UNAWARE) → user-authored cred stripped.
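The Phase 2 rule and the three Phase 3 cases can be sketched in a few lines of Go. This is a minimal illustration, not the PR's implementation: the function name stripNonMatchingGlobalCreds, the providerAuthEnv map shape, and the llmBypassKeys list are all hypothetical stand-ins. Only the two auth_env lists are taken from the providers.yaml entries quoted in Phase 1. Returning early on an unknown provider is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// providerAuthEnv mirrors the two providers.yaml entries quoted in Phase 1.
// The map shape is an assumption of this sketch, not the real registry type.
var providerAuthEnv = map[string][]string{
	"minimax":         {"MINIMAX_API_KEY", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY"},
	"anthropic-oauth": {"CLAUDE_CODE_OAUTH_TOKEN"},
}

// llmBypassKeys is a hypothetical list of env vars treated as LLM bypass creds.
var llmBypassKeys = []string{
	"CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY", "MINIMAX_API_KEY",
}

// stripNonMatchingGlobalCreds removes bypass creds from env only when they are
// global-origin (present in globalKeys) AND not in the provider's auth_env.
// Workspace-authored creds are never touched.
func stripNonMatchingGlobalCreds(env map[string]string, globalKeys map[string]struct{}, provider string) {
	authEnv, ok := providerAuthEnv[provider]
	if !ok {
		return // unknown provider: leave env unchanged (assumed behavior)
	}
	keep := make(map[string]struct{}, len(authEnv))
	for _, k := range authEnv {
		keep[k] = struct{}{}
	}
	for _, k := range llmBypassKeys {
		if _, global := globalKeys[k]; !global {
			continue // provenance check: not inherited from tenant globals
		}
		if _, matched := keep[k]; matched {
			continue // provider match: this cred belongs to the resolved provider
		}
		delete(env, k)
	}
}

// sortedKeys returns the env var names in sorted order for stable printing.
func sortedKeys(env map[string]string) []string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// DevB repro: both creds inherited from tenant globals, provider resolves to minimax.
	env := map[string]string{"MINIMAX_API_KEY": "mm-key", "CLAUDE_CODE_OAUTH_TOKEN": "stray-oauth"}
	globalKeys := map[string]struct{}{"MINIMAX_API_KEY": {}, "CLAUDE_CODE_OAUTH_TOKEN": {}}
	stripNonMatchingGlobalCreds(env, globalKeys, "minimax")
	fmt.Println(sortedKeys(env)) // prints [MINIMAX_API_KEY]
}
```

The two continue branches correspond to mutations (3) and (2): dropping the provenance check strips user-authored creds, and emptying the keep set strips the provider's own routing key.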
Verification:
go build ./... ✓ · go build -tags=integration ./... ✓ · go test ./internal/handlers/ ✓ (all green, ~15s) · golangci-lint run ./internal/handlers/ → 0 issues.
Regression confirmation
- anthropic-oauth auth_env → KEPT. NOT regressed.
- kimi-coding auth_env includes KIMI_API_KEY → KEPT either way. NOT regressed.
Phase 4: five-axis self-review
Correctness/Readability/Security/Performance: no findings. Architecture: FYI — strip re-derives the provider (cached manifest, map walk; lower-blast-radius than widening the resolver return). Deferred.
Refs internal#728. Not merged — awaiting CTO merge-go.
🤖 Generated with Claude Code
CI status note for CTO merge-go review:
CI / all-required (the branch-protection merge-gate aggregator) = success. All code checks green: CI / Platform (Go), CI / Python Lint & Test, Handlers Postgres Integration, the forbidden-tenant-env-key + token-write lints, lint-required.
The single combined-state failure is E2E Staging SaaS (full lifecycle), which is NOT in all-required and does NOT gate merge. It is a pre-existing STAGING-ENVIRONMENT failure, not introduced by this PR:
- ❌ byok-routing guard: GET /admin/workspaces/<id>/llm-billing-mode failed (rc=22). Body: <!DOCTYPE html>... The admin API GET is returning the canvas Next.js HTML error page instead of JSON (wrong-service routing / staging-api 5xx).
- The byok-routing guard failure reproduces across THREE independent runs (smoke, API smoke, full-lifecycle) against THREE different fresh workspaces → a stack-wide staging condition (the job's own log lists the causes: CP_STAGING_ADMIN_API_TOKEN missing/rotated, staging-api 5xx, LLM key dead, AMI/CF/WorkOS drift).
- This PR does not touch the GET /admin/workspaces/:id/llm-billing-mode handler or its routing. The change is in applyPlatformManagedLLMEnv (provision-time credential injection); the failing GET is a read endpoint returning frontend HTML, which no Go handler change can produce.
Stage A0/A (provisioner-parity, handlers Postgres integration) is green and covers the provision path this PR changes. Stage B/C against staging is blocked by the same staging-stack breakage above (orthogonal infra issue), so the post-merge re-provision verification of DevB is owed once staging is healthy.
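The HTML-instead-of-JSON symptom above is easy to tell apart from a genuine handler error. A minimal Go sketch of such a check follows, assuming a hypothetical helper isJSONResponse and a made-up response body shape ({"mode":"byok"}); neither comes from this repository or from the staging job.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
)

// isJSONResponse reports whether a response looks like real JSON from an API
// handler, as opposed to an HTML error page served by a frontend.
// It checks the Content-Type media type first, then validates the body.
func isJSONResponse(contentType string, body []byte) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil || mediaType != "application/json" {
		return false
	}
	return json.Valid(body)
}

func main() {
	// An HTML error page, like the one quoted in the guard failure.
	html := []byte("<!DOCTYPE html><html><body>error</body></html>")
	fmt.Println(isJSONResponse("text/html; charset=utf-8", html)) // prints false

	// Hypothetical JSON body; the real endpoint's schema is not shown in this PR.
	fmt.Println(isJSONResponse("application/json", []byte(`{"mode":"byok"}`))) // prints true
}
```

A check like this would classify the failure as a routing or ingress problem before any handler logic is suspected.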
Not merged — awaiting CTO merge-go (BEHAVIOR-AFFECTING).
Five-Axis review PASS (independent, built+ran). Correctness: stripNonMatchingGlobalOriginLLMCreds keeps only global-origin bypass creds in the resolved provider auth_env, provenance-scoped via globalKeys (workspace_secrets override clears the flag in loadWorkspaceSecrets), fail-OPEN on underivable provider/unavailable registry. Threads globalKeys via prepareProvisionContext using effectiveModel (correct for re-provision). Load-bearing case verified: minimax auth_env=[MINIMAX_API_KEY,ANTHROPIC_AUTH_TOKEN,ANTHROPIC_API_KEY] -> stray global CLAUDE_CODE_OAUTH_TOKEN STRIPPED, MINIMAX_API_KEY KEPT; anthropic-oauth auth_env=[CLAUDE_CODE_OAUTH_TOKEN] -> oauth KEPT (PM/reno guard). Non-regression: go test -tags=integration ./internal/handlers/ GREEN (15.5s). Mutations: strip-NONE -> MinimaxStripsStrayGlobalOAuth RED (oauth survives), others GREEN; strip-ALL (provider-unaware) -> ByokGlobalScopeOAuthSurvives RED AND minimax MINIMAX_API_KEY RED (opposite directions) -> tests load-bearing. Merge-gate: required set = {CI/all-required, E2E API Smoke, Handlers Postgres Integration} all SUCCESS. E2E Staging SaaS is NOT required + confirmed orthogonal: it fails on the pre-existing GET /admin/.../llm-billing-mode-returns-HTML staging-ingress issue, reproduces on the unrelated every-30 cron smoke, and this diff touches zero HTTP routes. APPROVE. Companion: codex#63.