fix(llm-auth): honor configured provider -- drop inherited CLAUDE_CODE_OAUTH_TOKEN on non-Anthropic workspaces (drain fix) #81

Merged
hongming merged 1 commit from fix/runtime-honors-provider-drop-inherited-oauth into main 2026-05-29 01:53:53 +00:00
Owner

Root cause (2026-05-28 Anthropic drain)

The claude-code runtime auto-prefers CLAUDE_CODE_OAUTH_TOKEN (llm-auth: detected oauth) and routes to Anthropic even when the workspace config explicitly sets provider=minimax. The platform injects the tenant's shared global secrets into every workspace, so a non-Anthropic agent inherits the tenant's Claude OAuth token and silently bills Anthropic. Confirmed on DevB (Dev Engineer B / MiniMax, i-07400c3230cf45337): provider=minimax, model=MiniMax-M2.7, MINIMAX_API_KEY=unset, CLAUDE_CODE_OAUTH_TOKEN=SET -> silent Anthropic fallback = the drain.

core#2000's strip removed ANTHROPIC_* but left CLAUDE_CODE_OAUTH_TOKEN, so it did not close the drain.
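To illustrate why a strip keyed on the `ANTHROPIC_` prefix cannot close the drain, here is a minimal sketch. It assumes the earlier strip behaved like a prefix filter; `strip_anthropic_vars` is a hypothetical stand-in and not the actual core#2000 code.

```python
from typing import Dict


def strip_anthropic_vars(env: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical prefix-based strip: drops only keys starting with ANTHROPIC_."""
    return {k: v for k, v in env.items() if not k.startswith("ANTHROPIC_")}


# Placeholder values; the variable names are the ones discussed in this PR.
inherited = {
    "ANTHROPIC_API_KEY": "placeholder-key",
    "CLAUDE_CODE_OAUTH_TOKEN": "placeholder-token",
}
remaining = strip_anthropic_vars(inherited)
# The OAuth token does not share the ANTHROPIC_ prefix, so it survives the strip.
```

Under that assumption the OAuth token is still present after the strip, which is consistent with the fallback observed on DevB.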

Fix -- runtime honors provider (CTO decision 2026-05-28)

Provider is SSOT (internal#718). normalise_llm_env(env, provider=...) now drops an inherited CLAUDE_CODE_OAUTH_TOKEN when the resolved provider is not an Anthropic-OAuth provider (anthropic/anthropic-oauth/claude/claude-code), before the OAuth short-circuit can hijack auth. Downstream auth then follows the configured provider; if that provider's own key is missing, preflight fails loudly -- no silent fallback, no drain.

  • llm_auth.py: new provider param + _ANTHROPIC_OAUTH_PROVIDERS guard (case-insensitive). Empty/None provider preserves legacy behaviour (backward compatible with all existing call sites).
  • main.py: defers the normalise_llm_env call to just after load_config so config.provider (SSOT) is available -- still runs before preflight.
  • tests/test_llm_auth.py: +7 tests (minimax drops oauth; anthropic/claude-code keep it; minimax keeps proxy token after dropping oauth; empty/None legacy; case-insensitive; openai drops).
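
The guard described above could look roughly like the following. This is a sketch based only on the PR description, not the code in `llm_auth.py`: `drop_foreign_oauth` is a hypothetical helper name, and the real `normalise_llm_env` presumably does more normalisation than this single step.

```python
from typing import MutableMapping, Optional

# Provider names the PR description lists as Anthropic-OAuth providers.
_ANTHROPIC_OAUTH_PROVIDERS = frozenset(
    {"anthropic", "anthropic-oauth", "claude", "claude-code"}
)

OAUTH_VAR = "CLAUDE_CODE_OAUTH_TOKEN"


def drop_foreign_oauth(
    env: MutableMapping[str, str], provider: Optional[str] = None
) -> bool:
    """Remove an inherited Claude OAuth token for non-Anthropic providers.

    Hypothetical stand-in for the guard inside normalise_llm_env.
    Mutates env in place and returns True if a token was removed.
    """
    if not provider:
        # Empty/None provider: legacy behaviour, env is left untouched.
        return False
    if provider.lower() in _ANTHROPIC_OAUTH_PROVIDERS:
        # Anthropic-OAuth workspaces keep their token (case-insensitive match).
        return False
    return env.pop(OAUTH_VAR, None) is not None
```

A short usage example with placeholder values:

```python
env = {"CLAUDE_CODE_OAUTH_TOKEN": "placeholder", "MINIMAX_API_KEY": "placeholder"}
drop_foreign_oauth(env, provider="MiniMax")      # removes the OAuth token
drop_foreign_oauth(env, provider="claude-code")  # no-op for an Anthropic-OAuth provider
```

Returning a boolean is a choice made for this sketch so that a caller could log that a foreign credential was dropped without logging its value.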

Test

pytest tests/test_llm_auth.py -> 24 passed. (The 7 collection errors in the full suite on the op-host are pre-existing missing-dep imports, unrelated to this change.)

Not in scope (CTO-held follow-ups, filed separately)

  • DevB still needs the tenant's MINIMAX_API_KEY injected (no minimax key exists in agents-team globals).
  • Codex (Researcher/CR2) blocked by a permanently-401 CODEX_AUTH_JSON (needs CTO ChatGPT re-login) + the codex adapter openai-alias gap.

Generated with Claude Code

hongming added 1 commit 2026-05-29 01:12:43 +00:00
fix(llm-auth): honor configured provider — drop inherited CLAUDE_CODE_OAUTH_TOKEN on non-Anthropic workspaces
ci / lint (pull_request) Successful in 53s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
ci / smoke-install (pull_request) Successful in 1m8s
ci / unit-tests (pull_request) Successful in 1m20s
ci / build (pull_request) Successful in 1m23s
ffb5540a20
Provider is SSOT (internal#718). The platform injects the tenant's shared
global secrets into every workspace, so a non-Anthropic workspace
(provider=minimax/openai/moonshot) inherits a stray CLAUDE_CODE_OAUTH_TOKEN
belonging to the tenant's Claude agents. claude-code auto-prefers that OAuth
token ("llm-auth: detected oauth") and silently bills Anthropic instead of the
configured provider — the 2026-05-28 drain (DevB MiniMax).

normalise_llm_env now takes the resolved provider and drops an inherited
CLAUDE_CODE_OAUTH_TOKEN when the provider is not an Anthropic-OAuth provider,
BEFORE the oauth short-circuit can hijack auth. If the provider's own key is
absent, preflight fails loudly — no silent fallback, no drain. main.py defers
the call to after load_config so config.provider (SSOT) is available; still
runs before preflight. Empty/None provider preserves legacy behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-reviewer approved these changes 2026-05-29 01:53:08 +00:00
agent-reviewer left a comment
Member

Five-Axis review — APPROVED.

Correctness: guard fires only when provider is set AND not in the Anthropic-OAuth set; case-insensitive; drops the inherited oauth BEFORE the existing short-circuit so it cannot hijack a non-Anthropic workspace. Backward-compat: empty/None provider preserves legacy behaviour (all existing call sites unaffected). Tests: +7 targeted cases, 24/24 pass. Security: removes a foreign credential from the env — reduces blast radius, no secret logged. Failure mode: missing provider key now fails loud at preflight instead of silent Anthropic drain. Matches CTO 2026-05-28 decision (runtime honors provider).

infra-runtime-be approved these changes 2026-05-29 01:53:51 +00:00
infra-runtime-be left a comment
Member

APPROVED — runtime/credential axis. Confirmed normalise_llm_env runs at main 0.1b after load_config so config.provider is populated, and still before run_preflight; the guard mutates env in place ahead of adapter/executor construction, so the claude-code SDK never sees a foreign CLAUDE_CODE_OAUTH_TOKEN on a minimax/openai/moonshot workspace. No new imports, pure function, fail-safe on empty provider. Matches the live drain evidence on DevB (oauth present, minimax wiring absent).
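
The ordering confirmed in this review (config loaded, env normalised, then preflight, all before adapter construction) can be sketched as below. Everything here is a simplified stand-in: `Config`, `PreflightError`, `normalise`, `run_preflight_sketch`, and `start` are hypothetical names, and the provider-to-key mapping models only the MiniMax case from the incident.

```python
from dataclasses import dataclass
from typing import Dict

ANTHROPIC_OAUTH_PROVIDERS = {"anthropic", "anthropic-oauth", "claude", "claude-code"}


@dataclass
class Config:
    provider: str  # hypothetical stand-in for the loaded workspace config


class PreflightError(RuntimeError):
    """Raised when the configured provider has no usable credential."""


def normalise(env: Dict[str, str], provider: str) -> None:
    # Stand-in for normalise_llm_env: mutates env in place.
    if provider and provider.lower() not in ANTHROPIC_OAUTH_PROVIDERS:
        env.pop("CLAUDE_CODE_OAUTH_TOKEN", None)


def run_preflight_sketch(env: Dict[str, str], config: Config) -> None:
    # Assumed mapping for illustration; only minimax is modelled here.
    required = {"minimax": "MINIMAX_API_KEY"}.get(config.provider.lower())
    if required and not env.get(required):
        raise PreflightError(f"{config.provider}: {required} is not set")


def start(env: Dict[str, str], config: Config) -> Dict[str, str]:
    normalise(env, config.provider)    # runs once config.provider is known
    run_preflight_sketch(env, config)  # fails loudly before any adapter is built
    return env
```

With the DevB state from the incident (OAuth token present, `MINIMAX_API_KEY` unset), `start` raises `PreflightError` in this sketch rather than proceeding with the inherited token.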

hongming merged commit 89f5a01209 into main 2026-05-29 01:53:53 +00:00

Reference: molecule-ai/molecule-ai-workspace-runtime#81