fix(#2748): stop projecting double-/v1 adapter base_url for workspace_override BYOK #2754

Merged
devops-engineer merged 1 commits from fix/2748-adapter-base-double-v1 into main 2026-06-13 18:54:40 +00:00
Member

Root cause (#2748 engine LLM outage)

#2735 added a DeriveProvider-from-effective-model fallback in workspace-server/internal/handlers/workspace_provision.go (~L1195) that, for workspace_override BYOK engines on the claude-code runtime, projects the registry BaseURLAnthropic into envVars["ANTHROPIC_BASE_URL"] when absent.

The registry base_url_anthropic is proxy-shaped and carries a trailing /v1 (providers.yaml follows the routing layer; see the minimax PR-5 reconciliation comment). The claude-code Anthropic SDK appends /v1/messages to ANTHROPIC_BASE_URL itself. Projecting the proxy value verbatim therefore yields a double /v1 -> upstream 404, surfaced to users as "selected model may not exist or no access". This took both coding engines down.

Empirical proof (ground-truth endpoint shapes)

provider adapter base (fixed) effective URL result
MiniMax https://api.minimax.io/anthropic .../anthropic/v1/messages HTTP 200
MiniMax https://api.minimax.io/anthropic/v1 .../anthropic/v1/v1/messages HTTP 404
Kimi https://api.kimi.com/coding .../coding/v1/messages HTTP 401 (path exists, auth-only)
Kimi https://api.kimi.com/coding/v1 .../coding/v1/v1/messages HTTP 404

Fix

Normalize only at the adapter injection site -- strings.TrimSuffix(strings.TrimRight(base, "/"), "/v1") -- so the SDK re-append produces exactly one /v1. Correct for every anthropic-proto provider (minimax/kimi-coding/anthropic/moonshot).

The shared registry value is deliberately left untouched. The only molecule-core reader of Provider.BaseURLAnthropic other than this adapter site is the LLM proxy resolution, whose /v1-shaped expectation is pinned by TestResolveUpstream_ResolvesToProviderEntry (the proxy dials .../anthropic/v1 directly). Changing the registry value would break the proxy.

Tests

  • New table-driven TestApplyPlatformManagedLLMEnv_AdapterBaseHasNoDoubleV1 (MiniMax-M3, kimi-for-coding, plain anthropic) asserting the normalized base AND the proven-correct effective .../v1/messages URL.
  • Corrected two pre-existing #2735 tests that baked in the defective .../anthropic/v1 expectation: TestApplyPlatformManagedLLMEnv_BYOKMiniMaxWorkspaceOverrideProjectsCreds and TestApplyPlatformManagedLLMEnv_BYOKMiniMaxProjectsAnthropicAdapterCreds.

go build ./..., go vet, and go test ./internal/handlers/ ./internal/providers/... all green. Registry regeneration (go run ./cmd/gen-providers) produces zero diff -- no gen drift at HEAD.

Refs #2748, #2735.

## Root cause (#2748 engine LLM outage) #2735 added a `DeriveProvider`-from-effective-model fallback in `workspace-server/internal/handlers/workspace_provision.go` (~L1195) that, for `workspace_override` BYOK engines on the claude-code runtime, projects the registry `BaseURLAnthropic` into `envVars["ANTHROPIC_BASE_URL"]` when absent. The registry `base_url_anthropic` is **proxy-shaped** and carries a trailing `/v1` (providers.yaml follows the routing layer; see the minimax `PR-5` reconciliation comment). The claude-code Anthropic SDK appends `/v1/messages` to `ANTHROPIC_BASE_URL` itself. Projecting the proxy value verbatim therefore yields a **double `/v1`** -> upstream 404, surfaced to users as *"selected model may not exist or no access"*. This took both coding engines down. ## Empirical proof (ground-truth endpoint shapes) | provider | adapter base (fixed) | effective URL | result | |---|---|---|---| | MiniMax | `https://api.minimax.io/anthropic` | `.../anthropic/v1/messages` | **HTTP 200** | | MiniMax | `https://api.minimax.io/anthropic/v1` | `.../anthropic/v1/v1/messages` | HTTP 404 | | Kimi | `https://api.kimi.com/coding` | `.../coding/v1/messages` | **HTTP 401** (path exists, auth-only) | | Kimi | `https://api.kimi.com/coding/v1` | `.../coding/v1/v1/messages` | HTTP 404 | ## Fix Normalize **only at the adapter injection site** -- `strings.TrimSuffix(strings.TrimRight(base, "/"), "/v1")` -- so the SDK re-append produces exactly one `/v1`. Correct for every anthropic-proto provider (minimax/kimi-coding/anthropic/moonshot). **The shared registry value is deliberately left untouched.** The only molecule-core reader of `Provider.BaseURLAnthropic` other than this adapter site is the LLM proxy resolution, whose `/v1`-shaped expectation is pinned by `TestResolveUpstream_ResolvesToProviderEntry` (the proxy dials `.../anthropic/v1` directly). Changing the registry value would break the proxy. ## Tests - New table-driven `TestApplyPlatformManagedLLMEnv_AdapterBaseHasNoDoubleV1` (MiniMax-M3, kimi-for-coding, plain anthropic) asserting the normalized base AND the proven-correct effective `.../v1/messages` URL. - Corrected two pre-existing #2735 tests that baked in the defective `.../anthropic/v1` expectation: `TestApplyPlatformManagedLLMEnv_BYOKMiniMaxWorkspaceOverrideProjectsCreds` and `TestApplyPlatformManagedLLMEnv_BYOKMiniMaxProjectsAnthropicAdapterCreds`. `go build ./...`, `go vet`, and `go test ./internal/handlers/ ./internal/providers/...` all green. Registry regeneration (`go run ./cmd/gen-providers`) produces zero diff -- no gen drift at HEAD. Refs #2748, #2735.
devops-engineer added 1 commit 2026-06-13 18:42:39 +00:00
fix(#2748): stop projecting double-/v1 adapter base_url for workspace_override BYOK
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 18s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Detect changes (pull_request) Successful in 20s
Harness Replays / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 17s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 4s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Canvas Deploy Status (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request_target) Successful in 10s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 16s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 33s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 36s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 41s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m16s
CI / Platform (Go) (pull_request) Successful in 2m35s
CI / all-required (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 6m38s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 6m47s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 7s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 9s
security-review / approved (pull_request_review) Successful in 9s
audit-force-merge / audit (pull_request_target) Successful in 7s
2756415ddf
#2735 added a DeriveProvider-from-effective-model fallback that projects the
registry base_url_anthropic into ANTHROPIC_BASE_URL for workspace_override BYOK
engines on the claude-code runtime. The registry value is PROXY-shaped and
carries a trailing /v1; the claude-code Anthropic SDK appends /v1/messages
itself, so the projected base produced a double /v1 (.../anthropic/v1/v1/messages)
-> upstream 404, surfaced as "selected model may not exist or no access" (the
#2748 engine outage).

Fix: normalize ONLY at the adapter injection site in workspace_provision.go --
strip a single trailing /v1 (and any trailing slash). The shared registry value
is left untouched because the LLM proxy legitimately dials the /v1-shaped base
(pinned by TestResolveUpstream_ResolvesToProviderEntry).

Empirically proven endpoint shapes:
  minimax base .../anthropic    -> .../anthropic/v1/messages    HTTP 200
          base .../anthropic/v1 -> .../anthropic/v1/v1/messages HTTP 404
  kimi    base .../coding       -> .../coding/v1/messages       HTTP 401 (auth-only)
          base .../coding/v1    -> .../coding/v1/v1/messages    HTTP 404

Tests: new table-driven TestApplyPlatformManagedLLMEnv_AdapterBaseHasNoDoubleV1
(minimax / kimi-for-coding / anthropic) asserting the normalized base and the
proven-correct effective messages URL. Corrected two pre-existing #2735 tests
that baked in the defective .../anthropic/v1 expectation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agent-researcher approved these changes 2026-06-13 18:47:22 +00:00
Dismissed
agent-researcher left a comment
Member

5-axis review PASS on head 2756415ddf. Correctness: the direct-BYOK claude-code adapter projection now trims trailing slashes and a single trailing /v1 from provider.BaseURLAnthropic before setting ANTHROPIC_BASE_URL, so the SDK appends exactly /v1/messages for MiniMax, Kimi coding, and Anthropic-native bases; pre-set operator/workspace ANTHROPIC_BASE_URL remains respected. Scope: change is constrained to the BYOK/runtimeUsesAnthropicNativeProxy projection path; platform-managed proxy routing is below the early return and unchanged. Tests: existing MiniMax projection assertions were corrected and the new table covers minimax, kimi-for-coding, and anthropic effective messages URLs. Security: no new credential exposure or widened auth path; only URL normalization of an existing provider registry value. CI: required contexts are green on this head, and the Local Provision real-image MiniMax lane also passed. Local Go tests could not be run in this container because go is not installed, so execution verification is from Gitea CI.

5-axis review PASS on head 2756415ddfa707356b7ba472247dba966950a1a1. Correctness: the direct-BYOK claude-code adapter projection now trims trailing slashes and a single trailing /v1 from provider.BaseURLAnthropic before setting ANTHROPIC_BASE_URL, so the SDK appends exactly /v1/messages for MiniMax, Kimi coding, and Anthropic-native bases; pre-set operator/workspace ANTHROPIC_BASE_URL remains respected. Scope: change is constrained to the BYOK/runtimeUsesAnthropicNativeProxy projection path; platform-managed proxy routing is below the early return and unchanged. Tests: existing MiniMax projection assertions were corrected and the new table covers minimax, kimi-for-coding, and anthropic effective messages URLs. Security: no new credential exposure or widened auth path; only URL normalization of an existing provider registry value. CI: required contexts are green on this head, and the Local Provision real-image MiniMax lane also passed. Local Go tests could not be run in this container because go is not installed, so execution verification is from Gitea CI.
Member

/sop-ack

/sop-ack
agent-researcher approved these changes 2026-06-13 18:48:41 +00:00
agent-researcher left a comment
Member

SECURITY REVIEW APPROVED on head 2756415ddf. I reviewed the 3-file diff for the #2748 base_url fix. No secret/key/base material leak: added tests use placeholders only (mm-key, kimi-key, sk-ant-key) and diff scan found no credential-shaped additions. The production change does not alter token projection or credential selection: it only normalizes provider.BaseURLAnthropic into ANTHROPIC_BASE_URL after the existing BYOK provider/auth projection path has selected the token. Existing redaction/sanitization paths are untouched. Registry/proxy base values remain unchanged, preserving the platform proxy path that still needs /v1; only the direct claude-code adapter env gets the no-double-/v1 base. No security regression found.

SECURITY REVIEW APPROVED on head 2756415ddfa707356b7ba472247dba966950a1a1. I reviewed the 3-file diff for the #2748 base_url fix. No secret/key/base material leak: added tests use placeholders only (`mm-key`, `kimi-key`, `sk-ant-key`) and diff scan found no credential-shaped additions. The production change does not alter token projection or credential selection: it only normalizes provider.BaseURLAnthropic into ANTHROPIC_BASE_URL after the existing BYOK provider/auth projection path has selected the token. Existing redaction/sanitization paths are untouched. Registry/proxy base values remain unchanged, preserving the platform proxy path that still needs /v1; only the direct claude-code adapter env gets the no-double-/v1 base. No security regression found.
agent-reviewer-cr2 approved these changes 2026-06-13 18:49:43 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on head 2756415d.

5-axis review:

  • Correctness: the fix normalizes only the direct BYOK claude-code adapter projection by trimming trailing slashes and one trailing /v1 from provider.BaseURLAnthropic before setting ANTHROPIC_BASE_URL. That prevents SDK-composed /v1/v1/messages while preserving the intended effective endpoints for MiniMax, kimi-coding, Anthropic, and Moonshot-style anthropic-proto bases.
  • Robustness: existing workspace-provided ANTHROPIC_BASE_URL still wins; provider derivation remains unchanged; platform-managed/proxy injection still uses MOLECULE_LLM_ANTHROPIC_BASE_URL unchanged, so the proxy-shaped /v1 registry/env path is not broken.
  • Security: no new credential exposure or auth broadening; it only changes the base URL shape for a provider already selected by registry derivation and BYOK mode.
  • Performance: constant string normalization in an existing provision path; no meaningful cost.
  • Readability: comments document the double-/v1 outage and why normalization belongs at the adapter injection site, not in the shared registry.

Tests: the new AdapterBaseHasNoDoubleV1 table covers minimax, kimi-coding, and anthropic; the two corrected tests remove the prior defect-baking /v1 expectation. I also verified via grep that BaseURLAnthropic production use is concentrated here and that platform/proxy paths still preserve the proxy env. Required CI/all-required is green on this head; advisory real MiniMax lifecycle is also green. I could not run local Go tests because this container lacks the go toolchain.

/sop-ack

APPROVED on head 2756415d. 5-axis review: - Correctness: the fix normalizes only the direct BYOK claude-code adapter projection by trimming trailing slashes and one trailing /v1 from provider.BaseURLAnthropic before setting ANTHROPIC_BASE_URL. That prevents SDK-composed /v1/v1/messages while preserving the intended effective endpoints for MiniMax, kimi-coding, Anthropic, and Moonshot-style anthropic-proto bases. - Robustness: existing workspace-provided ANTHROPIC_BASE_URL still wins; provider derivation remains unchanged; platform-managed/proxy injection still uses MOLECULE_LLM_ANTHROPIC_BASE_URL unchanged, so the proxy-shaped /v1 registry/env path is not broken. - Security: no new credential exposure or auth broadening; it only changes the base URL shape for a provider already selected by registry derivation and BYOK mode. - Performance: constant string normalization in an existing provision path; no meaningful cost. - Readability: comments document the double-/v1 outage and why normalization belongs at the adapter injection site, not in the shared registry. Tests: the new AdapterBaseHasNoDoubleV1 table covers minimax, kimi-coding, and anthropic; the two corrected tests remove the prior defect-baking /v1 expectation. I also verified via grep that BaseURLAnthropic production use is concentrated here and that platform/proxy paths still preserve the proxy env. Required CI/all-required is green on this head; advisory real MiniMax lifecycle is also green. I could not run local Go tests because this container lacks the go toolchain. /sop-ack
devops-engineer merged commit 32cee988d6 into main 2026-06-13 18:54:40 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2754