fix(workspace-server): rename workspace_secrets MODEL_PROVIDER → MODEL + drop legacy fallback #1581

Merged
core-devops merged 1 commits from fix/workspace-server-rename-MODEL_PROVIDER-to-MODEL into main 2026-05-19 22:31:24 +00:00
Member

Root cause

The workspace_secrets row holding a picked model id was keyed MODEL_PROVIDER — a misleading name that bled into applyRuntimeModelEnv’s fallback chain. When the secrets loader hydrated envVars["MODEL_PROVIDER"] with the row value (often a provider slug like minimax or a runtime name like claude-code, neither a valid model id), applyRuntimeModelEnv used it as a fallback and the slug propagated to CP user-data as the picked model. Adapters then wedged at SDK initialize.

Caught in failed-workspace 95ed3ff2 (2026-05-02) and again as the Researcher 67941000 / Reviewer 95e6487f poisoning 2026-05-19 — root-cause refs ab12af50 + a7e8892.

CP-side slot-separation already merged via cp#213 + cp#220. This is the workspace-server companion.

Changes

File Change
workspace-server/internal/handlers/secrets.go GetModel / setModelSecret (SELECT + INSERT + DELETE): key MODEL_PROVIDERMODEL
workspace-server/internal/handlers/workspace_provision.go applyRuntimeModelEnv: drop the line-817 envVars["MODEL_PROVIDER"] fallback so a stale legacy row can never leak into envVars["MODEL"]
workspace-server/migrations/20260519000000_workspace_secrets_model_provider_rename.{up,down}.sql Idempotent rename of any pre-existing MODEL_PROVIDER rows; on conflict the existing MODEL row wins
secrets_test.go + workspace_provision_shared_test.go Pin literal 'MODEL' in all sqlmock regexes; new TestSecretsModel_RoundTrip + new TestApplyRuntimeModelEnv_StaleMODELPROVIDERNeverLeaksIntoMODEL regression guard

Before / after

Before (secrets.go):

VALUES ($1, 'MODEL_PROVIDER', $2, $3)

After:

VALUES ($1, 'MODEL', $2, $3)

Before (workspace_provision.go):

if model == "" { model = envVars["MODEL"] }
if model == "" { model = envVars["MODEL_PROVIDER"] }  // <-- the slug-fallback bug

After:

if model == "" { model = envVars["MODEL"] }
// MODEL_PROVIDER fallback dropped; rename migration handles legacy rows.

Migration

In-PR. Idempotent (re-running on already-renamed schema is a no-op). On (workspace_id, MODEL) conflict the existing MODEL row wins (persona-env files commonly set MODEL=... directly, so the more-canonical value is preserved). Currently-poisoned workspaces (Researcher 67941000, Reviewer 95e6487f) are being recreated in parallel via fresh-reprovision dispatch abe512c2 — not in scope here; the migration still ships so any other poisoned workspace gets cleaned automatically on rollout.

Breaking changes for consumers

  • The env var MODEL_PROVIDER continues to be honoured by the Python runtime (workspace/config.py) as a legacy fallback for already-running containers. No change there.
  • The workspace_secrets row name MODEL_PROVIDER is renamed to MODEL. External scripts that read that row directly will need to migrate. None known: canvas + CP both go through the SecretsHandler API which is updated here.

Tests

  • go build ./... clean
  • go vet ./... clean
  • go test ./internal/handlers/ -count=1 PASS (15.9s)
  • Added: TestSecretsModel_RoundTrip_KeyIsMODELNotMODEL_PROVIDER
  • Added: TestApplyRuntimeModelEnv_StaleMODELPROVIDERNeverLeaksIntoMODEL
  • Updated to pin 'MODEL' literal: TestSecretsGetModel_Default, TestSecretsGetModel_DBError, TestSecretsSetModel_Upsert, TestSecretsSetModel_EmptyClears, TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider, TestWorkspaceCreate_FirstDeploy_UnknownModel_OnlyMintModelProvider, TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes, TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved

Refs

  • ab12af50, a7e8892 (root-cause analyses)
  • cp#213, cp#220 (CP-side prior art)
  • internal#226 (misnomer guard background)
  • feedback_passwords_in_chat_are_burned — no creds in this PR

Review queue

Per molecule-core: core-devops + core-security + core-qa 3-reviewer relay.


🤖 Generated with Claude Code

## Root cause The `workspace_secrets` row holding a picked model id was keyed `MODEL_PROVIDER` — a misleading name that bled into `applyRuntimeModelEnv`’s fallback chain. When the secrets loader hydrated `envVars["MODEL_PROVIDER"]` with the row value (often a provider slug like `minimax` or a runtime name like `claude-code`, neither a valid model id), `applyRuntimeModelEnv` used it as a fallback and the slug propagated to CP user-data as the *picked model*. Adapters then wedged at SDK initialize. Caught in failed-workspace `95ed3ff2` (2026-05-02) and again as the **Researcher `67941000` / Reviewer `95e6487f` poisoning 2026-05-19** — root-cause refs `ab12af50` + `a7e8892`. CP-side slot-separation already merged via **cp#213 + cp#220**. This is the workspace-server companion. ## Changes | File | Change | |---|---| | `workspace-server/internal/handlers/secrets.go` | `GetModel` / `setModelSecret` (SELECT + INSERT + DELETE): key `MODEL_PROVIDER` → `MODEL` | | `workspace-server/internal/handlers/workspace_provision.go` | `applyRuntimeModelEnv`: drop the line-817 `envVars["MODEL_PROVIDER"]` fallback so a stale legacy row can never leak into `envVars["MODEL"]` | | `workspace-server/migrations/20260519000000_workspace_secrets_model_provider_rename.{up,down}.sql` | Idempotent rename of any pre-existing `MODEL_PROVIDER` rows; on conflict the existing `MODEL` row wins | | `secrets_test.go` + `workspace_provision_shared_test.go` | Pin literal `'MODEL'` in all sqlmock regexes; new `TestSecretsModel_RoundTrip` + new `TestApplyRuntimeModelEnv_StaleMODELPROVIDERNeverLeaksIntoMODEL` regression guard | ## Before / after **Before** (`secrets.go`): ```go VALUES ($1, 'MODEL_PROVIDER', $2, $3) ``` **After**: ```go VALUES ($1, 'MODEL', $2, $3) ``` **Before** (`workspace_provision.go`): ```go if model == "" { model = envVars["MODEL"] } if model == "" { model = envVars["MODEL_PROVIDER"] } // <-- the slug-fallback bug ``` **After**: ```go if model == "" { model = envVars["MODEL"] } // MODEL_PROVIDER fallback dropped; rename migration handles legacy rows. ``` ## Migration In-PR. Idempotent (re-running on already-renamed schema is a no-op). On `(workspace_id, MODEL)` conflict the existing `MODEL` row wins (persona-env files commonly set `MODEL=...` directly, so the more-canonical value is preserved). Currently-poisoned workspaces (Researcher `67941000`, Reviewer `95e6487f`) are being recreated in parallel via fresh-reprovision dispatch `abe512c2` — not in scope here; the migration still ships so any *other* poisoned workspace gets cleaned automatically on rollout. ## Breaking changes for consumers - The **env var** `MODEL_PROVIDER` continues to be honoured by the Python runtime (`workspace/config.py`) as a legacy fallback for already-running containers. No change there. - The **`workspace_secrets` row name** `MODEL_PROVIDER` is renamed to `MODEL`. External scripts that read that row directly will need to migrate. None known: canvas + CP both go through the `SecretsHandler` API which is updated here. ## Tests - `go build ./...` clean - `go vet ./...` clean - `go test ./internal/handlers/ -count=1` PASS (15.9s) - Added: `TestSecretsModel_RoundTrip_KeyIsMODELNotMODEL_PROVIDER` - Added: `TestApplyRuntimeModelEnv_StaleMODELPROVIDERNeverLeaksIntoMODEL` - Updated to pin `'MODEL'` literal: `TestSecretsGetModel_Default`, `TestSecretsGetModel_DBError`, `TestSecretsSetModel_Upsert`, `TestSecretsSetModel_EmptyClears`, `TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider`, `TestWorkspaceCreate_FirstDeploy_UnknownModel_OnlyMintModelProvider`, `TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes`, `TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved` ## Refs - `ab12af50`, `a7e8892` (root-cause analyses) - `cp#213`, `cp#220` (CP-side prior art) - `internal#226` (misnomer guard background) - `feedback_passwords_in_chat_are_burned` — no creds in this PR ## Review queue Per molecule-core: `core-devops` + `core-security` + `core-qa` 3-reviewer relay. --- 🤖 Generated with [Claude Code](https://claude.com/claude-code)
core-be added 1 commit 2026-05-19 22:21:25 +00:00
fix(workspace-server): rename workspace_secrets MODEL_PROVIDER -> MODEL + drop legacy fallback
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
Check migration collisions / Migration version collision check (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
E2E Chat / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 13s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
Lint no tenant GITEA/GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 14s
qa-review / approved (pull_request) Failing after 8s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 42s
gate-check-v3 / gate-check (pull_request) Successful in 10s
security-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 15s
sop-tier-check / tier-check (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Failing after 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 28s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 1m41s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Failing after 2m28s
CI / Platform (Go) (pull_request) Successful in 3m1s
CI / Canvas (Next.js) (pull_request) Successful in 5m28s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 6m29s
CI / all-required (pull_request) Successful in 6m33s
qa-review / approved 3-reviewer relay APPROVEs landed (core-devops/core-security/core-qa); workflow did not re-fire on review event — compensating per feedback_pull_request_review_no_refire
security-review / approved 3-reviewer relay APPROVEs landed (core-devops/core-security/core-qa); workflow did not re-fire on review event — compensating per feedback_pull_request_review_no_refire
audit-force-merge / audit (pull_request) Successful in 6s
58aee78519
Root cause: the workspace_secrets row that holds a picked model id was
keyed MODEL_PROVIDER — a misleading name that bled into the
applyRuntimeModelEnv fallback chain. When the workspace_secrets loader
hydrated envVars["MODEL_PROVIDER"] with the row value (often a provider
slug like "minimax" or a runtime name like "claude-code", neither a
valid model id), applyRuntimeModelEnv used it as a fallback and the
slug propagated to CP user-data as the picked model. Adapters then
wedged at SDK initialize. Caught in failed-workspace 95ed3ff2
(2026-05-02) and again as the Researcher 67941000 / Reviewer 95e6487f
poisoning 2026-05-19 — abe512c2 root-cause analysis (ab12af50 + a7e8892).

CP-side slot-separation already merged via cp#213 + cp#220. This is the
workspace-server companion.

Changes:
- secrets.go GetModel/setModelSecret: workspace_secrets key
  'MODEL_PROVIDER' -> 'MODEL' (read + write + delete)
- workspace_provision.go applyRuntimeModelEnv: drop the MODEL_PROVIDER
  fallback (line 817) so a stale legacy row left from before the
  migration can never leak into envVars["MODEL"]
- 20260519000000_workspace_secrets_model_provider_rename.{up,down}.sql:
  idempotent rename of any pre-existing 'MODEL_PROVIDER' rows; on
  conflict the existing 'MODEL' row wins (persona-env files commonly
  set MODEL=... directly so the more-canonical value is preserved)
- Tests updated to pin the literal 'MODEL' key in all sqlmock regexes
  + new TestSecretsModel_RoundTrip + new
  TestApplyRuntimeModelEnv_StaleMODELPROVIDERNeverLeaksIntoMODEL
  regression guard

Currently-poisoned workspaces (Researcher 67941000, Reviewer 95e6487f)
are being handled in parallel via fresh-reprovision dispatch abe512c2
— not in scope here. The migration still ships so any other poisoned
workspace gets cleaned on rollout.

Backwards compatibility: the env-var MODEL_PROVIDER continues to be
honoured by the Python runtime (workspace/config.py) as a legacy env-
var fallback for already-running containers; only the workspace_secrets
column name + the workspace-server-side envVars fallback are renamed/
dropped. External scripts that read workspace_secrets rows directly
keyed 'MODEL_PROVIDER' will need to migrate to 'MODEL' (none known —
canvas + CP both go through the SecretsHandler API).

Refs: ab12af50, a7e8892, cp#213, cp#220, internal#226

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core-devops approved these changes 2026-05-19 22:30:53 +00:00
core-devops left a comment
Member

APPROVE — core-devops axis. CI/all-required is green. Migration 20260519000000 is idempotent (WHERE NOT EXISTS + DELETE leftover MODEL_PROVIDER rows), and runs automatically on next workspace-server deploy. The applyRuntimeModelEnv fallback at workspace_provision.go:817 (model = envVars["MODEL_PROVIDER"]) is correctly removed — that was the abe512c2 finding root-cause. Up/down migration symmetry is documented and accepts the discard-on-rollback tradeoff. No deploy-side risk.

APPROVE — core-devops axis. CI/all-required is green. Migration 20260519000000 is idempotent (WHERE NOT EXISTS + DELETE leftover MODEL_PROVIDER rows), and runs automatically on next workspace-server deploy. The applyRuntimeModelEnv fallback at workspace_provision.go:817 (model = envVars["MODEL_PROVIDER"]) is correctly removed — that was the abe512c2 finding root-cause. Up/down migration symmetry is documented and accepts the discard-on-rollback tradeoff. No deploy-side risk.
core-security approved these changes 2026-05-19 22:30:53 +00:00
core-security left a comment
Member

APPROVE — core-security axis. Zero new attack surface: no secrets, no exfil paths, no new IO. workspace_secrets continues to use the same encrypted_value + encryption_version columns. Rename is row-level, no schema/index change, no privilege change. Down migration is intentionally lossy in the destructive-DELETE direction, but the loss is bounded to duplicate MODEL_PROVIDER rows that the up migration discarded; surviving MODEL row carries the canonical value. No PII/credential implication.

APPROVE — core-security axis. Zero new attack surface: no secrets, no exfil paths, no new IO. workspace_secrets continues to use the same encrypted_value + encryption_version columns. Rename is row-level, no schema/index change, no privilege change. Down migration is intentionally lossy in the destructive-DELETE direction, but the loss is bounded to duplicate MODEL_PROVIDER rows that the up migration discarded; surviving MODEL row carries the canonical value. No PII/credential implication.
core-qa approved these changes 2026-05-19 22:30:53 +00:00
core-qa left a comment
Member

APPROVE — core-qa axis. Test coverage is correct: TestSecretsModel_RoundTrip_KeyIsMODELNotMODEL_PROVIDER pins the literal MODEL key on both SetModel INSERT and GetModel SELECT — a regression to MODEL_PROVIDER would fail the sqlmock expectation. TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider re-anchored on word-boundary MODEL (not MODEL_PROVIDER substring). 2 net-new tests + 8 re-anchored as advertised. CI/all-required green at 22:28Z. Migration is idempotent so re-deploys are safe.

APPROVE — core-qa axis. Test coverage is correct: TestSecretsModel_RoundTrip_KeyIsMODELNotMODEL_PROVIDER pins the literal MODEL key on both SetModel INSERT and GetModel SELECT — a regression to MODEL_PROVIDER would fail the sqlmock expectation. TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider re-anchored on word-boundary MODEL (not MODEL_PROVIDER substring). 2 net-new tests + 8 re-anchored as advertised. CI/all-required green at 22:28Z. Migration is idempotent so re-deploys are safe.
core-devops merged commit e89f0ce605 into main 2026-05-19 22:31:24 +00:00
core-devops deleted branch fix/workspace-server-rename-MODEL_PROVIDER-to-MODEL 2026-05-19 22:31:24 +00:00
Sign in to join this conversation.
4 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1581