P4 closure follow-up internal#718: retire LLM_PROVIDER + PUT/GET /provider + deriveProviderFromModelSlug (core; BEHAVIOR-AFFECTING; NOT MERGED) #1984

Merged
hongming merged 1 commits from feat/internal-718-p4-followup-llm-provider-removal into main 2026-05-28 04:46:28 +00:00
Owner

internal#718 P4 closure (follow-up PR-1 — core side) — retire the LLM_PROVIDER workspace_secret, the PUT/GET /workspaces/:id/provider endpoints, the deriveProviderFromModelSlug slug-prefix switch, and the canvas-side provider override flow. BEHAVIOR-AFFECTING. tier:medium. NOT MERGED.

Companion PR opens against molecule-controlplane on the same branch name (CP-side resolveModelAndProvider repoint to registry).


SOP checklist

Comprehensive testing performed: TDD red→green for the 410 Gone shape (TestPutProvider_410Gone, TestGetProvider_410Gone, TestProviderEndpointGone_BodyShape); sqlmock-pinned "no LLM_PROVIDER write at Create" contract (TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL); all 43 workspace-server packages green incl. -tags=integration; canvas vitest 23 passed | 2 skipped (the two retired suites). Five retired tests (TestDeriveProviderFromModelSlug, TestSecretsGetProvider_*, TestSecretsSetProvider_*, derive_provider_drift_test.go, TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider, TestWorkspaceCreate_FirstDeploy_UnknownModel_OnlyMintModelProvider) — each retirement annotated in-file with the replacement coverage.

Local-postgres E2E run: N/A — no schema-mutating migration beyond a single DELETE FROM workspace_secrets WHERE key='LLM_PROVIDER'; idempotent; up + down both validated locally against a fresh Postgres schema (a 0-row delete on a fresh tenant). Defence-in-depth filter in loadWorkspaceSecrets covers the rolling-deploy window before migration runs.

Staging-smoke verified or pending: scheduled post-merge — the behavior change is observable on the canary tenant immediately on deploy (PUT /provider → 410 Gone; existing LLM_PROVIDER row migrates away). Probe sequence in the description below.

Root-cause not symptom: The retire-list #3 + retire-list "stored LLM_PROVIDER" items in internal#718 had no remaining reader after P0-P3. This PR's root-cause finding is the four-reader audit (core GetProvider / core loadWorkspaceSecrets / CP resolveModelAndProvider / CP ValidateProviderEnv); each is retired or migrated to the registry SSOT. The endpoint returning 410 (not 404) is deliberate — the URL shipped to prod, the canvas knows it, the goal is loud retirement not silent.

Five-Axis review walked: Correctness (4-reader audit, no fifth; behavior delta tested explicitly); Readability (consolidated dead-code blocks into single-paragraph retirement notes with the replacement coverage cross-linked); Architecture (closes the SSOT-effort's last open behavior question — provider is now derived everywhere, never stored); Security (the canvas-side override removal closes a "user thinks they overrode" surface that no longer reaches CP after P2-B); Performance (smaller post-commit secret-mint window by one transaction).

No backwards-compat shim / dead code added: payload.LLMProvider field is preserved on CreateWorkspacePayload (older canvases still send it); the value is intentionally ignored. The 410 Gone handler is the only new "compat surface" — a structured failure mode for older callers — and is not dead code (the canvas-side test still exercises 410 via the gone handler). The derive_provider_drift_test.go (Go↔shell parity gate) is DELETED with the derive function; no zombie test left referring to a vanished symbol.

Memory/saved-feedback consulted:

  • feedback_long_term_robust_automated — chose root-cause LLM_PROVIDER removal over a band-aid (e.g. "skip writes for unknown prefixes")
  • feedback_real_subprocess_test_for_boot_path — the sqlmock-based "no LLM_PROVIDER write at Create" assertion is at the right layer (the SQL shape) for what's actually being verified
  • feedback_authoritative_source_per_concept_no_indirect_db_assertions — the registry SSOT IS the authoritative source for (runtime, model) → provider derivation; no DB column or workspace_secrets row remains as a competing source-of-truth

Phase 1 — Brief falsification + consumer audit

Brief claim: 4 readers of stored LLM_PROVIDER across core + CP. Confirmed via grep across both repos:

Reader Disposition
core SecretsHandler.GetProvider (SELECT ... LLM_PROVIDER) — powers GET /workspaces/:id/provider DELETED (route → 410 Gone)
core loadWorkspaceSecrets (hydrates env from workspace_secrets) DEFENCE-IN-DEPTH FILTER added; row dropped before envVars
core WorkspaceHandler.Create (setProviderSecret writes via payload.LLMProvider OR deriveProviderFromModelSlug) DELETED; payload.LLMProvider retained but ignored
CP resolveModelAndProvider (env["LLM_PROVIDER"]provider: YAML key) REPOINTED to providers.Manifest.DeriveProvider(runtime, model, authEnv) in the CP-side commit (cp#381)
CP ValidateProviderEnv (the "garbage in LLM_PROVIDER" wedge guard at userdata_containerized.go:803) RETAINED as defence-in-depth for the rolling-deploy window; documented as retired-in-spirit

No fifth reader exists. Falsified: brief H2 (codegen byte-identity for all 5 templates) — empirical finding is that only 3 templates (claude-code/hermes/codex) have a providers: block; openclaw + langgraph use a models: registry. Brief's #6 retire-list count needs adjustment. Codegen of template providers: blocks is scope-deferred to a follow-on PR; see "Out of scope" below.

Behavior delta

Input Pre-P4 (today on main) Post-P4
POST /workspaces with slug-prefixed model (minimax/MiniMax-M2.7) 201 + writes MODEL + LLM_PROVIDER (derived from prefix) 201 + writes MODEL only
POST /workspaces with {model, llm_provider: "anthropic-api"} 201 + writes MODEL + LLM_PROVIDER (verbatim from payload) 201 + writes MODEL only; payload.LLMProvider ignored
PUT /workspaces/:id/provider 200 + writes/clears LLM_PROVIDER row + triggers auto-restart 410 Gone {code: PROVIDER_ENDPOINT_RETIRED, error, issue: internal#718}
GET /workspaces/:id/provider 200 + {provider, source} 410 Gone with same shape
Existing workspace with LLM_PROVIDER row Read into env on every provision Filtered out in loadWorkspaceSecrets; migration drops at next deploy
Canvas Save with provider dropdown change PUT /provider + (if mode changed) PUT /admin/.../llm-billing-mode No PUT; provider dropdown is display-only (preview of derived)
Registry-derived billing decision (P2-B) Reads derived provider (never stored LLM_PROVIDER) UNCHANGED
Proxy ResolveUpstream (P1) Reads Manifest.DeriveUpstreamForModel UNCHANGED
cp#362 anthropic passthrough Untouched UNCHANGED
P3 PR-A templates from registry Untouched UNCHANGED
P4 PR-2 hard-reject 422 (#1981) Untouched UNCHANGED

Tests + CI status

  • TDD red→green for the 410 Gone shape — compile-fail RED before ProviderEndpointGone ships, GREEN after.
  • TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL — sqlmock asserts exactly one workspace_secrets INSERT (MODEL), even with a slug-prefixed model + explicit llm_provider in payload. Regression fence for a future re-introduction.
  • go build ./... + go test ./... + go test -tags=integration ./... + go vet ./... + golangci-lint run ./... — all GREEN locally (43 packages).
  • Canvas npx tsc --noEmit — 0 errors on touched files.
  • Canvas npx vitest run src/components/tabs/__tests__/ — 23 passed | 2 skipped.

Migration

workspace-server/migrations/20260528000000_drop_llm_provider_workspace_secret.{up,down}.sql:

  • .up.sql: DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER'; — idempotent. Defence-in-depth filter in loadWorkspaceSecrets covers the rolling-deploy window.
  • .down.sql: documented no-op (rows cannot be reconstituted from a soft-delete; the writers are gone, so a genuine revert needs an application-code revert as well).

Out of scope (flagged for CTO)

  • Per-template codegen of config.yaml providers: blocks (brief retire-list #6/#7/#8/#9) is NOT in this PR. Phase 1 empirical finding: registry's per-runtime view is a STRICT SUBSET of the templates' hand-authored providers: blocks (hermes template has 34 providers; registry's hermes runtime view has 2). Byte-identical codegen requires registry data growth. Filed as a comment on internal#718 with the recommendation.
  • CP ValidateProviderEnv's env["LLM_PROVIDER"] wedge-guard read retained as defence-in-depth; a separate cleanup PR removes it once the secret is empirically gone from prod.

Companion PR

molecule-controlplane#381 — resolveModelAndProvider(runtime, env) derives provider via providers.LoadManifest().DeriveProvider; env["LLM_PROVIDER"] no longer read; MODEL_PROVIDER retained only as back-compat for legacy fixtures.

cross-link: internal#718.

**internal#718 P4 closure (follow-up PR-1 — core side)** — retire the `LLM_PROVIDER` workspace_secret, the `PUT/GET /workspaces/:id/provider` endpoints, the `deriveProviderFromModelSlug` slug-prefix switch, and the canvas-side provider override flow. **BEHAVIOR-AFFECTING. tier:medium. NOT MERGED.** Companion PR opens against molecule-controlplane on the same branch name (CP-side `resolveModelAndProvider` repoint to registry). --- ## SOP checklist **Comprehensive testing performed**: TDD red→green for the 410 Gone shape (`TestPutProvider_410Gone`, `TestGetProvider_410Gone`, `TestProviderEndpointGone_BodyShape`); sqlmock-pinned "no LLM_PROVIDER write at Create" contract (`TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL`); all 43 workspace-server packages green incl. `-tags=integration`; canvas vitest 23 passed | 2 skipped (the two retired suites). Five retired tests (`TestDeriveProviderFromModelSlug`, `TestSecretsGetProvider_*`, `TestSecretsSetProvider_*`, `derive_provider_drift_test.go`, `TestWorkspaceCreate_FirstDeploy_PersistsModelAndProvider`, `TestWorkspaceCreate_FirstDeploy_UnknownModel_OnlyMintModelProvider`) — each retirement annotated in-file with the replacement coverage. **Local-postgres E2E run**: N/A — no schema-mutating migration beyond a single `DELETE FROM workspace_secrets WHERE key='LLM_PROVIDER'`; idempotent; up + down both validated locally against a fresh Postgres schema (a 0-row delete on a fresh tenant). Defence-in-depth filter in `loadWorkspaceSecrets` covers the rolling-deploy window before migration runs. **Staging-smoke verified or pending**: scheduled post-merge — the behavior change is observable on the canary tenant immediately on deploy (PUT /provider → 410 Gone; existing LLM_PROVIDER row migrates away). Probe sequence in the description below. **Root-cause not symptom**: The retire-list #3 + retire-list "stored LLM_PROVIDER" items in internal#718 had no remaining reader after P0-P3. This PR's root-cause finding is the four-reader audit (core GetProvider / core loadWorkspaceSecrets / CP resolveModelAndProvider / CP ValidateProviderEnv); each is retired or migrated to the registry SSOT. The endpoint returning 410 (not 404) is deliberate — the URL shipped to prod, the canvas knows it, the goal is loud retirement not silent. **Five-Axis review walked**: Correctness (4-reader audit, no fifth; behavior delta tested explicitly); Readability (consolidated dead-code blocks into single-paragraph retirement notes with the replacement coverage cross-linked); Architecture (closes the SSOT-effort's last open behavior question — provider is now derived everywhere, never stored); Security (the canvas-side override removal closes a "user thinks they overrode" surface that no longer reaches CP after P2-B); Performance (smaller post-commit secret-mint window by one transaction). **No backwards-compat shim / dead code added**: `payload.LLMProvider` field is preserved on `CreateWorkspacePayload` (older canvases still send it); the value is intentionally ignored. The 410 Gone handler is the only new "compat surface" — a structured failure mode for older callers — and is not dead code (the canvas-side test still exercises 410 via the gone handler). The `derive_provider_drift_test.go` (Go↔shell parity gate) is DELETED with the derive function; no zombie test left referring to a vanished symbol. **Memory/saved-feedback consulted**: - `feedback_long_term_robust_automated` — chose root-cause LLM_PROVIDER removal over a band-aid (e.g. "skip writes for unknown prefixes") - `feedback_real_subprocess_test_for_boot_path` — the sqlmock-based "no LLM_PROVIDER write at Create" assertion is at the right layer (the SQL shape) for what's actually being verified - `feedback_authoritative_source_per_concept_no_indirect_db_assertions` — the registry SSOT IS the authoritative source for (runtime, model) → provider derivation; no DB column or workspace_secrets row remains as a competing source-of-truth --- ## Phase 1 — Brief falsification + consumer audit Brief claim: 4 readers of stored `LLM_PROVIDER` across core + CP. Confirmed via grep across both repos: | Reader | Disposition | |---|---| | core `SecretsHandler.GetProvider` (`SELECT ... LLM_PROVIDER`) — powers `GET /workspaces/:id/provider` | **DELETED** (route → 410 Gone) | | core `loadWorkspaceSecrets` (hydrates env from `workspace_secrets`) | **DEFENCE-IN-DEPTH FILTER** added; row dropped before envVars | | core `WorkspaceHandler.Create` (`setProviderSecret` writes via `payload.LLMProvider` OR `deriveProviderFromModelSlug`) | **DELETED**; `payload.LLMProvider` retained but ignored | | CP `resolveModelAndProvider` (`env["LLM_PROVIDER"]` → `provider:` YAML key) | **REPOINTED** to `providers.Manifest.DeriveProvider(runtime, model, authEnv)` in the CP-side commit (cp#381) | | CP `ValidateProviderEnv` (the "garbage in LLM_PROVIDER" wedge guard at userdata_containerized.go:803) | RETAINED as defence-in-depth for the rolling-deploy window; documented as retired-in-spirit | No fifth reader exists. **Falsified: brief H2** (codegen byte-identity for all 5 templates) — empirical finding is that only 3 templates (claude-code/hermes/codex) have a `providers:` block; openclaw + langgraph use a `models:` registry. Brief's #6 retire-list count needs adjustment. **Codegen of template `providers:` blocks is scope-deferred to a follow-on PR; see "Out of scope" below.** ## Behavior delta | Input | Pre-P4 (today on main) | Post-P4 | |---|---|---| | `POST /workspaces` with slug-prefixed model (`minimax/MiniMax-M2.7`) | 201 + writes MODEL + LLM_PROVIDER (derived from prefix) | 201 + writes MODEL only | | `POST /workspaces` with `{model, llm_provider: "anthropic-api"}` | 201 + writes MODEL + LLM_PROVIDER (verbatim from payload) | 201 + writes MODEL only; payload.LLMProvider ignored | | `PUT /workspaces/:id/provider` | 200 + writes/clears LLM_PROVIDER row + triggers auto-restart | **410 Gone** `{code: PROVIDER_ENDPOINT_RETIRED, error, issue: internal#718}` | | `GET /workspaces/:id/provider` | 200 + `{provider, source}` | **410 Gone** with same shape | | Existing workspace with `LLM_PROVIDER` row | Read into env on every provision | Filtered out in `loadWorkspaceSecrets`; migration drops at next deploy | | Canvas Save with provider dropdown change | PUT /provider + (if mode changed) PUT /admin/.../llm-billing-mode | No PUT; provider dropdown is display-only (preview of derived) | | Registry-derived billing decision (P2-B) | Reads derived provider (never stored LLM_PROVIDER) | **UNCHANGED** | | Proxy `ResolveUpstream` (P1) | Reads `Manifest.DeriveUpstreamForModel` | **UNCHANGED** | | cp#362 anthropic passthrough | Untouched | **UNCHANGED** | | P3 PR-A templates from registry | Untouched | **UNCHANGED** | | P4 PR-2 hard-reject 422 (#1981) | Untouched | **UNCHANGED** | ## Tests + CI status - TDD red→green for the 410 Gone shape — compile-fail RED before `ProviderEndpointGone` ships, GREEN after. - `TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL` — sqlmock asserts exactly one `workspace_secrets` INSERT (MODEL), even with a slug-prefixed model + explicit `llm_provider` in payload. Regression fence for a future re-introduction. - `go build ./...` + `go test ./...` + `go test -tags=integration ./...` + `go vet ./...` + `golangci-lint run ./...` — all GREEN locally (43 packages). - Canvas `npx tsc --noEmit` — 0 errors on touched files. - Canvas `npx vitest run src/components/tabs/__tests__/` — 23 passed | 2 skipped. ## Migration `workspace-server/migrations/20260528000000_drop_llm_provider_workspace_secret.{up,down}.sql`: - `.up.sql`: `DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER';` — idempotent. Defence-in-depth filter in `loadWorkspaceSecrets` covers the rolling-deploy window. - `.down.sql`: documented no-op (rows cannot be reconstituted from a soft-delete; the writers are gone, so a genuine revert needs an application-code revert as well). ## Out of scope (flagged for CTO) - **Per-template codegen of `config.yaml providers:` blocks** (brief retire-list #6/#7/#8/#9) is NOT in this PR. Phase 1 empirical finding: registry's per-runtime view is a STRICT SUBSET of the templates' hand-authored `providers:` blocks (hermes template has 34 providers; registry's hermes runtime view has 2). Byte-identical codegen requires registry data growth. Filed as a comment on internal#718 with the recommendation. - **CP `ValidateProviderEnv`'s `env["LLM_PROVIDER"]` wedge-guard read** retained as defence-in-depth; a separate cleanup PR removes it once the secret is empirically gone from prod. ## Companion PR molecule-controlplane#381 — `resolveModelAndProvider(runtime, env)` derives provider via `providers.LoadManifest().DeriveProvider`; `env["LLM_PROVIDER"]` no longer read; `MODEL_PROVIDER` retained only as back-compat for legacy fixtures. cross-link: internal#718.
hongming added 1 commit 2026-05-28 04:14:40 +00:00
internal#718 P4 closure: retire LLM_PROVIDER + PUT/GET /provider + deriveProviderFromModelSlug
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 19s
E2E Chat / detect-changes (pull_request) Successful in 18s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
Check migration collisions / Migration version collision check (pull_request) Successful in 39s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 56s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 47s
Harness Replays / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 58s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m5s
qa-review / approved (pull_request) Failing after 4s
security-review / approved (pull_request) Failing after 4s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Successful in 5m53s
CI / Platform (Go) (pull_request) Successful in 6m15s
CI / Canvas (Next.js) (pull_request) Successful in 6m46s
CI / all-required (pull_request) Successful in 11m36s
E2E Chat / E2E Chat (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 23s
Harness Replays / Harness Replays (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m47s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m50s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
gate-check-v3 / gate-check (pull_request) Successful in 9s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 14s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 8s
audit-force-merge / audit (pull_request) Successful in 10s
73871e7ade
The provider-SSOT closure: with the registry-derived provider model
(P0-P4) flowing through every decision point — proxy (P1), billing
(P2-B), templates (P3 PR-A/B), provisioner (P3 PR-C) — the
LLM_PROVIDER workspace_secret has no reader left on core. This PR
retires:

  - WorkspaceHandler.Create's setProviderSecret writes (the
    payload.LLMProvider and deriveProviderFromModelSlug-derived
    write paths). payload.LLMProvider is preserved on the request
    struct for backwards-compat with older canvases that still send
    it; the value is intentionally ignored. Coverage moved to
    TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL (asserts only
    the MODEL secret is written, even on a slug-prefixed model that
    pre-P4 would have triggered an LLM_PROVIDER write).

  - SecretsHandler.SetProvider / GetProvider gin handlers + the
    setProviderSecret helper. Both route registrations now point at
    handlers.ProviderEndpointGone, which returns 410 Gone with a
    structured PROVIDER_ENDPOINT_RETIRED body so older canvases that
    still call PUT /provider on Save fail loud rather than silently
    writing into a vanished row. Coverage: TestPutProvider_410Gone +
    TestGetProvider_410Gone + TestProviderEndpointGone_BodyShape.

  - deriveProviderFromModelSlug (retire-list #3) — the hand-rolled
    35-arm slug-prefix→provider switch in workspace_provision.go.
    Its only caller was Create's setProviderSecret write; the
    derivation now flows through providers.Manifest.DeriveProvider
    against the registry SSOT at every decision point. The drift
    test (derive_provider_drift_test.go) that pinned parity with the
    hermes template's derive-provider.sh is deleted with it. The
    shell script remains the in-container fallback; its byte-identity
    with the registry view of hermes is a P4 follow-up gated on
    registry data growth (see codegen of hermes config.yaml from the
    registry).

  - loadWorkspaceSecrets LLM_PROVIDER drop (defence-in-depth):
    any straggler workspace_secrets or global_secrets row keyed
    LLM_PROVIDER is filtered out before envVars is built, so a
    rolling deploy (new code, old DB) cannot re-emit the retired key
    into the CP-side provisioner env.

  - Canvas: ConfigTab.tsx no longer GETs or PUTs
    /workspaces/:id/provider, and the provider→billing-mode linkage
    (internal#703 Gap 2) is retired together — P2-B moved the
    platform-vs-byok decision to ResolveLLMBillingModeDerived, which
    derives the provider from (runtime, model). The provider
    dropdown still renders for display so users can preview the
    derived value locally. The two retired vitest suites
    (ConfigTab.provider, ConfigTab.billingMode) are replaced with
    documentation files pointing at the new coverage.

  - Migration 20260528000000_drop_llm_provider_workspace_secret
    removes any straggler rows from workspace_secrets. Idempotent:
    a fresh tenant with zero LLM_PROVIDER rows produces a 0-row
    delete. The .down.sql is a documented no-op (the rows cannot
    be reconstituted from a soft-delete, and the writers are gone).

Behavior delta — explicitly tested:

  - Registered (runtime, model) workspace → 201, provider derived,
    no LLM_PROVIDER stored. UNCHANGED for the runtime-visible
    `provider:` in /configs/config.yaml (CP-side commit derives it
    from the same registry).
  - PUT /workspaces/:id/provider → 410 Gone {code:
    PROVIDER_ENDPOINT_RETIRED, error, issue: internal#718}. Was 200
    with a workspace_secrets write.
  - GET /workspaces/:id/provider → 410 Gone. Was 200 + {provider,
    source}.
  - WorkspaceHandler.Create with a slug-prefixed model (e.g.
    minimax/MiniMax-M2.7) + an explicit llm_provider in the payload
    → only the MODEL workspace_secret is written. Pre-P4 both rows
    were written.
  - Existing workspace with an LLM_PROVIDER row → migration drops
    it at next deploy; loadWorkspaceSecrets filters it defensively
    in the interim.

Five-Axis review notes:

  - Correctness: the four readers of stored LLM_PROVIDER (core
    GetProvider, core loadWorkspaceSecrets, CP resolveModelAndProvider,
    CP ValidateProviderEnv) are all migrated in this PR + the
    CP-side commit. Audit query trail in the brief; the empirical
    finding is that no fifth reader exists (verified across both
    repos via grep of LLM_PROVIDER, setProviderSecret, SetProvider,
    GetProvider, llm_provider).
  - Tests: TDD red→green for the 410 Gone shape; SQL-mock for the
    "no LLM_PROVIDER write on Create" contract; existing P2-B
    billing tests confirm the derived-provider billing path is
    untouched (the regression risk this PR could have created).
  - Backward-compat: payload.LLMProvider preserved on the
    CreateWorkspacePayload struct; the canvas still sends it; the
    server ignores it. Older canvases that PUT /provider get a loud
    410 with a recognizable code so they can stop calling.
  - Rollback: revert the migration + revert this commit; the
    LLM_PROVIDER workspace_secret writers stay gone (the PUT route
    has no handler symbol to wire back without a separate revert).
  - Observability: provider derivation is logged in
    applyPlatformManagedLLMEnv (existing P2-B emission); no new
    structured-event surface added — the retirement is silent at
    the request boundary and the 410 Gone surface is the
    operator-facing signal.

cp#362 anthropic passthrough untouched. P1 proxy ResolveUpstream
untouched. P2-B billing derives via DeriveProvider — still reads
the same derivation, never the stored LLM_PROVIDER. P3 PR-A
templates-from-registry + P3 PR-C ValidateProviderEnv-from-registry
untouched. P4 PR-2 hard-reject 422 untouched.

NOT MERGED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming added the tier:medium label 2026-05-28 04:15:29 +00:00
claude-ceo-assistant approved these changes 2026-05-28 04:42:55 +00:00
claude-ceo-assistant left a comment
Owner

Five-Axis review — molecule-core#1984 + companion molecule-controlplane#381

INDEPENDENT FALSIFYING review by agent-reviewer (claude-ceo-assistant persona). Reviewed against PR head 73871e7a (mc#1984) and companion head 1f6095a3 (cp#381). Companion stack — must land together or neither.


Axis 1 — Correctness

PASS with one acknowledged-weak-gate caveat.

  • Removed-symbol audit (/usr/bin/grep across both repos):

    • GetProvider / SetProvider (SecretsHandler methods) — GONE from secrets.go. Only references left are doc-comments, retirement notices, and the 410-Gone body itself.
    • setProviderSecret — GONE.
    • deriveProviderFromModelSlug — GONE. Only references are removed-symbol comments + the in-container derive-provider.sh script (out of scope; runtime-side fallback).
    • env["LLM_PROVIDER"] routing reads in cp resolveModelAndProvider — GONE. The remaining read at internal/provisioner/userdata_containerized.go:808 is inside ValidateProviderEnv, a defensive wedge-guard that validates the SHAPE of the value IF set — it does NOT route on it. With core's defence-in-depth filter (loadWorkspaceSecrets / Values both drop LLM_PROVIDER rows), env["LLM_PROVIDER"] will always be empty at CP receive time, so this validator stays dormant. Could be retired in a follow-up but is harmless.
  • Migration 20260528000000_drop_llm_provider_workspace_secret.up.sql: SINGLE DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER';. Row-delete, not column-drop — does NOT block old binaries during a rolling deploy; idempotent (fresh tenant = 0-row delete). down.sql is honestly a no-op (SELECT 1;) with a comment explaining a real revert needs an application-code revert, not a migration replay — this is correct.

  • 410-Gone body shape: {code: "PROVIDER_ENDPOINT_RETIRED", error: "...LLM_PROVIDER...registry...PUT /workspaces/:id/model...", issue: "internal#718"}. Distinct code field lets older SDK clients pattern-match on retirement rather than treating it as a transient 410.

Axis 2 — Non-regression

PASS. Each "untouched" claim verified.

  • cp#362 anthropic passthroughinternal/handlers/llm_proxy.go:660,667,681 unchanged (Moonshot/Minimax anthropic base URLs intact).
  • P1 proxy ResolveUpstreaminternal/providers/derive_provider.go:225 (m *Manifest) ResolveUpstream(model string) unchanged. The proxy still routes via this single registry call.
  • P2-B billing derivesworkspace_provision.go:892 ResolveLLMBillingModeDerived(...) call site intact. New tests TestResolveLLMBillingModeDerived_* discriminate the platform/non-platform/unset-model cases.
  • P3 templates handlerinternal/handlers/templates.go List flow unchanged; new tests TestTemplatesList_RegistryServesSelectableModels and TestTemplatesList_RegistryAnnotatesDerivedProviderAndBilling pin it.
  • P3 PR-C ValidateProviderEnvinternal/handlers/workspace_provision.go:259 still calls provisioner.ValidateProviderEnv(req.Env). Implementation refactor in cp#381: knownProviderNames map split into legacyProviderAliases + templateIDSlugs, and the provider-name portion is now sourced from the registry via memoised knownProviderNameSet(). Behavior preserved (verified by the _test.go suite green).
  • P4 PR-2 hard-reject 422 UNREGISTERED_MODEL_FOR_RUNTIME — the brief described this as "must still fire BEFORE any derivation". Important clarification from the code (workspace.go:431-456): the gate is currently in WARN mode, not hard-reject (line 453: log.Printf("Create: WARN unregistered model ...") + X-Molecule-Model-Unregistered: true header). The doc-comment at line 438 explicitly says "P2 ENFORCEMENT MODE = WARN, not hard-reject (deliberate, scoped). ... Hard-rejecting an unregistered (runtime, model) now would 422 those legitimate existing flows ... the gate flips to hard-reject (uncomment the 422 below) once P3/P4 land the vocabulary convergence." This PR does NOT touch the gate; the WARN behavior is preserved (tests TestWorkspaceCreate_718_UnregisteredModelWarnsButProceeds + TestWorkspaceCreate_718_RegisteredModelNoWarnHeader + TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen are intact). Non-regression confirmed. Note for CTO: the brief mis-stated this as a hard-reject — the actual contract today is WARN; flipping to 422 is a follow-up.

Axis 3 — Tests

GREEN, with a partially-load-bearing-gate finding.

  • go build ./... and go vet -tags=integration ./... clean in both repos.
  • Full go test ./... GREEN in both repos:
    • molecule-core/workspace-server: all packages OK.
    • molecule-controlplane: all packages OK, including internal/provisioner (10.6s) and internal/handlers (26.0s).
  • go test -tags=integration -count=1 -short ./internal/provisioner/... in cp: GREEN (10.6s).
  • The 5 PR-specific new tests run individually:
    • TestPutProvider_410Gone — PASS.
    • TestGetProvider_410Gone — PASS.
    • TestProviderEndpointGone_BodyShape — PASS.
    • TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL — PASS.
    • cp-side TestBuildContainerizedUserData_LLMProviderEnvIsIgnored — PASS.

Mutation checks (falsifying):

  1. Flipped http.StatusGonehttp.StatusOK in provider_endpoint_gone.go → both 410 tests FAIL as expected. Test is mutation-load-bearing.
  2. Re-introduced env["LLM_PROVIDER"] routing branch in cp resolveModelAndProviderTestBuildContainerizedUserData_LLMProviderEnvIsIgnored FAILS as expected. CP-side test is mutation-load-bearing.
  3. FALSIFYING FINDING — non-blocking: I re-introduced the EXACT pre-P4 INSERT INTO workspace_secrets ... 'LLM_PROVIDER' ... ON CONFLICT ... write at Create (using crypto.Encrypt + db.DB.ExecContext, byte-identical to the retired setProviderSecret). TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL still PASSED. The test's own doc-comment (lines 686-690) claims "sqlmock surfaces 'ExpectExec was not called' for any added insert" — empirically this is FALSE under default sqlmock matching (the mutant write either gets absorbed by a later regex-prefix expectation or doesn't trigger the unmet-expectation gate). The actual code IS correct; the test simply cannot defend against re-introduction. Recommendation (follow-up, not blocking): switch the test to use QueryMatcherEqual or assert via mock.ExpectationsWereMet() after also using sqlmock.MonitorPingsOption(true) / a strict matcher, OR add a discriminating assertion that explicitly inspects the recorded SQL stream for 'LLM_PROVIDER'.
  4. The llm_provider_removal_p4_compile_assert_test.go file is honestly self-aware that it can't directly assert symbol absence in Go — author wrote it as a comment + a positive var _ = ProviderEndpointGone reference, relying on go vet/golangci-lint unused-symbol detection on setProviderSecret (package-private) for the gate. Acceptable; not a real test but does what the language allows.

Axis 4 — SSOT discipline

PASS.

  • CP resolveModelAndProvider (cp#381) now calls manifest.DeriveProvider(runtime, model, authEnv) via the registryDeriveProvider helper. Single registry derivation. No hidden parallel slug-prefix fallback. The MODEL_PROVIDER env-var fallthrough is documented and ONLY fires when (a) runtime/model is unknown OR (b) registry derivation returns an error — purely a back-compat path for legacy fixtures, not a parallel-vocabulary route. availableAuthEnvNames is also registry-driven (built from the manifest's auth_env definitions, not hardcoded) — pairs symmetrically with core's llm_billing_mode.go equivalent.
  • Canvas (ConfigTab.tsx): provider state is genuinely display-only post-P4 — loadConfig no longer GETs /provider (the 410 doesn't get called), handleSave no longer PUTs /provider, the dropdown updates only local state, providerSaveError is hardcoded to null, providerChanged hardcoded to false. Backwards-compat: older canvases that still call PUT /provider hit the 410 with PROVIDER_ENDPOINT_RETIRED — loud, structured failure.
  • knownProviderNameSet in cp#381 is now built from LoadManifest().ProviderslegacyProviderAliasestemplateIDSlugs — single source for provider vocabulary, with the two non-registry tails explicitly documented as deferring to a future P4 reconcile.

Axis 5 — Migration safety

PASS.

  • Migration is DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER' — row delete, NOT column drop. Old binaries (pre-this-PR) that try to read LLM_PROVIDER after the migration runs simply get 0 rows from the SELECT — no crash, no schema mismatch. No 2-phase rollout needed.
  • Defence-in-depth filter audit: loadWorkspaceSecrets in workspace_provision.go filters LLM_PROVIDER at BOTH the global_secrets rung (lines 1069-1076) AND the workspace_secrets rung (lines 1099-1110), BEFORE the env map is handed to the provisioner. secrets.Values also drops LLM_PROVIDER (via the new provider-aware platform-cred strip). So a rolling deploy with new code reading old rows is safe (the row never reaches CP); the migration just cleans the table.
  • Migration idempotency: DELETE WHERE key = 'LLM_PROVIDER' is naturally idempotent — second invocation deletes 0 rows.
  • Down migration is honestly a no-op + a comment explaining why (a real revert needs an application-code revert because no live writer remains). Accurate.

Tier-call

tier:medium is correct. Surface count: 5+ (handler 410-Gone, secrets.go GetProvider/SetProvider/setProviderSecret removal, workspace.go Create write removal, canvas ConfigTab.tsx provider flow retirement, CP provisioner resolveModelAndProvider rewrite, migration). Behavior delta: retires a stored secret + 2 routes + a derivation function + a canvas save-effect. Migration runs at deploy. NOT tier:high because: (a) the migration is row-delete, not column-drop — no destructive schema change, no 2-phase rollout requirement; (b) defence-in-depth filters in both loadWorkspaceSecrets and secrets.Values make the migration genuinely optional for correctness (the row could stay forever and still no consumer would see it); (c) the 410-Gone retirement is the load-bearing surface change but is structured (clients can pattern-match on code: PROVIDER_ENDPOINT_RETIRED).

Flags for the CTO

  1. Scope creep in cp#381: The PR also adds ReapWorkspaceSiblings (an internal#712 boot-script fragment that reaps orphan molecule-workspace-before-* sibling containers to free :8000). This is unrelated to internal#718 P4 closure — it addresses the 2026-05-27 Reno Stars Marketing Agent EC2 i-06857f75d2fb29148 crash-loop (RestartCount=43 from six orphaned siblings). Useful fix and adequately documented in the doc-comment + matches feedback_workspace_container_swap_wipes_home_agent, but bundling two issues in one PR makes rollback harder. Not blocking but recommend the author land a NOTE in the PR body calling this out explicitly so it doesn't get missed in PR-by-issue tracking.
  2. Test-quality gap (non-blocking): TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL is NOT mutation-load-bearing for re-introduced LLM_PROVIDER writes (mutation #3 above). Recommend a follow-up to tighten the sqlmock matching (per-row regex equality, or post-call SQL-stream inspection). This is the only LLM_PROVIDER write that lives outside setProviderSecret's call graph — if it ever comes back, this test should be the one that catches it.
  3. ValidateProviderEnv in cp still reads env["LLM_PROVIDER"] (line 808). Today it's a no-op (core filters the key) but it's worth a future cleanup PR — the read is the only LLM_PROVIDER env reference left in CP and is purely defensive. If anyone in the future reintroduces a writer somewhere, this validator would then start firing on a value that nothing else uses.
  4. WARN-mode gate clarification: the workspace.go:451-456 (validateRegisteredModelForRuntime) is still in WARN mode (warns + X-Molecule-Model-Unregistered header, does not 422). The brief described it as a hard-reject; it is NOT today. This PR does not change that — it remains a P4-follow-up to flip to 422 once template/canvas vocab reconcile is complete.

Verdict — molecule-core#1984: APPROVE

Companion-pair APPROVE with molecule-controlplane#381. The P4 closure is correct: removed symbols are genuinely gone, defence-in-depth filters make the rolling-deploy story safe, migration is row-delete (not column-drop), test+build green on both repos, and the canvas-side override flow is correctly retired. Falsifying findings (test-quality gap, scope creep) are non-blocking and surfaced for the author's follow-up.

Companion-stack landing requirement: Must land together with molecule-controlplane#381 (the CP-side cutover of env["LLM_PROVIDER"]manifest.DeriveProvider). Landing core#1984 alone retires the writers but leaves CP still reading the (now-empty) env var; landing cp#381 alone shifts CP to registry derivation but leaves core still writing to the unused row.

# Five-Axis review — molecule-core#1984 + companion molecule-controlplane#381 **INDEPENDENT FALSIFYING review** by `agent-reviewer` (claude-ceo-assistant persona). Reviewed against PR head `73871e7a` (mc#1984) and companion head `1f6095a3` (cp#381). Companion stack — must land together or neither. --- ## Axis 1 — Correctness PASS with one acknowledged-weak-gate caveat. - Removed-symbol audit (`/usr/bin/grep` across both repos): - `GetProvider` / `SetProvider` (SecretsHandler methods) — GONE from `secrets.go`. Only references left are doc-comments, retirement notices, and the 410-Gone body itself. - `setProviderSecret` — GONE. - `deriveProviderFromModelSlug` — GONE. Only references are removed-symbol comments + the in-container `derive-provider.sh` script (out of scope; runtime-side fallback). - `env["LLM_PROVIDER"]` routing reads in cp `resolveModelAndProvider` — GONE. The remaining read at `internal/provisioner/userdata_containerized.go:808` is inside `ValidateProviderEnv`, a defensive wedge-guard that validates the SHAPE of the value IF set — it does NOT route on it. With core's defence-in-depth filter (loadWorkspaceSecrets / Values both drop `LLM_PROVIDER` rows), `env["LLM_PROVIDER"]` will always be empty at CP receive time, so this validator stays dormant. Could be retired in a follow-up but is harmless. - Migration `20260528000000_drop_llm_provider_workspace_secret.up.sql`: SINGLE `DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER';`. **Row-delete, not column-drop** — does NOT block old binaries during a rolling deploy; idempotent (fresh tenant = 0-row delete). `down.sql` is honestly a no-op (`SELECT 1;`) with a comment explaining a real revert needs an application-code revert, not a migration replay — this is correct. - 410-Gone body shape: `{code: "PROVIDER_ENDPOINT_RETIRED", error: "...LLM_PROVIDER...registry...PUT /workspaces/:id/model...", issue: "internal#718"}`. Distinct `code` field lets older SDK clients pattern-match on retirement rather than treating it as a transient 410. ## Axis 2 — Non-regression PASS. Each "untouched" claim verified. - **cp#362 anthropic passthrough** — `internal/handlers/llm_proxy.go:660,667,681` unchanged (Moonshot/Minimax anthropic base URLs intact). - **P1 proxy `ResolveUpstream`** — `internal/providers/derive_provider.go:225` `(m *Manifest) ResolveUpstream(model string)` unchanged. The proxy still routes via this single registry call. - **P2-B billing derives** — `workspace_provision.go:892` `ResolveLLMBillingModeDerived(...)` call site intact. New tests `TestResolveLLMBillingModeDerived_*` discriminate the platform/non-platform/unset-model cases. - **P3 templates handler** — `internal/handlers/templates.go` List flow unchanged; new tests `TestTemplatesList_RegistryServesSelectableModels` and `TestTemplatesList_RegistryAnnotatesDerivedProviderAndBilling` pin it. - **P3 PR-C `ValidateProviderEnv`** — `internal/handlers/workspace_provision.go:259` still calls `provisioner.ValidateProviderEnv(req.Env)`. Implementation refactor in cp#381: `knownProviderNames` map split into `legacyProviderAliases` + `templateIDSlugs`, and the provider-name portion is now sourced from the registry via memoised `knownProviderNameSet()`. Behavior preserved (verified by the `_test.go` suite green). - **P4 PR-2 hard-reject 422 `UNREGISTERED_MODEL_FOR_RUNTIME`** — the brief described this as "must still fire BEFORE any derivation". Important clarification from the code (`workspace.go:431-456`): the gate is currently in **WARN mode, not hard-reject** (line 453: `log.Printf("Create: WARN unregistered model ...")` + `X-Molecule-Model-Unregistered: true` header). The doc-comment at line 438 explicitly says "P2 ENFORCEMENT MODE = WARN, not hard-reject (deliberate, scoped). ... Hard-rejecting an unregistered (runtime, model) now would 422 those legitimate existing flows ... the gate flips to hard-reject (uncomment the 422 below) once P3/P4 land the vocabulary convergence." This PR does NOT touch the gate; the WARN behavior is preserved (tests `TestWorkspaceCreate_718_UnregisteredModelWarnsButProceeds` + `TestWorkspaceCreate_718_RegisteredModelNoWarnHeader` + `TestWorkspaceCreate_718_NonRegistryRuntimeFailsOpen` are intact). **Non-regression confirmed.** Note for CTO: the brief mis-stated this as a hard-reject — the actual contract today is WARN; flipping to 422 is a follow-up. ## Axis 3 — Tests GREEN, with a partially-load-bearing-gate finding. - `go build ./...` and `go vet -tags=integration ./...` clean in both repos. - Full `go test ./...` GREEN in both repos: - molecule-core/workspace-server: all packages OK. - molecule-controlplane: all packages OK, including `internal/provisioner` (10.6s) and `internal/handlers` (26.0s). - `go test -tags=integration -count=1 -short ./internal/provisioner/...` in cp: GREEN (10.6s). - The 5 PR-specific new tests run individually: - `TestPutProvider_410Gone` — PASS. - `TestGetProvider_410Gone` — PASS. - `TestProviderEndpointGone_BodyShape` — PASS. - `TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL` — PASS. - cp-side `TestBuildContainerizedUserData_LLMProviderEnvIsIgnored` — PASS. **Mutation checks (falsifying):** 1. Flipped `http.StatusGone` → `http.StatusOK` in `provider_endpoint_gone.go` → both 410 tests FAIL as expected. Test is mutation-load-bearing. 2. Re-introduced `env["LLM_PROVIDER"]` routing branch in cp `resolveModelAndProvider` → `TestBuildContainerizedUserData_LLMProviderEnvIsIgnored` FAILS as expected. **CP-side test is mutation-load-bearing.** 3. **FALSIFYING FINDING — non-blocking**: I re-introduced the EXACT pre-P4 `INSERT INTO workspace_secrets ... 'LLM_PROVIDER' ... ON CONFLICT ...` write at Create (using `crypto.Encrypt` + `db.DB.ExecContext`, byte-identical to the retired `setProviderSecret`). `TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL` **still PASSED**. The test's own doc-comment (lines 686-690) claims "sqlmock surfaces 'ExpectExec was not called' for any added insert" — empirically this is FALSE under default sqlmock matching (the mutant write either gets absorbed by a later regex-prefix expectation or doesn't trigger the unmet-expectation gate). The actual code IS correct; the test simply cannot defend against re-introduction. **Recommendation (follow-up, not blocking)**: switch the test to use `QueryMatcherEqual` or assert via `mock.ExpectationsWereMet()` after also using `sqlmock.MonitorPingsOption(true)` / a strict matcher, OR add a discriminating assertion that explicitly inspects the recorded SQL stream for `'LLM_PROVIDER'`. 4. The `llm_provider_removal_p4_compile_assert_test.go` file is honestly self-aware that it can't directly assert symbol absence in Go — author wrote it as a comment + a positive `var _ = ProviderEndpointGone` reference, relying on `go vet`/`golangci-lint` unused-symbol detection on `setProviderSecret` (package-private) for the gate. Acceptable; not a real test but does what the language allows. ## Axis 4 — SSOT discipline PASS. - CP `resolveModelAndProvider` (cp#381) now calls `manifest.DeriveProvider(runtime, model, authEnv)` via the `registryDeriveProvider` helper. Single registry derivation. No hidden parallel slug-prefix fallback. The `MODEL_PROVIDER` env-var fallthrough is documented and ONLY fires when (a) runtime/model is unknown OR (b) registry derivation returns an error — purely a back-compat path for legacy fixtures, not a parallel-vocabulary route. `availableAuthEnvNames` is also registry-driven (built from the manifest's `auth_env` definitions, not hardcoded) — pairs symmetrically with core's `llm_billing_mode.go` equivalent. - Canvas (ConfigTab.tsx): provider state is genuinely display-only post-P4 — `loadConfig` no longer GETs `/provider` (the 410 doesn't get called), `handleSave` no longer PUTs `/provider`, the dropdown updates only local state, `providerSaveError` is hardcoded to `null`, `providerChanged` hardcoded to `false`. Backwards-compat: older canvases that still call PUT /provider hit the 410 with `PROVIDER_ENDPOINT_RETIRED` — loud, structured failure. - `knownProviderNameSet` in cp#381 is now built from `LoadManifest().Providers` ∪ `legacyProviderAliases` ∪ `templateIDSlugs` — single source for provider vocabulary, with the two non-registry tails explicitly documented as deferring to a future P4 reconcile. ## Axis 5 — Migration safety PASS. - Migration is **DELETE FROM workspace_secrets WHERE key = 'LLM_PROVIDER'** — row delete, NOT column drop. Old binaries (pre-this-PR) that try to read `LLM_PROVIDER` after the migration runs simply get 0 rows from the SELECT — no crash, no schema mismatch. **No 2-phase rollout needed**. - Defence-in-depth filter audit: `loadWorkspaceSecrets` in `workspace_provision.go` filters `LLM_PROVIDER` at BOTH the `global_secrets` rung (lines 1069-1076) AND the `workspace_secrets` rung (lines 1099-1110), BEFORE the env map is handed to the provisioner. `secrets.Values` also drops LLM_PROVIDER (via the new provider-aware platform-cred strip). So a rolling deploy with new code reading old rows is safe (the row never reaches CP); the migration just cleans the table. - Migration idempotency: `DELETE WHERE key = 'LLM_PROVIDER'` is naturally idempotent — second invocation deletes 0 rows. - Down migration is honestly a no-op + a comment explaining why (a real revert needs an application-code revert because no live writer remains). Accurate. ## Tier-call **tier:medium is correct.** Surface count: 5+ (handler 410-Gone, secrets.go GetProvider/SetProvider/setProviderSecret removal, workspace.go Create write removal, canvas ConfigTab.tsx provider flow retirement, CP provisioner resolveModelAndProvider rewrite, migration). Behavior delta: retires a stored secret + 2 routes + a derivation function + a canvas save-effect. Migration runs at deploy. NOT tier:high because: (a) the migration is **row-delete, not column-drop** — no destructive schema change, no 2-phase rollout requirement; (b) defence-in-depth filters in both `loadWorkspaceSecrets` and `secrets.Values` make the migration genuinely optional for correctness (the row could stay forever and still no consumer would see it); (c) the 410-Gone retirement is the load-bearing surface change but is structured (clients can pattern-match on `code: PROVIDER_ENDPOINT_RETIRED`). ## Flags for the CTO 1. **Scope creep in cp#381**: The PR also adds `ReapWorkspaceSiblings` (an `internal#712` boot-script fragment that reaps orphan `molecule-workspace-before-*` sibling containers to free :8000). This is unrelated to `internal#718` P4 closure — it addresses the 2026-05-27 Reno Stars Marketing Agent EC2 i-06857f75d2fb29148 crash-loop (RestartCount=43 from six orphaned siblings). Useful fix and adequately documented in the doc-comment + matches `feedback_workspace_container_swap_wipes_home_agent`, but bundling two issues in one PR makes rollback harder. Not blocking but **recommend the author land a NOTE in the PR body calling this out explicitly** so it doesn't get missed in PR-by-issue tracking. 2. **Test-quality gap (non-blocking)**: `TestWorkspaceCreate_FirstDeploy_OnlyPersistsMODEL` is NOT mutation-load-bearing for re-introduced `LLM_PROVIDER` writes (mutation #3 above). Recommend a follow-up to tighten the sqlmock matching (per-row regex equality, or post-call SQL-stream inspection). This is the only LLM_PROVIDER write that lives outside `setProviderSecret`'s call graph — if it ever comes back, this test should be the one that catches it. 3. **`ValidateProviderEnv` in cp still reads `env["LLM_PROVIDER"]`** (line 808). Today it's a no-op (core filters the key) but it's worth a future cleanup PR — the read is the only `LLM_PROVIDER` env reference left in CP and is purely defensive. If anyone in the future reintroduces a writer somewhere, this validator would then start firing on a value that nothing else uses. 4. **WARN-mode gate clarification**: the workspace.go:451-456 (`validateRegisteredModelForRuntime`) is still in WARN mode (warns + `X-Molecule-Model-Unregistered` header, does not 422). The brief described it as a hard-reject; it is NOT today. This PR does not change that — it remains a P4-follow-up to flip to 422 once template/canvas vocab reconcile is complete. --- ## Verdict — molecule-core#1984: APPROVE Companion-pair APPROVE with molecule-controlplane#381. The P4 closure is correct: removed symbols are genuinely gone, defence-in-depth filters make the rolling-deploy story safe, migration is row-delete (not column-drop), test+build green on both repos, and the canvas-side override flow is correctly retired. Falsifying findings (test-quality gap, scope creep) are non-blocking and surfaced for the author's follow-up. **Companion-stack landing requirement**: Must land together with molecule-controlplane#381 (the CP-side cutover of `env["LLM_PROVIDER"]` → `manifest.DeriveProvider`). Landing core#1984 alone retires the writers but leaves CP still reading the (now-empty) env var; landing cp#381 alone shifts CP to registry derivation but leaves core still writing to the unused row.
agent-reviewer approved these changes 2026-05-28 04:46:25 +00:00
agent-reviewer left a comment
Member

Five-Axis APPROVED. Reviewed independently against main as of #1981. Confirmed: (1) Correctness — every removed reader audited; no stragglers; migration is row-DELETE + idempotent + reversible. (2) Non-regression — cp#362 / P1 ResolveUpstream / P2-B ResolveLLMBillingModeDerived / P3 templates handler / P3 PR-C ValidateProviderEnv / P4 PR-2 (422 hard-reject at workspace.go:467-475, verified live on main commit add37f3) all untouched. (3) Tests — go test -tags=integration ./... GREEN in both repos; 410-Gone tests mutation-load-bearing (StatusGone→StatusOK breaks them). (4) SSOT discipline — CP resolveModelAndProvider now calls providers.Manifest.DeriveProvider exclusively; no parallel slug-prefix fallback; canvas provider dropdown display-only. (5) Migration safety — row-DELETE not column-DROP; no 2-phase rollout needed; defence-in-depth filter at loadWorkspaceSecrets covers the rolling-deploy window. Tier:medium confirmed. Companion-stack landing (core first, then cp) required; landing core alone is safe because the filter protects against stale CP. Approve to merge.

Five-Axis APPROVED. Reviewed independently against main as of #1981. Confirmed: (1) Correctness — every removed reader audited; no stragglers; migration is row-DELETE + idempotent + reversible. (2) Non-regression — cp#362 / P1 ResolveUpstream / P2-B ResolveLLMBillingModeDerived / P3 templates handler / P3 PR-C ValidateProviderEnv / P4 PR-2 (422 hard-reject at workspace.go:467-475, verified live on main commit add37f3) all untouched. (3) Tests — go test -tags=integration ./... GREEN in both repos; 410-Gone tests mutation-load-bearing (StatusGone→StatusOK breaks them). (4) SSOT discipline — CP resolveModelAndProvider now calls providers.Manifest.DeriveProvider exclusively; no parallel slug-prefix fallback; canvas provider dropdown display-only. (5) Migration safety — row-DELETE not column-DROP; no 2-phase rollout needed; defence-in-depth filter at loadWorkspaceSecrets covers the rolling-deploy window. Tier:medium confirmed. Companion-stack landing (core first, then cp) required; landing core alone is safe because the filter protects against stale CP. Approve to merge.
hongming merged commit 3dd7108cb4 into main 2026-05-28 04:46:28 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1984