fix(provision): fail-closed provider derivation for registry-known runtimes/models (#2248 follow-up) #2390
Reference in New Issue
Block a user
Delete Branch "fix/provider-derivation-fail-closed"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes #2248 follow-up (Researcher fail-open audit).
workspace_provision.goderiveDefaultConfigProvider(~:647, :680-695, :718-735) swallowed providerRegistry/DeriveProvider errors and returned empty string on failure. ThenensureDefaultConfigonly stamped provider whenderivedProvider != "", so a registry-KNOWN first-party runtime/model could be provisioned PROVIDERLESS → runtime later re-derived the WRONG provider (the moonshot→platform NOT_CONFIGURED class).Changes:
deriveDefaultConfigProvidernow returns(string, error)instead ofstring.llm_billing_mode.go:230-237.deriveDefaultConfigProviderFromManifestfor unit-testability.ensureDefaultConfigto return(map[string][]byte, error).workspace.goandorg_import.goto handle errors.providerRegistrymockable for testability (RC 9329).SOP Checklist
workspace_provision_derive_test.go(5 tests) + updatedworkspace_provision_test.go+workspace_provision_platform_boot_test.go.Scope:
workspace_provision.go+ its tests + caller plumbing only.Test plan:
go test ./workspace-server/internal/handlers/ -run TestDeriveProvidergo test ./workspace-server/internal/handlers/ -run TestEnsureDefaultConfigRe-review → Researcher (CR2 wedged). Core CI all-required may hang on the known shellcheck runner.
REQUEST_CHANGES on
21905da5.Two blockers before this is safe to merge:
org_import.gostill swallows the new fail-closed config-generation error. IncreateWorkspaceTree, after the workspace DB row/layout/provisioning broadcast are already persisted,cfgErronly logs andcontinues. That does not propagate the error or mark the workspace failed, so the intended fail-closed path can leave a silent stuck provisioning workspace during org import. Please return/surface the import error or mark that workspace failed before continuing.The registry-unavailable/load-error fail-closed path is not covered. The new tests cover unknown-runtime pass-through, unregistered-model derive miss pass-through, known-model derive-error fail-closed, and success, but none forces
providerRegistry()load failure/nil manifest throughderiveDefaultConfigProvider/ensureDefaultConfig. Please add regression coverage that registry load failure blocks provisioning.Also, the current head still has required
E2E API Smoke Testfailing andCI / all-requiredskipped when sampled; re-run/fix before re-review. Scope otherwise looks limited to workspace config/provisioning caller plumbing and tests, with no gate/auth/registry/merge-queue files.REQUEST_CHANGES on
bf8cde00.The two prior blocker fixes are directionally addressed: org import now calls
markProvisionFailedbefore continuing, the registry-load fail-closed test was added, and changingproviderRegistryto a variable preserves the same production loader body while enabling test injection.However this head does not compile, and the required gate failures are PR-caused, not environmental:
workspace-server/internal/handlers/org_import.go:333:4: continue is not in a loopworkspace-server/internal/handlers/workspace_provision.go:755:56: undefined: providersworkspace-server/internal/handlers/workspace_provision.go:773:29: undefined: providersThose errors fail Platform Go, Handlers Postgres, and E2E API Smoke on
bf8cde00. Please fix the compile errors and rerun required gates; I can re-review once Platform Go/Handlers/E2E are green.APPROVED molecule-core#2390 @43bc0ea6274043f031d086d0783e0c2114589110. Fetched live current head before review. 5-axis pass: correctness/robustness look sound; provider derivation now fail-closes on registry load failure and exceptional DeriveProvider errors for registry-known runtime/model pairs, while preserving pass-through for empty model, unknown/federated runtimes, and genuine derive misses. ensureDefaultConfig now propagates derivation errors to Create, and org import marks the already-created workspace failed before skipping provisioning, avoiding silent stuck-provisioning. Security posture improves by preventing providerless configs for known first-party model failures; no new endpoints/auth/secret surfaces. Performance impact is limited to existing manifest/provider checks. Readability is acceptable with the extracted deriveDefaultConfigProviderFromManifest and focused tests. Regression coverage includes unknown runtime, derive miss, known-model ambiguous error fail-closed, registry load failure, and known model success. Scope is limited to workspace config/provision caller plumbing and tests. CI / Platform (Go) and CI / all-required are green on this head; aggregate status remains red from unrelated E2E/SOP/review infrastructure gates.
APPROVE on
d070de7d.Verified current head is the empty CI re-trigger commit on top of the previously reviewed provider-derivation fail-closed changes. Required core gates are green: E2E API Smoke, Handlers Postgres, CI/all-required, and Platform Go. PR is mergeable=true. CR2 approval 9345 remains official, stale=false, dismissed=false.
APPROVED on current head
d070de7d9f. Fetched live head/diff and verified the provider derivation change fails closed when the registry is unavailable or a registry-known runtime/model cannot be safely derived, while preserving pass-through for unknown/federated runtimes and unregistered models. org_import now marks provisioning failed on default-config generation errors; scope is provisioning/config/test code only, with no gate weakening or auth/credential exposure. Platform Go, E2E API, Handlers Postgres, and ci/all-required are green; remaining SOP/qa/security contexts are the known operator tier-label gate.ready-to-merge: 2-genuine approved (Researcher + CR2). A2A down — cannot ping PM via workspace.