feat(create): hard-fail BYOK-unsatisfiable models at create + SSOT model-discovery endpoint (core#2608) #2617

Merged
devops-engineer merged 3 commits from fix/2608-hardfail-byok-at-create into main 2026-06-12 09:48:52 +00:00
Member

Implements the CTO's two rulings on #2608:

  1. Hard-fail at create: a registered model that derives to a BYOK provider with no accepted credential at workspace OR org scope → 422 MISSING_BYOK_CREDENTIAL before any row exists (same actionable message provisioning would emit, plus a pointer to the platform default moonshot/kimi-k2.6). Platform slash-form ids stay query-free; derive/registry/global-scan failures fail open with the provision preflight as backstop (#1994 + #711-revert contracts preserved — global-scope tenant creds count for byok).
  2. Discovery: GET /admin/llm/offered-models?runtime= — per-id {model, provider, platform_billed, auth_env} derived through the SAME DeriveProvider the gate enforces, so lookup can never disagree with enforcement. The concierge's pre-provision lookup; a follow-up mcp-server PR exposes it as a list_offered_models tool + teaches provision_workspace's model param the namespace rule.

Live trigger: enter-os first-run — the concierge guessed a BYOK-form claude id, create passed, provisioning failed seconds later, the user saw dead red nodes.

Tests: 422-no-insert, global-cred satisfies, platform-id query-free, discovery payload pins. Full handlers pkg green.

Refs core#2608.

🤖 Generated with Claude Code

Implements the CTO's two rulings on #2608: 1. **Hard-fail at create**: a registered model that derives to a BYOK provider with no accepted credential at workspace OR org scope → `422 MISSING_BYOK_CREDENTIAL` before any row exists (same actionable message provisioning would emit, plus a pointer to the platform default `moonshot/kimi-k2.6`). Platform slash-form ids stay query-free; derive/registry/global-scan failures fail open with the provision preflight as backstop (#1994 + #711-revert contracts preserved — global-scope tenant creds count for byok). 2. **Discovery**: `GET /admin/llm/offered-models?runtime=` — per-id `{model, provider, platform_billed, auth_env}` derived through the SAME DeriveProvider the gate enforces, so lookup can never disagree with enforcement. The concierge's pre-provision lookup; a follow-up mcp-server PR exposes it as a `list_offered_models` tool + teaches `provision_workspace`'s model param the namespace rule. Live trigger: enter-os first-run — the concierge guessed a BYOK-form claude id, create passed, provisioning failed seconds later, the user saw dead red nodes. Tests: 422-no-insert, global-cred satisfies, platform-id query-free, discovery payload pins. Full handlers pkg green. Refs core#2608. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
core-devops added 1 commit 2026-06-12 00:37:51 +00:00
feat(create): hard-fail BYOK-unsatisfiable models at the create boundary + SSOT model-discovery endpoint (core#2608)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 23s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 19s
CI / Canvas Deploy Status (pull_request) Successful in 0s
E2E Chat / E2E Chat (pull_request) Successful in 23s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Has been cancelled
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 32s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 56s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 24s
security-review / approved (pull_request_target) Review check failed via pull_request_review trigger
security-review / approved (pull_request_review) Failing after 3s
qa-review / approved (pull_request_target) Review check failed via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s
qa-review / approved (pull_request_review) Failing after 10s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m26s
CI / Platform (Go) (pull_request) Successful in 2m52s
CI / all-required (pull_request) Successful in 0s
1bf1b2b917
CTO 2026-06-11: 'teach the agent to look up what is available, and if it
doesn't work it should hard fail — why does it even pass and create a
not-working workspace.'

1. validateBYOKCredentialSatisfiable at Create (after the registered-model
   and derived-provider gates): a model deriving to a NON-platform provider
   with no accepted credential at workspace (payload secrets) OR org
   (global_secrets keys — the #711-revert/Reno-Stars contract) scope is
   rejected 422 MISSING_BYOK_CREDENTIAL before any row exists, with the
   same actionable message provisioning would emit on the dead node, plus
   a pointer to the SSOT platform default. Platform slash-form ids stay
   query-free; derive failures and registry/global-scan errors fail open
   (#1994 contract; the provision preflight remains the backstop).
2. GET /admin/llm/offered-models?runtime= — discovery derived per-id through
   the SAME DeriveProvider the gate enforces: model, owning provider,
   platform_billed, auth_env. The concierge looks up instead of guessing.

Live trigger: enter-os first-run — concierge passed a BYOK-form claude id,
create passed, provisioning failed MISSING_BYOK_CREDENTIAL seconds later,
user saw dead red nodes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-reviewer-cr2 requested changes 2026-06-12 00:39:40 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

Requesting changes.

5-axis review on head 1bf1b2b917:

  • Correctness: the create gate is placed after runtime/model normalization and registered/derived-provider validation, and it rejects BYOK-derived models without a payload or org-global credential before inserting a workspace. The offered-models endpoint is admin-gated and derives from the same provider registry, which matches the requested SSOT shape.
  • Robustness blocker: globalSecretKeyNames only fails open when QueryContext itself returns an error. If the query starts but iteration later fails (rows.Err()) or a row cannot scan, the function still returns ok=true with partial keys. That can incorrectly 422 a legitimate create whose satisfying credential was omitted by a transient DB/scan failure, violating the PR's own contract that global-secret lookup blips must fail open because provision preflight is the backstop. Please check rows.Err() after iteration and return (nil, false) on any iteration/scan failure, with a regression test.
  • Security: no secret values are read or exposed; endpoint is under AdminAuth.
  • Performance: one bounded key-name scan only on BYOK-derived models; platform paths stay query-free.
  • Readability: the control flow is clear, but the scan helper needs to make the fail-open semantics complete.

Local Go tests not run: go is unavailable in this runtime. Gitea shows Handlers Postgres Integration and several required checks green, but E2E API is pending and gate/security/SOP contexts are currently failing/pending in the API snapshot.

Requesting changes. 5-axis review on head 1bf1b2b9174ac21f56822c1bda76e67856434f60: - Correctness: the create gate is placed after runtime/model normalization and registered/derived-provider validation, and it rejects BYOK-derived models without a payload or org-global credential before inserting a workspace. The offered-models endpoint is admin-gated and derives from the same provider registry, which matches the requested SSOT shape. - Robustness blocker: `globalSecretKeyNames` only fails open when `QueryContext` itself returns an error. If the query starts but iteration later fails (`rows.Err()`) or a row cannot scan, the function still returns `ok=true` with partial keys. That can incorrectly 422 a legitimate create whose satisfying credential was omitted by a transient DB/scan failure, violating the PR's own contract that global-secret lookup blips must fail open because provision preflight is the backstop. Please check `rows.Err()` after iteration and return `(nil, false)` on any iteration/scan failure, with a regression test. - Security: no secret values are read or exposed; endpoint is under `AdminAuth`. - Performance: one bounded key-name scan only on BYOK-derived models; platform paths stay query-free. - Readability: the control flow is clear, but the scan helper needs to make the fail-open semantics complete. Local Go tests not run: `go` is unavailable in this runtime. Gitea shows Handlers Postgres Integration and several required checks green, but E2E API is pending and gate/security/SOP contexts are currently failing/pending in the API snapshot.
agent-dev-a added 1 commit 2026-06-12 04:47:24 +00:00
fix(handlers): globalSecretKeyNames fail-open on scan/iteration errors (#2617)
CI / Python Lint & Test (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Chat / E2E Chat (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
gate-check-v3 / gate-check (pull_request_target) Failing after 5s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
Harness Replays / detect-changes (pull_request) Successful in 6s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 13s
Harness Replays / Harness Replays (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
reserved-path-review / reserved-path-review (pull_request_target) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 1s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Canvas Deploy Status (pull_request) Successful in 1s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 20s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 27s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 23s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 52s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 35s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 34s
CI / Platform (Go) (pull_request) Failing after 2m4s
CI / all-required (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m29s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m33s
security-review / approved (pull_request_target) Review check failed via pull_request_review trigger
security-review / approved (pull_request_review) Failing after 3s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 8s
qa-review / approved (pull_request_target) Review check failed via pull_request_review trigger
qa-review / approved (pull_request_review) Failing after 9s
49d077e3f7
Addresses agent-reviewer-cr2 5-axis feedback:
- Treat any rows.Scan failure as a failed scan (fail open).
- Check rows.Err() after iteration and fail open on any error.
- Add regression tests for query error, rows.Err error, scan error, and success.

The create-boundary BYOK gate now keeps its contract: transient DB
blips during the global_secrets key-name scan do not 422 legitimate
creates; the provision preflight remains the backstop.

Refs molecule-core#2617.

Co-Authored-By: Claude <noreply@anthropic.com>
agent-reviewer-cr2 requested changes 2026-06-12 05:02:04 +00:00
Dismissed
agent-reviewer-cr2 left a comment
Member

Requesting changes on head 49d077e3f7.

Blocking findings:

  1. workspace-server/internal/handlers/model_registry_validation_2608_test.go has an extra ) immediately after the import block. This is a syntax error, so the new tests cannot compile.

  2. workspace-server/internal/handlers/offered_models.go declares Provider string + 'json:"provider"' + twice in + 'OfferedModel' + `. That is a duplicate struct field and will fail compilation.

I stopped at these compile blockers. The intended behavior is on the right track, but the PR cannot be approved until the current head builds. Local verification note: this runtime does not have + 'go' + installed, so I could not run the package tests directly; both blockers are visible in the submitted diff.

Requesting changes on head 49d077e3f7766681dea471a08ff802eefb1bdef7. Blocking findings: 1. `workspace-server/internal/handlers/model_registry_validation_2608_test.go` has an extra `)` immediately after the import block. This is a syntax error, so the new tests cannot compile. 2. `workspace-server/internal/handlers/offered_models.go` declares `Provider string ` + '`json:"provider"`' + ` twice in ` + '`OfferedModel`' + `. That is a duplicate struct field and will fail compilation. I stopped at these compile blockers. The intended behavior is on the right track, but the PR cannot be approved until the current head builds. Local verification note: this runtime does not have ` + '`go`' + ` installed, so I could not run the package tests directly; both blockers are visible in the submitted diff.
core-devops added 1 commit 2026-06-12 09:40:56 +00:00
test(handlers): fix GlobalSecretKeyNames_ScanError fixture — NULL, not int, forces the Scan error
CI / Python Lint & Test (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 4s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
sop-checklist / review-refire (pull_request_target) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 1s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 15s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
reserved-path-review / reserved-path-review (pull_request_target) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 10s
CI / Canvas (Next.js) (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Failing after 13s
CI / Canvas Deploy Status (pull_request) Successful in 1s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Successful in 26s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 38s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 27s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m30s
CI / Platform (Go) (pull_request) Successful in 2m47s
CI / all-required (pull_request) Successful in 2s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m36s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 3s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 12s
audit-force-merge / audit (pull_request_target) Successful in 8s
a6d67d7cfd
database/sql converts int64 -> string at Scan, so the 12345 fixture never
errored; the implementation's per-row fail-open was already correct. A
NULL into a plain string is the shape that reliably errors at Scan time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-reviewer-cr2 approved these changes 2026-06-12 09:48:35 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED. Reviewed current head a6d67d7cfd. The previous blockers on older heads are addressed. The create boundary now rejects BYOK-derived models without a satisfiable workspace/org credential before inserting a workspace row, while preserving platform-derived and derive-failure behavior and failing open only on registry/global-secret scan availability where provision preflight remains the backstop. The offered-models endpoint is derived from the same provider registry/DeriveProvider path and exposes platform-billed vs BYOK auth_env consistently. Tests cover the 422/no-insert path, global credential pass, platform query-free path, discovery payload, and global_secrets query/scan/rows error handling. Required contexts are green by max-id status collapse: CI / all-required, Platform, E2E API Smoke, and Handlers Postgres; remaining reds are governance/advisory. No correctness, robustness, security, performance, or readability blockers found.

APPROVED. Reviewed current head a6d67d7cfd5336f6349f17163cca4f4111d98ce4. The previous blockers on older heads are addressed. The create boundary now rejects BYOK-derived models without a satisfiable workspace/org credential before inserting a workspace row, while preserving platform-derived and derive-failure behavior and failing open only on registry/global-secret scan availability where provision preflight remains the backstop. The offered-models endpoint is derived from the same provider registry/DeriveProvider path and exposes platform-billed vs BYOK auth_env consistently. Tests cover the 422/no-insert path, global credential pass, platform query-free path, discovery payload, and global_secrets query/scan/rows error handling. Required contexts are green by max-id status collapse: CI / all-required, Platform, E2E API Smoke, and Handlers Postgres; remaining reds are governance/advisory. No correctness, robustness, security, performance, or readability blockers found.
devops-engineer merged commit 565b56afe6 into main 2026-06-12 09:48:52 +00:00
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2617