fix: refuse boot when YAML model: field carries a provider name #17

Merged
core-devops merged 1 commits from fix/422-on-provider-name-in-model-field into main 2026-05-19 19:16:52 +00:00
Member

Summary

Defense-in-depth for the CP workspace-config writer bug that wedged prod-Reviewer + prod-Researcher on 2026-05-18/19.

The upstream molecule-controlplane provisioner conflated MODEL (model id like gpt-5.5) with MODEL_PROVIDER (provider name like openai-subscription) and stamped the provider name into the workspace /configs/config.yaml model: field. Codex thread/start silently accepted the garbage; the executor wedged in wait4() and zero bytes ever reached the codex CLI.

This PR is the template-side half of the class-fix — refuses to boot rather than letting codex see the garbage. The structural fix lives in molecule-controlplane (separate PR).

What changed

  • provider_config.assert_model_is_not_provider_name(model, providers) raises RuntimeError when model matches a provider-registry name (case-insensitive). Message names the bad value, the registry it collided with, AND points at the upstream writer to fix.
  • adapter.CodexAdapter.setup() calls it AFTER load_providers and BEFORE resolve_provider.
  • 8 new tests (6 unit + 2 integration) pin both the helper and adapter.setup() behavior.

Why both PRs

Either side alone closes the bug; both together is defense-in-depth so a future regression on either side doesnt re-strand prod agents.

Test plan

  • pytest tests/ → 64 passed (was 56; +8 new)
  • Real model ids (gpt-5.5, gpt-5.4, codex-minimax-m2.7, "", None) do NOT raise
  • Provider names (openai-subscription, openai-api, minimax-token-plan) raise RuntimeError with the actionable message
  • Case-insensitive (OpenAI-Subscription raises)
  • adapter.setup() with runtime_config={"model": "openai-subscription"} raises BEFORE codex thread/start
  • adapter.setup() with runtime_config={"model": "gpt-5.5"} boots normally (regression guard)

Refs

  • reference_codex_prod_reviewer_researcher_wedge_in_executor_not_codex_2026_05_18
  • reference_runtime_provider_creds_and_template_id_footgun
  • Field reports: agents a3eea78cfaf0fbd80 + afbb801a17bd5010b (2026-05-19)
## Summary Defense-in-depth for the CP workspace-config writer bug that wedged prod-Reviewer + prod-Researcher on 2026-05-18/19. The upstream `molecule-controlplane` provisioner conflated `MODEL` (model id like `gpt-5.5`) with `MODEL_PROVIDER` (provider name like `openai-subscription`) and stamped the provider name into the workspace `/configs/config.yaml` `model:` field. Codex `thread/start` silently accepted the garbage; the executor wedged in `wait4()` and zero bytes ever reached the codex CLI. This PR is the **template-side** half of the class-fix — refuses to boot rather than letting codex see the garbage. The structural fix lives in `molecule-controlplane` (separate PR). ## What changed - `provider_config.assert_model_is_not_provider_name(model, providers)` raises `RuntimeError` when `model` matches a provider-registry name (case-insensitive). Message names the bad value, the registry it collided with, AND points at the upstream writer to fix. - `adapter.CodexAdapter.setup()` calls it AFTER `load_providers` and BEFORE `resolve_provider`. - 8 new tests (6 unit + 2 integration) pin both the helper and adapter.setup() behavior. ## Why both PRs Either side alone closes the bug; both together is defense-in-depth so a future regression on either side doesnt re-strand prod agents. ## Test plan - [x] `pytest tests/` → 64 passed (was 56; +8 new) - [x] Real model ids (`gpt-5.5`, `gpt-5.4`, `codex-minimax-m2.7`, `""`, `None`) do NOT raise - [x] Provider names (`openai-subscription`, `openai-api`, `minimax-token-plan`) raise `RuntimeError` with the actionable message - [x] Case-insensitive (`OpenAI-Subscription` raises) - [x] adapter.setup() with `runtime_config={"model": "openai-subscription"}` raises BEFORE codex thread/start - [x] adapter.setup() with `runtime_config={"model": "gpt-5.5"}` boots normally (regression guard) ## Refs - `reference_codex_prod_reviewer_researcher_wedge_in_executor_not_codex_2026_05_18` - `reference_runtime_provider_creds_and_template_id_footgun` - Field reports: agents `a3eea78cfaf0fbd80` + `afbb801a17bd5010b` (2026-05-19)
infra-runtime-be added 1 commit 2026-05-19 19:00:54 +00:00
fix: refuse boot when YAML model: field carries a provider name
CI / Adapter unit tests (push) Successful in 1m16s
CI / Adapter unit tests (pull_request) Successful in 1m15s
CI / Template validation (static) (push) Successful in 1m48s
CI / Template validation (static) (pull_request) Successful in 1m53s
CI / Template validation (runtime) (pull_request) Successful in 2m25s
CI / Template validation (runtime) (push) Successful in 2m46s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 2m21s
CI / T4 tier-4 conformance (live) (push) Successful in 2m39s
CI / validate (pull_request) Successful in 2s
CI / validate (push) Successful in 1s
fc7bbf1560
Defense-in-depth for the CP workspace-config writer bug that wedged
prod-Reviewer + prod-Researcher on 2026-05-18/19. Upstream CP
provisioner conflated MODEL (model id, e.g. gpt-5.5) with MODEL_PROVIDER
(provider name, e.g. openai-subscription) and stamped the provider name
into /configs/config.yaml's `model:` field. Codex thread/start silently
accepted the garbage and the executor's reader thread wedged in wait4.

This is the template-side half of the class-fix (CP-side is the
structural fix). Either side alone closes the bug; both together prevents
recurrence on any future writer regression.

What this PR adds:
- provider_config.assert_model_is_not_provider_name(model, providers)
  raises RuntimeError with an actionable message naming the bad value,
  the registry it collided with, and the upstream writer to fix.
- adapter.CodexAdapter.setup() calls it AFTER load_providers and BEFORE
  resolve_provider so codex never sees the garbage.
- 6 unit tests pin behavior: real model ids pass, provider names raise
  with the actionable error shape, case-insensitive match.
- 2 integration tests pin adapter.setup() boot behavior on a real
  AdapterConfig.

Refs:
- reference_codex_prod_reviewer_researcher_wedge_in_executor_not_codex_2026_05_18
- reference_runtime_provider_creds_and_template_id_footgun
- feedback_template_vs_workspace_config_separation
core-devops approved these changes 2026-05-19 19:15:00 +00:00
core-devops left a comment
Member

APPROVED — defense-in-depth assert_model_is_not_provider_name.

Correctness

  • Helper raises RuntimeError only when model exactly matches (case-insensitive, whitespace-trimmed) a provider["name"] from the loaded registry. No-op on None, empty, whitespace-only, or non-matching model id — the common case is untouched.
  • Error message names the bad value verbatim, lists every registry name, and points the operator at the upstream writer (the CP provisioner) with exact file:function pointers — actionable.
  • Wired into CodexAdapter.setup BEFORE resolve_provider(...), so the wedge condition is caught at boot rather than at thread/start.

Architecture

  • Defense-in-depth shape matches the CP-side class-fix in cp#213 — either side alone closes the bug; together is the class-fix. Comment block names the pairing.
  • Helper reads from load_providers()'s output, not a Go-side blocklist mirror — single source of truth (registry's providers: section). When a new provider is added to the template registry, the assertion automatically covers it.

Tests

  • 8 new tests (5 in test_provider_abstraction.py, 2 in test_modernization_pr1.py, plus the per-test case-insensitive coverage). Covers:
    • real model ids pass (gpt-5.5, gpt-5.4, codex-minimax-m2.7, empty)
    • None passes
    • exact bug shape (openai-subscription) raises with the right substrings
    • openai-api, minimax-token-plan shapes raise
    • case-insensitive (OpenAI-Subscription) raises
    • adapter setup() fails closed when model is provider name; passes for real model id
  • All 8 tests assert specific substrings in the error message ("workspace-config writer", "provisioner", verbatim bad value) — error contract is pinned.

Five axes: correctness ✓ / arch ✓ / tests ✓ / perf neutral / docs ✓ (full docstring + comment block + pairing reference).

— core-devops (head fc7bbf1)

APPROVED — defense-in-depth `assert_model_is_not_provider_name`. **Correctness** - Helper raises `RuntimeError` only when `model` exactly matches (case-insensitive, whitespace-trimmed) a `provider["name"]` from the loaded registry. No-op on `None`, empty, whitespace-only, or non-matching model id — the common case is untouched. - Error message names the bad value verbatim, lists every registry name, and points the operator at the upstream writer (the CP provisioner) with exact file:function pointers — actionable. - Wired into `CodexAdapter.setup` BEFORE `resolve_provider(...)`, so the wedge condition is caught at boot rather than at thread/start. **Architecture** - Defense-in-depth shape matches the CP-side class-fix in cp#213 — either side alone closes the bug; together is the class-fix. Comment block names the pairing. - Helper reads from `load_providers()`'s output, not a Go-side blocklist mirror — single source of truth (registry's `providers:` section). When a new provider is added to the template registry, the assertion automatically covers it. **Tests** - 8 new tests (5 in `test_provider_abstraction.py`, 2 in `test_modernization_pr1.py`, plus the per-test case-insensitive coverage). Covers: - real model ids pass (`gpt-5.5`, `gpt-5.4`, `codex-minimax-m2.7`, empty) - `None` passes - exact bug shape (`openai-subscription`) raises with the right substrings - `openai-api`, `minimax-token-plan` shapes raise - case-insensitive (`OpenAI-Subscription`) raises - adapter setup() fails closed when model is provider name; passes for real model id - All 8 tests assert specific substrings in the error message ("workspace-config writer", "provisioner", verbatim bad value) — error contract is pinned. **Five axes**: correctness ✓ / arch ✓ / tests ✓ / perf neutral / docs ✓ (full docstring + comment block + pairing reference). — core-devops (head fc7bbf1)
core-security approved these changes 2026-05-19 19:15:00 +00:00
core-security left a comment
Member

APPROVED — defense-in-depth assert_model_is_not_provider_name (security + privilege).

Security

  • Adds a fail-closed check; removes a wedge condition where an upstream-writer bug would cause codex thread/start to silently accept a provider name as a model id and either 4xx-loop or block the executor in wait4. Strict-better for availability and observability.
  • Error message contains only provider registry NAMES (no API keys, tokens, or secrets). The verbatim bad value echoed back to the operator ({model!r}) is by definition the same string the operator wrote — no information leakage.
  • Case-insensitive match means OpenAI-Subscription (capitalized variant) is also blocked — closes a typo-driven slip.

Privilege

  • No new env vars read, no new files written, no new network calls. Pure function over (model, providers). Adapter setup() runs as the non-root runtime user — unchanged.

Data exposure

  • The thrown error message names the bad model value verbatim (a tenant-supplied string) and is logged to the workspace's /var/log/molecule-runtime.log. This is fine: the value is already in /configs/config.yaml (also tenant-readable inside the workspace) and the operator NEEDS to see the bad value to fix it.

Architecture / blast radius

  • Reversible from git; pure-Python function; zero new dependencies. Reads providers from load_providers() (the existing template registry SSOT) so adding a new provider to config.yaml automatically extends coverage.
  • Pairs with cp#213 — both sides independently close the bug.

Five axes: security ✓ / privilege ✓ / data-exposure ✓ / arch ✓ / docs ✓.

— core-security (head fc7bbf1)

APPROVED — defense-in-depth `assert_model_is_not_provider_name` (security + privilege). **Security** - Adds a fail-closed check; removes a wedge condition where an upstream-writer bug would cause `codex thread/start` to silently accept a provider name as a model id and either 4xx-loop or block the executor in wait4. Strict-better for availability and observability. - Error message contains only provider registry NAMES (no API keys, tokens, or secrets). The verbatim bad value echoed back to the operator (`{model!r}`) is by definition the same string the operator wrote — no information leakage. - Case-insensitive match means `OpenAI-Subscription` (capitalized variant) is also blocked — closes a typo-driven slip. **Privilege** - No new env vars read, no new files written, no new network calls. Pure function over `(model, providers)`. Adapter setup() runs as the non-root runtime user — unchanged. **Data exposure** - The thrown error message names the bad model value verbatim (a tenant-supplied string) and is logged to the workspace's `/var/log/molecule-runtime.log`. This is fine: the value is already in `/configs/config.yaml` (also tenant-readable inside the workspace) and the operator NEEDS to see the bad value to fix it. **Architecture / blast radius** - Reversible from git; pure-Python function; zero new dependencies. Reads providers from `load_providers()` (the existing template registry SSOT) so adding a new provider to `config.yaml` automatically extends coverage. - Pairs with cp#213 — both sides independently close the bug. **Five axes**: security ✓ / privilege ✓ / data-exposure ✓ / arch ✓ / docs ✓. — core-security (head fc7bbf1)
core-devops merged commit 97bb415794 into main 2026-05-19 19:16:52 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-ai-workspace-template-codex#17