molecule-ai-workspace-templ…/adapter.py
infra-runtime-be fc7bbf1560
fix: refuse boot when YAML model: field carries a provider name
Defense-in-depth for the CP workspace-config writer bug that wedged
prod-Reviewer + prod-Researcher on 2026-05-18/19. Upstream CP
provisioner conflated MODEL (model id, e.g. gpt-5.5) with MODEL_PROVIDER
(provider name, e.g. openai-subscription) and stamped the provider name
into /configs/config.yaml's `model:` field. Codex thread/start silently
accepted the garbage and the executor's reader thread wedged in wait4.

This is the template-side half of the class-fix (the CP-side change is
the structural fix). Either side alone closes the bug; both together
prevent recurrence on any future writer regression.

What this PR adds:
- provider_config.assert_model_is_not_provider_name(model, providers)
  raises RuntimeError with an actionable message naming the bad value,
  the registry it collided with, and the upstream writer to fix.
- adapter.CodexAdapter.setup() calls it AFTER load_providers and BEFORE
  resolve_provider so codex never sees the garbage.
- 6 unit tests pin behavior: real model ids pass, provider names raise
  with the actionable error shape, and matching is case-insensitive.
- 2 integration tests pin adapter.setup() boot behavior on a real
  AdapterConfig.
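
The guard described above could look roughly like the following. This is a minimal sketch, not the code in provider_config: it assumes the registry is a list of mappings that each expose a "name" key, and that an empty model means "use the codex default". The error wording is illustrative only.

```python
from typing import Iterable, Mapping


def assert_model_is_not_provider_name(
    model: str, providers: Iterable[Mapping[str, object]]
) -> None:
    """Raise RuntimeError if ``model`` is actually a provider name."""
    candidate = (model or "").strip().lower()
    if not candidate:
        # Assumption: an empty model means "codex default"; nothing to check.
        return
    # Assumption: each registry entry exposes its name under "name".
    names = {str(p.get("name", "")).strip().lower() for p in providers}
    names.discard("")
    if candidate in names:
        # Name the bad value, the registry it collided with, and the fix.
        raise RuntimeError(
            f"config.yaml `model:` is {model!r}, which is a provider name "
            f"in the providers registry ({sorted(names)}), not a model id. "
            "Fix the upstream workspace-config writer so it writes MODEL, "
            "not MODEL_PROVIDER, into `model:`."
        )
```

Comparing lowercased, stripped values is what makes the match case-insensitive, so `OpenAI-Subscription` is rejected the same way as `openai-subscription`.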

Refs:
- reference_codex_prod_reviewer_researcher_wedge_in_executor_not_codex_2026_05_18
- reference_runtime_provider_creds_and_template_id_footgun
- feedback_template_vs_workspace_config_separation
2026-05-19 12:00:32 -07:00


"""Codex CLI adapter — runs OpenAI Codex (`@openai/codex`) inside the workspace.

This template wraps OpenAI's Codex CLI as a Molecule workspace runtime.
The actual A2A bridge lives in ``executor.py``; this file is just the
``BaseAdapter`` shell: name, display metadata, config schema, executor
factory, and an ``OPENAI_API_KEY`` reachability check at setup.

Architecture in one paragraph: each workspace session holds one
long-lived ``codex app-server`` child (spawned by ``executor.py`` on
first turn) plus one Codex thread. A2A messages become ``turn/start``
RPCs against that thread, giving us session continuity + queued
mid-turn handling. See
``docs/integrations/codex-app-server-adapter-design.md`` in
molecule-core for the full design.

We deliberately do NOT run a separate daemon here (unlike hermes,
where a long-running gateway listens on :8642 from container boot).
``codex app-server`` is a stdio child of the executor, not a network
service: fewer moving parts, no port to configure, no health endpoint
to wait on at start time.
"""
from __future__ import annotations

import logging
import os
import shutil
from pathlib import Path

from molecule_runtime.adapters.base import BaseAdapter, AdapterConfig

logger = logging.getLogger(__name__)


class CodexAdapter(BaseAdapter):
    """Adapter that proxies A2A turns to a persistent codex app-server."""

    @staticmethod
    def name() -> str:
        return "codex"

    @staticmethod
    def display_name() -> str:
        return "OpenAI Codex CLI"

    @staticmethod
    def description() -> str:
        return (
            "Runs the OpenAI Codex CLI (@openai/codex) with native session "
            "continuity. Each A2A message becomes a turn against a "
            "long-lived codex thread: same UX shape as hermes/openclaw, "
            "MCP-native push parity with claude-code."
        )

    @staticmethod
    def get_config_schema() -> dict:
        return {
            "model": {
                "type": "string",
                "description": (
                    "Codex model. Pass through to `thread/start`. May-2026 "
                    "roster: 'gpt-5.5' (default), 'gpt-5.4', 'gpt-5.4-mini', "
                    "'gpt-5.3-codex', 'gpt-5.3-codex-spark', 'gpt-5.2'. "
                    "Empty = codex default (gpt-5.5)."
                ),
            },
            "provider": {
                "type": "string",
                "description": (
                    "Optional codex provider id from the `providers:` "
                    "registry in config.yaml (e.g. 'openai-subscription', "
                    "'openai-api', 'minimax-token-plan'). Empty = "
                    "auto-resolve from model + env credentials."
                ),
            },
        }

    async def setup(self, config: AdapterConfig) -> None:
        """Verify the codex binary is on PATH and a credential is set, then
        render ``~/.codex/config.toml`` from the providers registry.

        We do NOT spawn the app-server here; that happens lazily on
        the first turn inside the executor. Failing fast at setup
        time with a clear message beats a confusing ``FileNotFoundError``
        from the executor's first ``asyncio.create_subprocess_exec``.

        Provider resolution (see ``provider_config.resolve_provider``):

        1. Explicit ``provider`` field in ``runtime_config`` /
           ``MODEL_PROVIDER`` env wins.
        2. Else, if any ``chatgpt_subscription`` provider's auth_env
           is set (``CODEX_AUTH_JSON`` / ``CODEX_CHATGPT_AUTH_JSON``),
           pick it; this preserves the verified prod behavior where the
           subscription beats a co-set vendor key.
        3. Else, model-prefix / alias match against the registry.
        4. Else, first credential-satisfied entry, with the registry's
           first entry as the final fallback.

        The resolved provider is then rendered to ``~/.codex/config.toml``:
        built-in modes (subscription, openai_api) emit NO override (the
        CLI's native OpenAI/Responses provider handles them); compat
        providers emit ``[model_providers.<slug>]`` + ``model_provider``.
        """
        if not shutil.which("codex"):
            raise RuntimeError(
                "codex binary not on PATH. The Dockerfile installs "
                "@openai/codex globally via npm; if you're running "
                "outside the container, install it with: "
                "`npm install -g @openai/codex`"
            )

        # Auth: codex resolves credentials in three ways and any one
        # is sufficient. Mirror that here so setup() does not
        # false-fail a validly-authed workspace:
        #   A. OPENAI_API_KEY: direct OpenAI path (codex default).
        #   B. MINIMAX_API_KEY: MiniMax chat-wire route
        #      (codex_minimax_config.sh writes config.toml).
        #   C. $CODEX_HOME/auth.json: an injected
        #      ChatGPT/Codex-subscription credential
        #      (auth_mode:"chatgpt"), materialized by start.sh from the
        #      CODEX_AUTH_JSON env var (Infisical SSOT
        #      /shared/codex-oauth, key CODEX_AUTH_JSON, env=prod;
        #      CODEX_CHATGPT_AUTH_JSON is a backward-compat alias) for
        #      a SINGLE runner. This mirrors OpenClaw's openai-codex
        #      auth.order: prefer an injected subscription auth.json
        #      over the pay-as-you-go API key.
        # Codex prefers auth.json over env keys. The
        # OPENAI_API_KEY path (A) is retained as the documented
        # fallback and is intentionally NOT removed.
        # CODEX_HOME defaults to ~/.codex; honor an explicit override
        # so a non-default home is still detected.
        codex_home = os.environ.get("CODEX_HOME") or os.path.join(
            os.path.expanduser("~"), ".codex"
        )
        auth_json = Path(codex_home) / "auth.json"
        has_auth_json = auth_json.is_file() and auth_json.stat().st_size > 0
        if not (
            os.environ.get("OPENAI_API_KEY")
            or os.environ.get("MINIMAX_API_KEY")
            or has_auth_json
        ):
            # Any one credential is sufficient, so the message says
            # "at least one" rather than "exactly one".
            raise RuntimeError(
                "No codex credential found. Codex needs at least one "
                "of: OPENAI_API_KEY (direct OpenAI), MINIMAX_API_KEY "
                "(MiniMax token-plan codex route), or an injected "
                "ChatGPT/Codex-subscription auth.json at "
                f"{auth_json} (set CODEX_AUTH_JSON for a single-runner "
                "workspace). Configure via the canvas Config tab."
            )

        # --- Provider resolution + config.toml rendering ---
        # Pull the picked model + (optional) explicit provider from
        # runtime_config (the canvas Config tab writes here on Save).
        rc = getattr(config, "runtime_config", None)
        if isinstance(rc, dict):
            yaml_model = rc.get("model") or ""
            yaml_provider = rc.get("provider") or ""
        else:
            yaml_model = getattr(rc, "model", None) or getattr(config, "model", "") or ""
            yaml_provider = getattr(rc, "provider", None) or ""

        # MODEL_PROVIDER env from the persona-env layer (if any) wins
        # over YAML when set; mirrors the claude-code template's
        # _resolve_model_and_provider_from_env shape.
        env_provider = (os.environ.get("MODEL_PROVIDER") or "").strip()
        explicit_provider = env_provider or yaml_provider or None

        try:
            from provider_config import (
                assert_model_is_not_provider_name,
                load_providers, resolve_provider, write_config_toml,
            )
        except ImportError as exc:
            # Defensive: fall back to the legacy shell-script path
            # below if the module can't be imported (e.g. a partial
            # install). The credential preflight above has already
            # gated; codex will boot off OPENAI_API_KEY or auth.json
            # using the CLI defaults.
            logger.warning(
                "codex: provider_config import failed (%s); "
                "skipping registry-driven config.toml render",
                exc,
            )
            return

        providers = load_providers(
            workspace_config_path=getattr(config, "config_path", "") or "",
        )

        # Defense-in-depth for the CP workspace-config writer bug
        # (2026-05-18 Reviewer + Researcher wedge): if the upstream
        # writer stamped a PROVIDER name into the YAML `model:` field
        # (e.g. model: 'openai-subscription'), refuse to boot rather
        # than letting codex thread/start accept the garbage and wedge.
        # Either side alone closes the bug; see
        # `assert_model_is_not_provider_name` doc + the structural fix
        # in molecule-controlplane's userdata_containerized.go /
        # ec2.go writer.
        assert_model_is_not_provider_name(yaml_model, providers)

        try:
            picked = resolve_provider(
                yaml_model, providers,
                explicit_provider=explicit_provider,
            )
        except ValueError:
            # Re-raise with the actionable message intact: a silent
            # fallback to providers[0] when the operator picked an
            # unknown name would route them through the wrong
            # base_url + env key (the analog #180 in claude-code).
            raise

        # Render + write config.toml. For built-in OpenAI auth modes
        # (subscription, openai_api) this writes NOTHING and clears
        # any stale auto-generated override, exactly the verified
        # device-logged codex-0.130 shape that the prod-Reviewer /
        # prod-Researcher path requires.
        codex_home = os.environ.get("CODEX_HOME") or os.path.join(
            os.path.expanduser("~"), ".codex"
        )
        try:
            written = write_config_toml(
                picked, model=yaml_model or None, codex_home=codex_home,
            )
        except ValueError as exc:
            # Misconfigured registry entry (missing base_url / vendor
            # env). Fail closed so the operator sees the YAML defect.
            raise RuntimeError(
                f"codex provider registry: {exc}"
            ) from exc

        logger.info(
            "codex adapter: provider=%s auth_mode=%s wrote=%s",
            picked["name"], picked["auth_mode"],
            str(written) if written else "<no override>",
        )

    async def create_executor(self, config: AdapterConfig):
        from executor import CodexAppServerExecutor

        return CodexAppServerExecutor(config)


Adapter = CodexAdapter
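
The four-step resolution order documented in the `setup()` docstring can be illustrated with a small standalone sketch. This is not `provider_config.resolve_provider`: the function name `pick_provider`, the explicit `env` parameter, and the registry fields `auth_env` (as a list) and `model_prefixes` are hypothetical, chosen only to make the precedence concrete.

```python
from typing import Mapping, Optional, Sequence

Provider = Mapping[str, object]


def _has_credential(provider: Provider, env: Mapping[str, str]) -> bool:
    # Assumption: "auth_env" lists env vars that can satisfy this provider.
    return any(env.get(var) for var in provider.get("auth_env", ()))


def pick_provider(
    model: str,
    providers: Sequence[Provider],
    env: Mapping[str, str],
    explicit_provider: Optional[str] = None,
) -> Provider:
    if not providers:
        raise ValueError("providers registry is empty")
    # 1. An explicit provider name wins; unknown names fail loudly.
    if explicit_provider:
        for p in providers:
            if p["name"] == explicit_provider:
                return p
        raise ValueError(f"unknown provider {explicit_provider!r}")
    # 2. A credentialed subscription provider beats a co-set vendor key.
    for p in providers:
        if p.get("auth_mode") == "chatgpt_subscription" and _has_credential(p, env):
            return p
    # 3. Model-prefix match (hypothetical field: "model_prefixes").
    for p in providers:
        if model and any(model.startswith(pre) for pre in p.get("model_prefixes", ())):
            return p
    # 4. First credential-satisfied entry, else the first registry entry.
    for p in providers:
        if _has_credential(p, env):
            return p
    return providers[0]
```

Raising on an unknown explicit name, rather than falling through to step 4, matches the intent stated in the adapter's comments: a silent fallback would route the operator through the wrong base URL and credential.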