From ad4241cebbfdc0e378eb5808f35761621ea188e9 Mon Sep 17 00:00:00 2001
From: dev-lead
Date: Fri, 8 May 2026 11:15:39 -0700
Subject: [PATCH] fix(dockerfile): bundle config.yaml into /app so providers registry loads
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The adapter's _load_providers tries 4 paths in order:

  1. /opt/adapter/config.yaml - provisioner-managed (currently missing)
  2. os.path.dirname(__file__)/config.yaml - alongside adapter.py
  3. ${WORKSPACE_CONFIG_PATH}/config.yaml - workspace overrides
  4. _BUILTIN_PROVIDERS - oauth + anthropic-api only

On this template's docker image /opt/adapter/ is never populated by the
platform provisioner (verified 2026-05-08 by SSM-exec on a live canary's
workspace EC2: ls /opt/adapter/ → no such file or directory). That makes
path 2, the dir adjacent to /app/adapter.py, the load-bearing one for
production workloads.

The Dockerfile copies adapter.py + claude_sdk_executor.py + scripts/ +
entrypoint.sh + __init__.py into /app, but it does NOT copy config.yaml.
So /app/config.yaml doesn't exist, path 2 fails, and the adapter falls
all the way through to _BUILTIN_PROVIDERS.

_BUILTIN_PROVIDERS contains only anthropic-oauth + anthropic-api. Every
MiniMax / GLM / Kimi / DeepSeek model id has no matching prefix in those
two, so _resolve_provider returns providers[0] = anthropic-oauth (per
"unknown ids fall back to providers[0]" rule). That provider needs
CLAUDE_CODE_OAUTH_TOKEN, which is unset for non-OAuth tenants. The
claude CLI fails with:

    Not logged in · Please run /login

…which surfaces in the A2A response as "Agent error (Exception)".

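For reference, the lookup order and the fallback rule described above can
be sketched roughly as follows. This is an illustration under assumptions,
not the code in adapter.py: the function names, the `prefixes` field, the
prefix values, and the `provisioner_dir` parameter are hypothetical, and
YAML parsing is left out entirely.

```python
import os

# Hypothetical stand-in for _BUILTIN_PROVIDERS. The two provider names come
# from the message above; the "prefixes" field and its values are assumed.
BUILTIN_PROVIDERS = [
    {"name": "anthropic-oauth", "prefixes": ["claude-"]},
    {"name": "anthropic-api", "prefixes": ["anthropic/"]},
]


def candidate_paths(adapter_dir, env, provisioner_dir="/opt/adapter"):
    """List config.yaml locations in the order given above (paths 1 to 3)."""
    paths = [
        os.path.join(provisioner_dir, "config.yaml"),  # 1. provisioner-managed
        os.path.join(adapter_dir, "config.yaml"),      # 2. alongside adapter.py
    ]
    workspace = env.get("WORKSPACE_CONFIG_PATH")
    if workspace:
        paths.append(os.path.join(workspace, "config.yaml"))  # 3. workspace
    return paths


def first_existing(paths):
    """Return the first path that is a file, or None (path 4: built-ins)."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def resolve_provider(model_id, providers):
    """Match by prefix; unknown ids fall back to providers[0]."""
    for provider in providers:
        if any(model_id.startswith(prefix) for prefix in provider["prefixes"]):
            return provider
    return providers[0]
```

With the adapter directory set to /app and no config.yaml there,
first_existing returns None, the registry is the built-in list, and a
non-Anthropic id (for example the made-up "minimax-example") takes the
providers[0] branch and resolves to anthropic-oauth.
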
This is the root cause of:

  • Canary chronic red since 2026-05-07 02:30 UTC (38h+ at time of
    investigation)
  • molecule-core#129 failure mode #1
  • Memory feedback_template_vs_workspace_config_separation
    (template-claude-code PR #37 added the multi-path lookup but didn't
    bundle config.yaml into the image; the lookup paths point at files
    that don't exist)

Fix: one-line `COPY config.yaml .` in the Dockerfile.

Verification path (post-merge): publish-runtime workflow rebuilds the
image, deploys to staging tenant fleet, next canary cron run sees
/app/config.yaml → loads minimax provider → MINIMAX_API_KEY matches →
claude CLI auths → A2A returns PONG → green.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 Dockerfile | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Dockerfile b/Dockerfile
index 0e51588..49823ac 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -43,6 +43,19 @@ RUN pip install --no-cache-dir -r requirements.txt && \
 # Copy adapter code
 COPY adapter.py .
 COPY __init__.py .
+# Provider registry. The adapter's _load_providers walks 4 paths:
+#   1. /opt/adapter/config.yaml — provisioner-managed canonical
+#   2. os.path.dirname(__file__)/config.yaml — alongside adapter.py (this image)
+#   3. ${WORKSPACE_CONFIG_PATH}/config.yaml — workspace per-instance overrides
+#   4. _BUILTIN_PROVIDERS — oauth + anthropic-api only
+# On this image /opt/adapter/ is never populated by the platform
+# provisioner, so path 2 (/app/config.yaml) is the load-bearing one.
+# Without this COPY the file isn't in the image, all 3 file paths fail,
+# and _load_providers falls through to _BUILTIN_PROVIDERS — every
+# MiniMax/GLM/Kimi/DeepSeek model silently routes to anthropic-oauth →
+# "Not logged in. Please run /login" at first LLM call. Caused the
+# canary's 38h chronic red on 2026-05-07/08 (molecule-core#129).
+COPY config.yaml .
 # Adapter-specific executor — owned by THIS template (universal-runtime
 # refactor, molecule-core task #87). Lives alongside adapter.py so
 # Python's import system picks the local /app/claude_sdk_executor.py
-- 
2.45.2