fix(dockerfile): bundle config.yaml into /app so providers registry loads

The adapter's _load_providers tries 4 paths in order:
  1. /opt/adapter/config.yaml  — provisioner-managed (currently missing)
  2. os.path.dirname(__file__)/config.yaml  — alongside adapter.py
  3. ${WORKSPACE_CONFIG_PATH}/config.yaml  — workspace overrides
  4. _BUILTIN_PROVIDERS  — oauth + anthropic-api only

On this template's docker image /opt/adapter/ is never populated by
the platform provisioner (verified 2026-05-08 by SSM-exec on a live
canary's workspace EC2: ls /opt/adapter/ → no such file or directory).
That makes path 2 — the dir adjacent to /app/adapter.py — the
load-bearing one for production workloads.

The Dockerfile copies adapter.py + claude_sdk_executor.py + scripts/
+ entrypoint.sh + __init__.py into /app, but it does NOT copy
config.yaml. So /app/config.yaml doesn't exist, path 2 fails, and
the adapter falls all the way through to _BUILTIN_PROVIDERS.

_BUILTIN_PROVIDERS contains only anthropic-oauth + anthropic-api.
Every MiniMax / GLM / Kimi / DeepSeek model id has no matching
prefix in those two, so _resolve_provider returns providers[0] =
anthropic-oauth (per "unknown ids fall back to providers[0]" rule).
That provider needs CLAUDE_CODE_OAUTH_TOKEN, which is unset for
non-OAuth tenants. The claude CLI fails with:
  Not logged in · Please run /login

…which surfaces in the A2A response as "Agent error (Exception)".

This is the root cause of:
  • Canary chronic red since 2026-05-07 02:30 UTC (38h+ at time of
    investigation)
  • molecule-core#129 failure mode #1
  • Memory feedback_template_vs_workspace_config_separation
    (template-claude-code PR #37 added the multi-path lookup but
    didn't bundle config.yaml into the image — the lookup paths
    point at files that don't exist)

Fix: one-line `COPY config.yaml .` in the Dockerfile.

Verification path (post-merge): publish-runtime workflow rebuilds
the image, deploys to staging tenant fleet, next canary cron run
sees /app/config.yaml → loads minimax provider → MINIMAX_API_KEY
matches → claude CLI auths → A2A returns PONG → green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev-lead 2026-05-08 11:15:39 -07:00
parent 3c849b3ba7
commit ad4241cebb


@@ -43,6 +43,19 @@ RUN pip install --no-cache-dir -r requirements.txt && \
# Copy adapter code
COPY adapter.py .
COPY __init__.py .
# Provider registry. The adapter's _load_providers walks 4 paths:
# 1. /opt/adapter/config.yaml — provisioner-managed canonical
# 2. os.path.dirname(__file__)/config.yaml — alongside adapter.py (this image)
# 3. ${WORKSPACE_CONFIG_PATH}/config.yaml — workspace per-instance overrides
# 4. _BUILTIN_PROVIDERS — oauth + anthropic-api only
# On this image /opt/adapter/ is never populated by the platform
# provisioner, so path 2 (/app/config.yaml) is the load-bearing one.
# Without this COPY the file isn't in the image, all 3 file paths fail,
# and _load_providers falls through to _BUILTIN_PROVIDERS — every
# MiniMax/GLM/Kimi/DeepSeek model silently routes to anthropic-oauth →
# "Not logged in. Please run /login" at first LLM call. Caused the
# canary's 38h chronic red on 2026-05-07/08 (molecule-core#129).
COPY config.yaml .
# Adapter-specific executor — owned by THIS template (universal-runtime
# refactor, molecule-core task #87). Lives alongside adapter.py so
# Python's import system picks the local /app/claude_sdk_executor.py