fix(dockerfile): bundle config.yaml into /app so providers registry loads #6
Closes molecule-core#129 failure mode #1
38-hour canary chronic red root-caused. Live SSM capture of the workspace EC2 shows:
Root cause
The adapter's `_load_providers` tries 4 paths in order:

1. `/opt/adapter/config.yaml`: provisioner-managed canonical (currently missing)
2. `os.path.dirname(__file__)/config.yaml`: alongside adapter.py (this image's `/app/`)
3. `${WORKSPACE_CONFIG_PATH}/config.yaml`: workspace overrides
4. `_BUILTIN_PROVIDERS`: oauth + anthropic-api only

Verified by `ls /opt/adapter/` on a live canary's workspace EC2: directory doesn't exist. So path 2 (`/app/config.yaml`) is the load-bearing one.

Dockerfile copies `adapter.py`, `__init__.py`, `claude_sdk_executor.py`, `scripts/`, `entrypoint.sh`, but does not copy `config.yaml`. So `/app/config.yaml` doesn't exist either. All 3 file paths fail. `_load_providers` returns `_BUILTIN_PROVIDERS`.

`_BUILTIN_PROVIDERS` has only `anthropic-oauth` + `anthropic-api`. Every MiniMax / GLM / Kimi / DeepSeek model id has no matching prefix → `_resolve_provider` returns `providers[0]` = `anthropic-oauth` (per "unknown ids fall back to providers[0]" rule). That provider needs `CLAUDE_CODE_OAUTH_TOKEN`, unset for non-OAuth tenants. Claude CLI errors `Not logged in · Please run /login`. The adapter wraps it as `"Agent error (Exception)"` and ships it to A2A.

Fix
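As a rough illustration of the lookup described under Root cause, and of why a file next to adapter.py is enough, here is a minimal Python sketch. It is not the adapter's code: `pick_config`, `has_providers_section`, the line-based `providers:` check (a crude stand-in for real YAML parsing), and the sample file contents are all assumptions made for the example. It also assumes that a config file without a `providers:` section is skipped, which is inferred from the behaviour reported in this PR rather than confirmed.

```python
import os
import tempfile
from typing import Optional, Sequence


def has_providers_section(path: str) -> bool:
    """Crude stand-in for YAML parsing: look for a top-level 'providers:' key."""
    with open(path, encoding="utf-8") as fh:
        return any(line.startswith("providers:") for line in fh)


def pick_config(candidate_dirs: Sequence[str]) -> Optional[str]:
    """Return the first <dir>/config.yaml that exists and declares providers."""
    for directory in candidate_dirs:
        path = os.path.join(directory, "config.yaml")
        if os.path.isfile(path) and has_providers_section(path):
            return path
    return None  # a caller would fall back to the built-in providers here


with tempfile.TemporaryDirectory() as root:
    opt_adapter = os.path.join(root, "opt", "adapter")  # path 1: never created
    app = os.path.join(root, "app")                     # path 2: the image's /app
    configs = os.path.join(root, "configs")             # path 3: workspace config
    os.makedirs(app)
    os.makedirs(configs)
    with open(os.path.join(configs, "config.yaml"), "w", encoding="utf-8") as fh:
        fh.write("model: MiniMax-M2.7-highspeed\n")     # no providers section

    candidates = [opt_adapter, app, configs]
    before_fix = pick_config(candidates)                # nothing usable yet

    # Simulate the effect of `COPY config.yaml .` (file contents are hypothetical).
    bundled = os.path.join(app, "config.yaml")
    with open(bundled, "w", encoding="utf-8") as fh:
        fh.write("providers:\n  - name: minimax\n")
    after_fix = pick_config(candidates)
    found_bundled = after_fix == bundled

print("before fix:", before_fix)
print("after fix resolves to the bundled file:", found_bundled)
```

Before the simulated copy no candidate qualifies, so a caller would drop to the built-ins; afterwards the second candidate wins without `/opt/adapter/` ever existing.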
One-line `COPY config.yaml .` after `COPY __init__.py .` in the Dockerfile. Now `/app/config.yaml` ships with the image and path 2 of the 4-path lookup finds it.

How this got missed
Memory `feedback_template_vs_workspace_config_separation` (template-claude-code PR #37, 2026-05-04) added the multi-path lookup precisely to fix the original bug pattern (per-workspace config.yaml shouldn't carry providers; that's a template concern). PR #37 added the lookup logic but didn't bundle `config.yaml` into the image, so the canonical path it expects doesn't exist anywhere. The same fallback-to-builtins bug persisted, with a different code path producing it.

Verification
Verified the failure path live:

- `e2e-canary-20260508-debug-177826` via canary script with `E2E_KEEP_ORG=1`.
- `i-01383cddf3b71e211`, `docker logs` of the workspace container showed the boot-time audit + the SDK exception.
- `docker exec ... cat /opt/adapter/config.yaml` → no such file.
- `docker exec ... ls /app/config.yaml` → no such file.
- `docker exec ... cat /configs/config.yaml` → has `model: MiniMax-M2.7-highspeed` but no `providers:` section (canary's PUT replaced).

Post-merge verification: publish-runtime workflow rebuilds image, deploys to staging tenant fleet, next canary cron run sees `/app/config.yaml` → loads minimax provider → `MINIMAX_API_KEY` matches → claude CLI auths → A2A returns PONG → green.

Out of scope (for follow-up)
- The provisioner doesn't populate `/opt/adapter/` despite that being the documented "canonical" path. Tracked separately. Fixing path 2 (this PR) makes path 1's absence non-blocking.
- The canary's PUT replaces `/configs/config.yaml` wholesale. Not strictly a bug since path 2 (template's) is now load-bearing, but the canary should arguably preserve the workspace-level config or do a partial merge. Tracked in molecule-core#129 follow-ups.

🤖 Generated with Claude Code
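For readers unfamiliar with the "unknown ids fall back to providers[0]" rule mentioned under Root cause, the following Python sketch shows how such a rule sends a MiniMax model id to `anthropic-oauth` when only the built-ins are loaded. Everything here is hypothetical: the `Provider` record, its `prefixes` field, the prefix values, and the case-insensitive match are assumptions for illustration, not the adapter's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Provider:
    name: str
    prefixes: Tuple[str, ...]  # hypothetical: model-id prefixes this provider claims


# Assumed shape of the built-in fallback list (prefix values are made up).
BUILTIN_PROVIDERS: List[Provider] = [
    Provider("anthropic-oauth", ("claude-",)),
    Provider("anthropic-api", ("anthropic/",)),
]

# Assumed shape of what a bundled config.yaml could add.
BUNDLED_PROVIDERS: List[Provider] = BUILTIN_PROVIDERS + [
    Provider("minimax", ("minimax-",)),
]


def resolve_provider(model_id: str, providers: List[Provider]) -> Provider:
    """Return the first provider whose prefix matches, else providers[0]."""
    lowered = model_id.lower()
    for provider in providers:
        if any(lowered.startswith(prefix) for prefix in provider.prefixes):
            return provider
    return providers[0]  # unknown ids fall back to the first provider


model = "MiniMax-M2.7-highspeed"
with_builtins_only = resolve_provider(model, BUILTIN_PROVIDERS).name
with_bundled_config = resolve_provider(model, BUNDLED_PROVIDERS).name
print(with_builtins_only, with_bundled_config)
```

A silent fallback like this keeps unknown model ids working for OAuth tenants, at the cost of hiding a missing provider registry behind an unrelated authentication error.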
The adapter's _load_providers tries 4 paths in order:

1. /opt/adapter/config.yaml: provisioner-managed (currently missing)
2. os.path.dirname(__file__)/config.yaml: alongside adapter.py
3. ${WORKSPACE_CONFIG_PATH}/config.yaml: workspace overrides
4. _BUILTIN_PROVIDERS: oauth + anthropic-api only

On this template's docker image /opt/adapter/ is never populated by the platform provisioner (verified 2026-05-08 by SSM-exec on a live canary's workspace EC2: ls /opt/adapter/ → no such file or directory). That makes path 2 (the dir adjacent to /app/adapter.py) the load-bearing one for production workloads.

The Dockerfile copies adapter.py + claude_sdk_executor.py + scripts/ + entrypoint.sh + __init__.py into /app, but it does NOT copy config.yaml. So /app/config.yaml doesn't exist, path 2 fails, and the adapter falls all the way through to _BUILTIN_PROVIDERS.

_BUILTIN_PROVIDERS contains only anthropic-oauth + anthropic-api. Every MiniMax / GLM / Kimi / DeepSeek model id has no matching prefix in those two, so _resolve_provider returns providers[0] = anthropic-oauth (per "unknown ids fall back to providers[0]" rule). That provider needs CLAUDE_CODE_OAUTH_TOKEN, which is unset for non-OAuth tenants. The claude CLI fails with:

    Not logged in · Please run /login

…which surfaces in the A2A response as "Agent error (Exception)".

This is the root cause of:

• Canary chronic red since 2026-05-07 02:30 UTC (38h+ at time of investigation)
• molecule-core#129 failure mode #1
• Memory feedback_template_vs_workspace_config_separation (template-claude-code PR #37 added the multi-path lookup but didn't bundle config.yaml into the image; the lookup paths point at files that don't exist)

Fix: one-line `COPY config.yaml .` in the Dockerfile.
Verification path (post-merge): publish-runtime workflow rebuilds the image, deploys to staging tenant fleet, next canary cron run sees /app/config.yaml → loads minimax provider → MINIMAX_API_KEY matches → claude CLI auths → A2A returns PONG → green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LGTM. One-line fix that closes the canary's 38h chronic red. Live SSM verification: /app/config.yaml is missing → _load_providers falls through to _BUILTIN_PROVIDERS → MiniMax routes to anthropic-oauth → Not logged in. The COPY config.yaml puts the file at path 2 of the lookup.