The adapter's _load_providers tries four sources in order:
1. /opt/adapter/config.yaml — provisioner-managed (currently missing)
2. os.path.dirname(__file__)/config.yaml — alongside adapter.py
3. ${WORKSPACE_CONFIG_PATH}/config.yaml — workspace overrides
4. _BUILTIN_PROVIDERS — anthropic-oauth + anthropic-api only
On this template's Docker image, /opt/adapter/ is never populated by
the platform provisioner (verified 2026-05-08 by SSM-exec on a live
canary's workspace EC2: `ls /opt/adapter/` → no such file or directory).
That makes path 2 — the dir adjacent to /app/adapter.py — the
load-bearing one for production workloads.
The Dockerfile copies adapter.py + claude_sdk_executor.py + scripts/
+ entrypoint.sh + __init__.py into /app, but it does NOT copy
config.yaml. So /app/config.yaml doesn't exist, path 2 fails, and
the adapter falls all the way through to _BUILTIN_PROVIDERS.
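For reference, the lookup behaves roughly like the sketch below. Only
the path order and the _BUILTIN_PROVIDERS name come from the
description above; the YAML shape, the "providers" key, and the prefix
field are assumptions for illustration.

    import os
    import yaml  # PyYAML; assumed, since the config files are YAML

    # Builtin fallback: only the two Anthropic providers ship in code.
    # The dict shape here is illustrative, not the adapter's real schema.
    _BUILTIN_PROVIDERS = [
        {"name": "anthropic-oauth", "prefixes": ["claude-"]},
        {"name": "anthropic-api", "prefixes": ["claude-"]},
    ]

    def _load_providers():
        candidates = [
            "/opt/adapter/config.yaml",                              # 1. provisioner-managed
            os.path.join(os.path.dirname(__file__), "config.yaml"),  # 2. alongside adapter.py
        ]
        workspace_dir = os.environ.get("WORKSPACE_CONFIG_PATH")
        if workspace_dir:
            candidates.append(os.path.join(workspace_dir, "config.yaml"))  # 3. workspace override
        for path in candidates:
            if os.path.isfile(path):
                with open(path) as f:
                    return yaml.safe_load(f)["providers"]
        # 4. Nothing on disk -> builtins only, which is what production hits today.
        return _BUILTIN_PROVIDERS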
_BUILTIN_PROVIDERS contains only anthropic-oauth + anthropic-api.
No MiniMax / GLM / Kimi / DeepSeek model id matches a prefix in those
two, so _resolve_provider returns providers[0] = anthropic-oauth (per
the "unknown ids fall back to providers[0]" rule).
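The resolution rule being described is roughly the following (sketch
only; the provider dict shape matches the illustrative one above, and
the exact prefix-matching detail is an assumption):

    def _resolve_provider(model_id, providers):
        for provider in providers:
            # Illustrative prefix match; the real matching logic may differ.
            if any(model_id.startswith(p) for p in provider.get("prefixes", [])):
                return provider
        # Unknown ids fall back to providers[0]. With only the builtins
        # loaded, that is anthropic-oauth, hence the OAuth-token
        # requirement described next.
        return providers[0]

    # Any MiniMax / GLM / Kimi / DeepSeek id takes the fallback branch:
    # _resolve_provider("MiniMax-M1", _BUILTIN_PROVIDERS) -> anthropic-oauth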
That provider needs CLAUDE_CODE_OAUTH_TOKEN, which is unset for
non-OAuth tenants. The claude CLI fails with:
Not logged in · Please run /login
…which surfaces in the A2A response as "Agent error (Exception)".
This is the root cause of:
• Canary chronically red since 2026-05-07 02:30 UTC (38h+ at the time
  of investigation)
• molecule-core#129 failure mode #1
• Memory feedback_template_vs_workspace_config_separation
(template-claude-code PR #37 added the multi-path lookup but
didn't bundle config.yaml into the image — the lookup paths
point at files that don't exist)
Fix: one-line `COPY config.yaml .` in the Dockerfile.
Verification path (post-merge): the publish-runtime workflow rebuilds
the image and deploys it to the staging tenant fleet; the next canary
cron run sees /app/config.yaml → loads the minimax provider →
MINIMAX_API_KEY matches → claude CLI auths → A2A returns PONG → green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the cache trap structurally (instead of pin-bumping every
runtime release):
1. The publish-image.yml caller now forwards
   github.event.client_payload.runtime_version (set by the cascade) to
   the molecule-ci reusable workflow as the runtime_version input.
2. The reusable workflow forwards it to `docker build` as a --build-arg.
3. The Dockerfile declares ARG RUNTIME_VERSION near the pip install
   layer, so its value becomes part of that layer's cache key.
4. The pip install RUN command does an extra targeted upgrade to the
   exact version when the ARG is set, guaranteeing the installed
   version is what we expect even if requirements.txt resolves to
   something else.
Pairs with molecule-ci PR #12 + molecule-core PR #2181. Together they
make the pipeline race- and cache-proof end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First half of molecule-core task #87 — move adapter-specific code out
of the universal molecule-runtime package into the template that
actually consumes it.
Adds:
- claude_sdk_executor.py (757 LOC) — copied verbatim from
molecule-core/workspace/claude_sdk_executor.py @ commit 186f25c2.
The adapter at adapter.py:59 already does
`from claude_sdk_executor import ClaudeSDKExecutor` — once this
file lands at /app/, Python's import order picks the local copy
over the same-named module that older molecule-runtime versions
ship under site-packages.
- Dockerfile: `COPY claude_sdk_executor.py .` alongside adapter.py.
Purely additive at this stage: molecule-runtime still ships the file
too, so any image built from this template simply has two copies on
disk (the local /app copy shadows the site-packages one). No behavior
change.
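The shadowing works because the entry script's directory (/app, when
adapter.py is launched from there) precedes site-packages on sys.path,
so a bare `import claude_sdk_executor` resolves to the local file
first. A quick way to confirm which copy wins inside a built image
(illustrative check, not part of this PR):

    import sys
    import claude_sdk_executor

    # Expect /app/claude_sdk_executor.py while both copies are on disk;
    # a site-packages path here would mean the local copy is NOT shadowing.
    print(claude_sdk_executor.__file__)
    print([p for p in sys.path if "site-packages" in p])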
Sequencing (the molecule-core PR follows AFTER this image rebuilds):
1. THIS PR — template gets local copy, image rebuilds with it
(current PR; safe because no removal yet)
2. molecule-core PR — drop workspace/claude_sdk_executor.py, bump
molecule-ai-workspace-runtime PyPI version. Templates that
haven't pulled the new runtime version still work because their
local copy is unchanged.
3. (later) Bump requirements.txt pin in this template once the
new runtime version is on PyPI, so future builds explicitly
install the slimmed runtime.
Why local-copy-first:
- Reverse order (drop from runtime first, then add to template)
creates a window where any template image build pulling the
latest runtime would fail to import claude_sdk_executor.
- This order has zero downtime: every intermediate state is valid.
Validates the capability primitives shipped in molecule-core PRs
#2137-#2144: once this template image rebuilds and the molecule-core
deletion lands, the claude-code workspace will be the FIRST adapter to
live entirely outside molecule-runtime, with native_session +
idle_timeout_override declared via capabilities() (PR #12 here).
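For context, the capability declaration referenced above looks roughly
like the sketch below. The native_session and idle_timeout_override
names come from the PRs cited; the class name, method body, return
shape, and the timeout value are hypothetical.

    class ClaudeCodeAdapter:
        # Hypothetical sketch; the real signature and return shape are
        # defined by the molecule-runtime adapter interface.
        def capabilities(self):
            return {
                "native_session": True,          # adapter manages its own session lifecycle
                "idle_timeout_override": 3600,   # seconds; value purely illustrative
            }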
Source: molecule-core/workspace/claude_sdk_executor.py @ 186f25c2
(commit hash pinned for traceability of any future divergence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand, and other content agents loop endlessly
  sending status reports back to PMs ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)
Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
(GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
to refresh after expiry
- The credential-helper architecture exists in the codebase but was
never wired up at template boot. Specifically, the claude-code template:
- did not COPY the helper scripts into the image
- did not configure git credential.helper at boot
- did not start the background refresh daemon
- did not run initial gh auth login
Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures the git credential helper for
   github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
   respawn loop and runs the initial `gh auth login --with-token`
The helper hits the platform's /admin/github-installation-token endpoint
(falling back to the env-var GH_TOKEN when the platform is unreachable).
The refresh daemon calls _refresh_gh every ~45 min ± 2 min of jitter so
CLI auth and the helper cache stay warm even when no git operation
triggers a refresh.
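For reference, the refresh logic amounts to the loop sketched below in
Python (the real implementation is scripts/molecule-gh-token-refresh.sh;
the endpoint path, GH_TOKEN fallback, `gh auth login --with-token` call,
and ~45 min ± 2 min cadence come from this description, while the base
URL, response format, and function names are assumptions):

    import os
    import random
    import subprocess
    import time
    import urllib.request

    # Hypothetical base URL; the real helper knows how to reach the platform.
    PLATFORM_BASE = os.environ.get("PLATFORM_BASE_URL", "http://localhost")
    TOKEN_ENDPOINT = PLATFORM_BASE + "/admin/github-installation-token"

    def fetch_token():
        # Ask the platform for a fresh installation token; fall back to the
        # provision-time GH_TOKEN env var if the platform is unreachable.
        try:
            with urllib.request.urlopen(TOKEN_ENDPOINT, timeout=10) as resp:
                return resp.read().decode().strip()  # assumes a plain-text token body
        except OSError:
            return os.environ.get("GH_TOKEN", "")

    def refresh_loop():
        while True:
            token = fetch_token()
            if token:
                # Keep the gh CLI session warm; mirrors the boot-time
                # `gh auth login --with-token` (token read from stdin).
                subprocess.run(
                    ["gh", "auth", "login", "--with-token"],
                    input=token, text=True, check=False,
                )
                # The real daemon also refreshes the git credential-helper cache here.
            # Roughly every 45 min with ±2 min of jitter, per the cadence above.
            time.sleep(45 * 60 + random.uniform(-120, 120))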
Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead A2A queues should drain to <5 within one
  cycle of the new image rolling out
Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
templates inherit it instead of re-implementing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>