Adds two operator-visible boot diagnostics that close the diagnosis gap
exposed by the 2026-05-02 MiniMax E2E crash-loop. The universal
canvas-picked-model fix (Bug B) and per-model required_env (Bug D) live
in molecule-core PR #2538 — this PR adds the per-template visibility
that complements them so operators can answer "is the key missing or is
routing wrong?" from `docker logs` alone.
Changes
-------
adapter.py:
- _AUTH_ENV_AUDIT tuple of 8 vendor env names (CLAUDE_CODE_OAUTH_TOKEN,
ANTHROPIC_API_KEY/AUTH_TOKEN/BASE_URL, MINIMAX/GLM/KIMI/DEEPSEEK_API_KEY).
- _audit_auth_env_presence() helper — single INFO line of NAME=set/unset
pairs. NEVER logs values; the test pins this with a "fake-secret-MUST-
NOT-LEAK" sentinel that must never appear in the log message.
- One call site at the end of setup()'s boot banner so every workspace
start emits both "which provider got picked" and "which envs are present"
in adjacent log lines.
entrypoint.sh:
- log_boot_context() function fired once before the gosu drop (as root)
and once after (as agent) so an operator can spot env vars lost
across the privilege drop. Emits uid/gid/user/hostname/workspace_id/
platform_url/configs_dir/workspace_dir + the same 8 env names as
NAME=set/unset. Mirror of _AUTH_ENV_AUDIT — list pinned in sync by a
new AST-style test (test_audit_env_list_matches_entrypoint_sh) that
parses entrypoint.sh and asserts set-equality with adapter.py's tuple.
tests/test_adapter_logging.py (new):
- 4 tests covering the audit contract: every name appears, all-unset
scenario, empty-string treated as unset (matches routing semantics),
and the cross-file sync gate against entrypoint.sh's for-loop.
- Stubs molecule_runtime + a2a so the helpers can be imported without
the real wheel installed in CI (mirrors test_adapter_prevalidate.py's
scaffolding pattern).
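
The cross-file sync gate could look roughly like the sketch below. The shape of the for-loop in entrypoint.sh is an assumption (a single `for NAME in ...; do` with optional backslash continuations); `env_names_from_for_loop` and `SAMPLE_ENTRYPOINT` are hypothetical names used only for illustration.

```python
import re


def env_names_from_for_loop(shell_text):
    """Collect the words of the first `for NAME in ...; do` loop as a set."""
    match = re.search(r"for\s+\w+\s+in\s+(.*?)\s*;\s*do", shell_text, re.DOTALL)
    if match is None:
        raise ValueError("no for-loop found in shell text")
    # Backslash-newline continuations are treated as plain whitespace.
    return set(match.group(1).replace("\\\n", " ").split())


# Stand-in for the real entrypoint.sh, trimmed to three names.
SAMPLE_ENTRYPOINT = """\
log_boot_context() {
  for name in CLAUDE_CODE_OAUTH_TOKEN ANTHROPIC_API_KEY \\
              MINIMAX_API_KEY; do
    echo "$name"
  done
}
"""
```

A test in this style would read the real entrypoint.sh and assert that the returned set equals `set(_AUTH_ENV_AUDIT)`, so a name added on one side only fails CI.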
Why this complements molecule-core PR #2538
-------------------------------------------
- PR #2538 makes Bug B (canvas-picked model silently dropped) impossible
by resolving model centrally in workspace/config.py:load_config —
every adapter (claude-code, hermes, codex, future ones) gets the
passthrough for free.
- PR #2538 makes Bug D (preflight rejects valid auth for non-default
models) impossible by REPLACE-not-union per-entry required_env.
- This template PR is the per-template observability layer: when one
of those universal fixes regresses (or when an operator misconfigures a
vendor key), the boot logs say exactly which env was present at each
tier. Validated end-to-end on workspace
be27badd-00a7-4cef-91e8-af428175c76f (clean boot, MINIMAX_API_KEY=set
audited, no crash-loop).
Closes part of molecule-monorepo task #248. Sibling of #2538 for
molecule-core.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fresh-tenant signup hits "Upload failed: failed to prepare uploads
dir" on the first chat attachment (reported on hongming.moleculesai.app
2026-05-01T18:30Z). Root cause is that workspace/internal_chat_uploads.py
runs `mkdir -p /workspace/.molecule/chat-uploads` as the agent user,
but the volume's `.molecule` subdir surfaces root-owned in some race
windows (volume cache + new mount + RW remount during reboot/redeploy).
Pre-creating the directory tree as root in the entrypoint, BEFORE
gosu drops to agent, eliminates this class of failure entirely: the
upload handler's `mkdir(parents=True, exist_ok=True)` becomes a no-op
on the common path, and the failure mode it currently surfaces no
longer exists.
Idempotent: works on fresh volumes (creates) and reused volumes
(no-op + chown re-asserts ownership in case a prior process changed
it).
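
The real change lives in entrypoint.sh as shell; the Python sketch below only illustrates the idempotent create-then-chown pattern. `ensure_owned_tree` is a hypothetical helper, and the uid/gid arguments stand in for the agent user's ids.

```python
import os
from pathlib import Path


def ensure_owned_tree(base, relative, uid, gid):
    """Create each component of `relative` under `base`, re-asserting ownership.

    Safe to call repeatedly: existing directories are left in place and only
    their owner is reset, which covers volumes a prior process has touched.
    """
    current = Path(base)
    for part in Path(relative).parts:
        current = current / part
        current.mkdir(exist_ok=True)  # no-op when the directory already exists
        os.chown(current, uid, gid)   # runs every time so a drifted owner is corrected
    return current
```

Under the assumptions above, the entrypoint equivalent would call this with `/workspace` and `.molecule/chat-uploads` before dropping privileges.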
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the model→endpoint→auth-env mapping out of hardcoded constants
in adapter.py + entrypoint.sh into a single `providers:` list at the
top of config.yaml. The adapter loads it at boot via _load_providers;
canvas Config tab will read the same YAML for its Provider dropdown so
UI and adapter never disagree on what's available. Adding a new
provider becomes a one-line YAML edit — no Python or shell changes.
Includes 5 third-party providers ready out of the box (Anthropic-compat
endpoints, Bearer-style ANTHROPIC_AUTH_TOKEN OR ANTHROPIC_API_KEY auth):
  xiaomi-mimo   https://api.xiaomimimo.com/anthropic
  minimax       https://api.minimax.io/anthropic
  zai           https://api.z.ai/api/anthropic       (NEW)
  moonshot      https://api.moonshot.ai/anthropic    (NEW)
  deepseek      https://api.deepseek.com/anthropic   (NEW)
Plus 7 new model entries in runtime_config.models (mimo-v2.5, MiniMax-M2,
MiniMax-M2.7, GLM-4.6, GLM-4.5, kimi-k2.5, kimi-k2, deepseek-v4-pro,
deepseek-v4-flash) so they show up in the Canvas Config dropdown.
Operator override unchanged: ANTHROPIC_BASE_URL set as a workspace
secret still wins over the registry default — the escape hatch for
regional endpoints (Xiaomi token-plan-sgp, MiniMax api.minimaxi.com).
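
A rough sketch of the routing rule, assuming each registry entry carries a list of model-id prefixes and a base URL. The field names (`prefixes`, `base_url`), the prefix values, and `resolve_base_url` are hypothetical; only the URLs and the override behaviour come from this message.

```python
# Two entries shaped like a parsed `providers:` list (field names are assumed).
PROVIDERS = [
    {
        "name": "minimax",
        "prefixes": ["minimax-"],
        "base_url": "https://api.minimax.io/anthropic",
    },
    {
        "name": "zai",
        "prefixes": ["glm-"],
        "base_url": "https://api.z.ai/api/anthropic",
    },
]


def resolve_base_url(model, env, providers=PROVIDERS):
    """Pick the base URL for `model`; an operator-set override always wins."""
    override = env.get("ANTHROPIC_BASE_URL")
    if override:
        return override
    lowered = model.lower()  # prefix match is case-insensitive
    for provider in providers:
        if any(lowered.startswith(prefix.lower()) for prefix in provider["prefixes"]):
            return provider["base_url"]
    return None  # no registry match: leave the CLI's own default in place
```

Checking the override first is what keeps regional endpoints working without any registry change.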
entrypoint.sh: drops the `mimo-*` case mapping (adapter handles routing
now). _BUILTIN_PROVIDERS retained as malformed-YAML fallback so a
bare-bones workspace still boots with oauth + anthropic-api defaults.
Tests: 25 passing. New coverage:
- YAML parses + normalizes to expected shape
- Malformed YAML falls back to builtins (warning, not raise)
- Each new provider routes its model id to the right base_url
- ANTHROPIC_AUTH_TOKEN alone satisfies third-party auth check
- Operator-set ANTHROPIC_BASE_URL overrides registry default
- Case-insensitive prefix match (MiniMax-M2 / minimax-m2.7 / GLM-4.6)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: split Xiaomi MiMo into pay-as-you-go vs Token Plan rows,
explicitly document ANTHROPIC_BASE_URL as a required secret for
Token Plan users, and note that operator-set values always win over
the shell mapping fallback
- entrypoint.sh: add supported Xiaomi MiMo endpoints comment listing
pay-as-you-go + Token Plan SG/HK URLs for discoverability
Adds 4 model entries (mimo-v2-flash, mimo-v2-pro, mimo-v2-omni,
mimo-v2.5-pro) selectable from canvas. When MODEL matches mimo-*,
entrypoint.sh exports ANTHROPIC_BASE_URL=https://api.xiaomimimo.com/anthropic
so the claude CLI's native ANTHROPIC_BASE_URL handling routes there.
ANTHROPIC_API_KEY in this case is the Xiaomi key, not Anthropic Console.
Verified live against all 4 model IDs with x-api-key auth — all returned
200 with proper Anthropic-shape Messages responses (id, type=message,
role=assistant, content[].text, usage including cache_read_input_tokens).
Operator-set ANTHROPIC_BASE_URL is never overridden — the case-statement
only fills in the default when unset, so a user-supplied proxy still wins.
Marked as testing because the model→base-URL mapping currently lives in
entrypoint.sh shell. The robust shape is a data-driven `runtime_env`
field in config.yaml read by the platform provisioner; will follow up
with that as a separate cross-repo PR (workspace-server + canvas) so
this template no longer carries provider-specific knowledge in shell.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand and other content agents loop
indefinitely, sending status reports back to PMs ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)
Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
(GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
to refresh after expiry
- The credential-helper architecture exists in the codebase but was
  never wired up at template boot. Specifically, the claude-code template:
    - did not COPY the helper scripts into the image
    - did not configure git credential.helper at boot
    - did not start the background refresh daemon
    - did not run initial gh auth login
Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures git credential helper for
github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
respawn loop and runs initial `gh auth login --with-token`
The helper hits the platform's /admin/github-installation-token endpoint
(falling back to the GH_TOKEN env var when the platform is unreachable).
The refresh daemon calls _refresh_gh every ~45 min ± 2 min jitter so `gh`
CLI auth and the helper cache stay warm even when no git operation
triggers a refresh.
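
The actual helper and daemon are shell scripts; the Python sketch below only illustrates the two rules stated above (jittered interval, env-var fallback). `next_refresh_delay`, `resolve_token`, and the `fetch_from_platform` callable are hypothetical names, and treating "unreachable" as an `OSError` is an assumption.

```python
import random

REFRESH_INTERVAL_S = 45 * 60  # ~45 min, well inside the ~60 min token TTL
JITTER_S = 2 * 60             # +/- 2 min so workspaces do not refresh in lockstep


def next_refresh_delay(rng=random):
    """Seconds to sleep before the next refresh, with uniform jitter."""
    return REFRESH_INTERVAL_S + rng.uniform(-JITTER_S, JITTER_S)


def resolve_token(fetch_from_platform, env):
    """Prefer a fresh platform token; fall back to GH_TOKEN if the call fails."""
    try:
        token = fetch_from_platform()
        if token:
            return token
    except OSError:
        # Platform unreachable (connection errors are OSError subclasses).
        pass
    return env.get("GH_TOKEN")
```

Refreshing at about 45 minutes leaves a margin before the one-hour expiry, and the fallback means a platform outage degrades to the old behaviour instead of failing immediately.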
Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead A2A queues should drain to <5 within one
cycle of the new image rolling out
Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
templates inherit it instead of re-implementing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>