Commit Graph

7 Commits

Author SHA1 Message Date
Hongming Wang
78ae139609 feat(adapter,entrypoint): boot env audit + crash-loop diagnosis logging
Adds two operator-visible boot diagnostics that close the diagnosis gap
exposed by the 2026-05-02 MiniMax E2E crash-loop. The universal
canvas-picked-model fix (Bug B) and per-model required_env (Bug D) live
in molecule-core PR #2538 — this PR adds the per-template visibility
that complements them so operators can answer "is the key missing or is
routing wrong?" from `docker logs` alone.

Changes
-------
adapter.py:
- _AUTH_ENV_AUDIT tuple of 8 vendor env names (CLAUDE_CODE_OAUTH_TOKEN,
  ANTHROPIC_API_KEY/AUTH_TOKEN/BASE_URL, MINIMAX/GLM/KIMI/DEEPSEEK_API_KEY).
- _audit_auth_env_presence() helper: single INFO line of NAME=set/unset
  pairs. NEVER logs values; the test pins this with a
  "fake-secret-MUST-NOT-LEAK" sentinel that must never appear in the log
  message.
- One call site at the end of setup()'s boot banner so every workspace
  start emits both "which provider got picked" and "which envs are present"
  in adjacent log lines.
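
A minimal sketch of the presence-only audit described in the adapter.py notes, assuming the shorthand in the list expands to the eight names below. The function names and log prefix are illustrative, not the adapter's actual code.

```python
import logging
import os

logger = logging.getLogger("adapter")

# Assumed expansion of the eight names abbreviated in the commit message.
_AUTH_ENV_AUDIT = (
    "CLAUDE_CODE_OAUTH_TOKEN",
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_BASE_URL",
    "MINIMAX_API_KEY",
    "GLM_API_KEY",
    "KIMI_API_KEY",
    "DEEPSEEK_API_KEY",
)


def format_auth_env_audit(env=None):
    """Build 'NAME=set NAME=unset ...'; values are tested for truthiness only."""
    env = os.environ if env is None else env
    # An empty string counts as unset, as the commit's tests require.
    return " ".join(
        f"{name}={'set' if env.get(name) else 'unset'}" for name in _AUTH_ENV_AUDIT
    )


def audit_auth_env_presence(env=None):
    """Emit the audit as a single INFO line (hypothetical message prefix)."""
    logger.info("auth env audit: %s", format_auth_env_audit(env))
```

Keeping the formatting separate from the logging call lets a test assert on the string directly, including that a sentinel secret never appears in it.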

entrypoint.sh:
- log_boot_context() function fired once before the gosu drop (as root)
  and once after (as agent) so an operator can spot env values lost
  across the privilege drop. Emits uid/gid/user/hostname/workspace_id/
  platform_url/configs_dir/workspace_dir + the same 8 env names as
  NAME=set/unset. Mirror of _AUTH_ENV_AUDIT — list pinned in sync by a
  new AST-style test (test_audit_env_list_matches_entrypoint_sh) that
  parses entrypoint.sh and asserts set-equality with adapter.py's tuple.

tests/test_adapter_logging.py (new):
- 4 tests covering the audit contract: every name appears, all-unset
  scenario, empty-string treated as unset (matches routing semantics),
  and the cross-file sync gate against entrypoint.sh's for-loop.
- Stubs molecule_runtime + a2a so the helpers can be imported without
  the real wheel installed in CI (mirrors test_adapter_prevalidate.py's
  scaffolding pattern).
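
The cross-file sync gate could look roughly like the sketch below. It assumes entrypoint.sh lists the names in a single-line `for NAME in ...; do` loop. The excerpt, the shortened tuple, and the helper names are hypothetical stand-ins for the real files.

```python
import re

# Shortened stand-in for adapter.py's tuple (the real one has eight names).
ADAPTER_AUDIT_NAMES = ("CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "MINIMAX_API_KEY")

# Hypothetical excerpt; the real entrypoint.sh loop is not shown in this commit.
ENTRYPOINT_EXCERPT = """
log_boot_context() {
  for name in CLAUDE_CODE_OAUTH_TOKEN ANTHROPIC_API_KEY MINIMAX_API_KEY; do
    echo "$name"
  done
}
"""

# Captures the word list between `in` and `; do` of the first matching loop.
_LOOP_RE = re.compile(r"for\s+\w+\s+in\s+([A-Z0-9_\s]+?)\s*;\s*do\b")


def shell_loop_names(script_text):
    """Return the set of names iterated by the first for-loop in the script."""
    match = _LOOP_RE.search(script_text)
    if match is None:
        raise ValueError("no `for ... in ...; do` loop found")
    return set(match.group(1).split())


def audit_lists_in_sync(adapter_names, script_text):
    """True when both files audit exactly the same set of env names."""
    return set(adapter_names) == shell_loop_names(script_text)
```

Set-equality rather than ordered comparison means the two files may list names in different orders without failing the gate.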

Why this complements molecule-core PR #2538
-------------------------------------------
- PR #2538 makes Bug B (canvas-picked model silently dropped) impossible
  by resolving model centrally in workspace/config.py:load_config —
  every adapter (claude-code, hermes, codex, future ones) gets the
  passthrough for free.
- PR #2538 makes Bug D (preflight rejects valid auth for non-default
  models) impossible by giving per-entry required_env REPLACE (not
  union) semantics.
- This template PR is the per-template observability layer: when one
  of those universal fixes regresses (or when an operator misconfigures a
  vendor key), the boot logs say exactly which env was present at each
  tier. Validated end-to-end on workspace
  be27badd-00a7-4cef-91e8-af428175c76f (clean boot, MINIMAX_API_KEY=set
  audited, no crash-loop).

Closes part of molecule-monorepo task #248. Sibling of #2538 for
molecule-core.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:41:05 -07:00
Hongming Wang
59dff3d36d fix(entrypoint): pre-create /workspace/.molecule/chat-uploads
Fresh-tenant signup hits "Upload failed: failed to prepare uploads
dir" on the first chat attachment (reported on hongming.moleculesai.app
2026-05-01T18:30Z). Root cause is that workspace/internal_chat_uploads.py
runs `mkdir -p /workspace/.molecule/chat-uploads` as the agent user,
but the volume's `.molecule` subdir surfaces root-owned in some race
windows (volume cache + new mount + RW remount during reboot/redeploy).

Pre-creating the directory tree as root in the entrypoint, BEFORE
gosu drops to agent, eliminates this failure class entirely: the upload
handler's `mkdir(parents=True, exist_ok=True)` is a no-op on the
common path and the failure mode it currently surfaces no longer
exists.

Idempotent: works on fresh volumes (creates) and reused volumes
(no-op + chown re-asserts ownership in case a prior process changed
it).
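
The idempotency claim can be illustrated in Python, although the actual fix lives in the shell entrypoint. This is a sketch assuming a POSIX system. The helper name is hypothetical, and it re-asserts ownership on the leaf directory only, which may differ from what the entrypoint does for the whole tree.

```python
import os
import tempfile
from pathlib import Path


def ensure_owned_dir(path, uid, gid):
    """Create `path` (and parents) if missing, then re-assert its ownership."""
    path = Path(path)
    # No-op when the tree already exists, so reused volumes are safe.
    path.mkdir(parents=True, exist_ok=True)
    # Re-assert ownership in case a prior process changed it.
    os.chown(path, uid, gid)
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real /workspace volume.
    with tempfile.TemporaryDirectory() as volume:
        target = Path(volume) / ".molecule" / "chat-uploads"
        ensure_owned_dir(target, os.getuid(), os.getgid())  # fresh: creates
        ensure_owned_dir(target, os.getuid(), os.getgid())  # reused: no-op
        print(target.is_dir())
```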

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:45:55 -07:00
Hongming Wang
c6f4912d09 feat(adapter): data-driven provider registry in config.yaml
Move the model→endpoint→auth-env mapping out of hardcoded constants
in adapter.py + entrypoint.sh into a single `providers:` list at the
top of config.yaml. The adapter loads it at boot via _load_providers;
canvas Config tab will read the same YAML for its Provider dropdown so
UI and adapter never disagree on what's available. Adding a new
provider becomes a one-line YAML edit — no Python or shell changes.

Includes 5 third-party providers ready out of the box (Anthropic-compat
endpoints, Bearer-style ANTHROPIC_AUTH_TOKEN OR ANTHROPIC_API_KEY auth):

  xiaomi-mimo  https://api.xiaomimimo.com/anthropic
  minimax      https://api.minimax.io/anthropic
  zai          https://api.z.ai/api/anthropic           (NEW)
  moonshot     https://api.moonshot.ai/anthropic        (NEW)
  deepseek     https://api.deepseek.com/anthropic       (NEW)

Plus 9 new model entries in runtime_config.models (mimo-v2.5, MiniMax-M2,
MiniMax-M2.7, GLM-4.6, GLM-4.5, kimi-k2.5, kimi-k2, deepseek-v4-pro,
deepseek-v4-flash) so they show up in the Canvas Config dropdown.

Operator override unchanged: ANTHROPIC_BASE_URL set as a workspace
secret still wins over the registry default — the escape hatch for
regional endpoints (Xiaomi token-plan-sgp, MiniMax api.minimaxi.com).
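
A rough sketch of how the routing and the operator override might fit together. The registry shape, the `model_prefixes` field, and the function name are assumptions, since the commit does not show the `providers:` schema. Only the provider names and base URLs come from the list above.

```python
# Hypothetical in-memory form of the `providers:` list after loading.
PROVIDERS = [
    {"name": "xiaomi-mimo", "model_prefixes": ["mimo-"],
     "base_url": "https://api.xiaomimimo.com/anthropic"},
    {"name": "minimax", "model_prefixes": ["minimax-"],
     "base_url": "https://api.minimax.io/anthropic"},
    {"name": "zai", "model_prefixes": ["glm-"],
     "base_url": "https://api.z.ai/api/anthropic"},
    {"name": "moonshot", "model_prefixes": ["kimi-"],
     "base_url": "https://api.moonshot.ai/anthropic"},
    {"name": "deepseek", "model_prefixes": ["deepseek-"],
     "base_url": "https://api.deepseek.com/anthropic"},
]


def resolve_base_url(model, env, providers=PROVIDERS):
    """Pick a base URL: operator override first, then registry prefix match."""
    override = env.get("ANTHROPIC_BASE_URL")
    if override:  # assumption: an empty string is treated as unset
        return override
    lowered = model.lower()  # case-insensitive prefix match
    for provider in providers:
        if any(lowered.startswith(prefix) for prefix in provider["model_prefixes"]):
            return provider["base_url"]
    return None  # no match: leave the CLI's default endpoint in place
```

For example, `resolve_base_url("GLM-4.6", {})` would return the zai URL, while any non-empty `ANTHROPIC_BASE_URL` in `env` is returned unchanged regardless of the model.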

entrypoint.sh: drops the `mimo-*` case mapping (adapter handles routing
now). _BUILTIN_PROVIDERS retained as malformed-YAML fallback so a
bare-bones workspace still boots with oauth + anthropic-api defaults.

Tests: 25 passing. New coverage:
  - YAML parses + normalizes to expected shape
  - Malformed YAML falls back to builtins (warning, not raise)
  - Each new provider routes its model id to the right base_url
  - ANTHROPIC_AUTH_TOKEN alone satisfies third-party auth check
  - Operator-set ANTHROPIC_BASE_URL overrides registry default
  - Case-insensitive prefix match (MiniMax-M2 / minimax-m2.7 / GLM-4.6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:29:40 -07:00
Hongming Wang
def15d3738 fix: document Token Plan URL support and multi-endpoint routing
- README: split Xiaomi MiMo into pay-as-you-go vs Token Plan rows,
  explicitly document ANTHROPIC_BASE_URL as a required secret for
  Token Plan users, and note that operator-set values always win over
  the shell mapping fallback
- entrypoint.sh: add supported Xiaomi MiMo endpoints comment listing
  pay-as-you-go + Token Plan SG/HK URLs for discoverability
2026-04-29 16:56:43 -07:00
Hongming Wang
a21d16d94f feat: add Xiaomi MiMo support via Anthropic-API-compatible routing (testing)
Adds 4 model entries (mimo-v2-flash, mimo-v2-pro, mimo-v2-omni,
mimo-v2.5-pro) selectable from canvas. When MODEL matches mimo-*,
entrypoint.sh exports ANTHROPIC_BASE_URL=https://api.xiaomimimo.com/anthropic
so the claude CLI's native ANTHROPIC_BASE_URL handling routes there.
ANTHROPIC_API_KEY in this case is the Xiaomi key, not Anthropic Console.

Verified live against all 4 model IDs with x-api-key auth — all returned
200 with proper Anthropic-shape Messages responses (id, type=message,
role=assistant, content[].text, usage including cache_read_input_tokens).

Operator-set ANTHROPIC_BASE_URL is never overridden — the case-statement
only fills in the default when unset, so a user-supplied proxy still wins.

Marked as testing because the model→base-URL mapping currently lives in
entrypoint.sh shell. The robust shape is a data-driven `runtime_env`
field in config.yaml read by the platform provisioner; will follow up
with that as a separate cross-repo PR (workspace-server + canvas) so
this template no longer carries provider-specific knowledge in shell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 03:13:01 -07:00
rabbitblood
d4ab584deb fix: wire up GitHub App token refresh — fixes #1933
Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand and other content agents send status
  reports back to PMs in an infinite loop ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
  Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)

Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
  (GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
  to refresh after expiry
- The credential-helper architecture exists in the codebase but was
  never wired up at template boot. Specifically, the claude-code template:
  - did not COPY the helper scripts into the image
  - did not configure git credential.helper at boot
  - did not start the background refresh daemon
  - did not run initial gh auth login

Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
   scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures git credential helper for
   github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
   respawn loop and runs initial `gh auth login --with-token`

The helper hits the platform's /admin/github-installation-token endpoint
(fallback to env-var GH_TOKEN when platform unreachable). The refresh
daemon calls _refresh_gh every ~45 min (± 2 min jitter) so CLI auth and
helper cache stay warm even when no git operation triggers a refresh.
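
The refresh cadence can be sketched as follows. The real daemon is a shell script, so this Python version only illustrates the interval arithmetic. The function name and the choice of a uniform distribution for the jitter are assumptions.

```python
import random


def next_refresh_delay_seconds(base_minutes=45.0, jitter_minutes=2.0, rng=random):
    """Return a delay drawn uniformly from base +/- jitter, in seconds."""
    offset = rng.uniform(-jitter_minutes, jitter_minutes)
    return (base_minutes + offset) * 60.0
```

With these defaults the delay always falls between 43 and 47 minutes, which stays below the roughly 60-minute token TTL while keeping workspaces from refreshing in lockstep.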

Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
  should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead A2A queues should drain to <5 within one
  cycle of the new image rolling

Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
  deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
  templates inherit it instead of re-implementing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:57:30 -07:00
Hongming Wang
fef8fd5c57 fix: install git + gh CLI for agent autonomy loop (#2)
Install git + gh CLI in workspace image for agent autonomy loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 00:50:33 +00:00