Commit Graph

8 Commits

Author SHA1 Message Date
66e3b7edb3 feat(claude-code): route Kimi K2.6 to api.kimi.com/coding per official spec
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
CI / Adapter unit tests (push) Successful in 1m23s
CI / Template validation (static) (push) Successful in 1m27s
CI / Adapter unit tests (pull_request) Successful in 1m25s
CI / Template validation (static) (pull_request) Successful in 1m30s
CI / Template validation (runtime) (push) Successful in 10m27s
CI / Template validation (runtime) (pull_request) Successful in 9m52s
CI / validate (pull_request) Successful in 6s
CI / validate (push) Successful in 5s
Kimi (Kimi-For-Coding / K2.6) was structurally unreachable from the
claude-code runtime: the `kimi-` model prefix matched the `moonshot`
provider, which set ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
and projected KIMI_API_KEY -> ANTHROPIC_AUTH_TOKEN. Both are wrong per
kimi.com's official Claude Code integration doc
(kimi.com/code/docs/en/third-party-tools/other-coding-agents.html):
  - the sk-kimi-* key (KIMI_API_KEY in SSOT) authenticates ONLY against
    https://api.kimi.com/coding/ — the legacy api.moonshot.ai/anthropic
    surface 401s it (invalid_authentication_error);
  - that gateway authenticates with the x-api-key header, which the
    Anthropic SDK / claude CLI emits from ANTHROPIC_API_KEY, NOT the
    Bearer ANTHROPIC_AUTH_TOKEN.

So a Kimi pick on claude-code 401'd every LLM call.

Fix (config + minimal adapter, scoped to this template — adapter.py and
config.yaml are template-local, COPY'd in the Dockerfile; zero blast
radius on other runtimes):

- config.yaml: repoint the existing kimi- provider entry (renamed
  moonshot -> kimi-coding) to base_url https://api.kimi.com/coding/
  (trailing slash, per the doc) and add a new optional per-provider
  field `auth_token_env: ANTHROPIC_API_KEY` so the boot-time vendor-key
  projection writes KIMI_API_KEY into ANTHROPIC_API_KEY (x-api-key)
  instead of the default ANTHROPIC_AUTH_TOKEN (Bearer). Renaming the
  existing entry (vs adding a parallel one) keeps the kimi- model-prefix
  matcher working with the least change; still 7 providers total.
- config.yaml: add a selectable "Kimi K2.6" model catalog entry
  (id kimi-for-coding — the gateway's own served-model name, mirroring
  the proven OpenClaw kimi-for-coding route; the gateway routes to K2.6
  regardless of the wire model id). kimi-k2.5 / kimi-k2 retained as
  aliases hitting the same gateway for back-compat.
- adapter.py: _normalize_provider parses the optional `auth_token_env`
  (default ANTHROPIC_AUTH_TOKEN — preserves MiniMax/GLM/DeepSeek
  behavior bit-for-bit); _project_vendor_auth projects into that
  per-provider target and is idempotent on it (explicit operator value
  still wins).

Wire-verified before commit: POST https://api.kimi.com/coding/v1/messages
with x-api-key=<SSOT KIMI_API_KEY> + anthropic-version + claude-cli UA
-> HTTP 200, model=kimi-for-coding, real completion. The shipped routing
produces exactly this wire shape.

Tests: added 4 tests (Kimi -> ANTHROPIC_API_KEY projection, operator
override idempotency, _normalize_provider auth_token_env parse,
prevalidate routing matrix incl. kimi-for-coding); updated the
moonshot-named fixtures/assertions to the new kimi-coding contract.
Full suite 85 passed.
2026-05-16 04:56:49 -07:00
dev-lead
291f356dab fix(adapter,tests): isolate _load_providers tests from multi-path lookup
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Adapter unit tests (push) Successful in 1m1s
CI / Adapter unit tests (pull_request) Successful in 1m2s
CI / validate (push) Successful in 3m23s
CI / validate (pull_request) Successful in 3m22s
The 5 _load_providers tests were single-path-only: they wrote a
config.yaml to tmp_path and called _load_providers(str(tmp_path)),
expecting the lookup to read tmp_path/config.yaml.

After the multi-path fix in #7, _load_providers also checks:
  1. _CANONICAL_ADAPTER_DIR/config.yaml  (= /opt/adapter/config.yaml)
  2. _TEMPLATE_DIR/config.yaml           (= dirname(__file__)/config.yaml)
  3. ${config_path}/config.yaml          (the test's tmp_path)

Path 2 finds the repo's bundled config.yaml on the test runner's
disk before path 3 — the tests then see the bundled providers list
instead of the test's expected behavior.

Two surface changes:

  1. adapter.py — extract `os.path.dirname(os.path.abspath(__file__))`
     into a module-level `_TEMPLATE_DIR` constant, mirroring
     `_CANONICAL_ADAPTER_DIR`. Production behavior identical
     (resolved once at import). Tests can monkeypatch the module
     attribute to redirect the path-2 lookup.

  2. tests/test_adapter_prevalidate.py — 5 _load_providers tests
     monkeypatch `_CANONICAL_ADAPTER_DIR` and `_TEMPLATE_DIR` to a
     non-existent tmp subdir, isolating the test to the workspace
     config_path branch they always meant to test.

The 6th _load_providers test (`test_load_providers_parses_yaml_and_normalizes`)
already passed because path 2 returns 7 providers and that's what
that test expects — left unchanged.

Verification:
  pytest tests/                                 65/65 PASS
  pytest tests/test_adapter_prevalidate.py -k load_providers
                                                  6/6 PASS

Closes molecule-core#129 follow-up — the unit tests were the last
red on the template repo's CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:27:56 -07:00
Hongming Wang
a78626ced4 fix(adapter): keep setup() routing — strip prefix only at CLI invocation
Live-test revealed a regression in PR #24's setup() strip: the wheel-
default `anthropic:claude-opus-4-7` paired with an OAuth workspace
(CLAUDE_CODE_OAUTH_TOKEN set, no ANTHROPIC_API_KEY) is the realistic
production shape. Stripping in setup() routes those users into the
`anthropic-api` provider entry, after which the CLI hangs at
`initialize` because no API key env is set. Caught on workspace
dd40faf8 on 2026-05-01 — banner went `provider=anthropic-api` and
A2A wedged on Control request timeout.

Pre-fix routing (let prefixed strings fall through to providers[0] =
anthropic-oauth) is actually correct for this combo. The strip is only
needed at the CLI invocation site (create_executor) where claude's
`--model` arg must be a bare id.

Tests: replace `test_setup_strip_routes_prefixed_anthropic_to_anthropic_api`
with `test_setup_keeps_prefix_routing_oauth_for_anthropic_prefix`,
which pins the inverse — prefixed model + OAuth env stays on oauth and
emits no API-key warning. The 5 unit cases on `_strip_provider_prefix`
plus the `create_executor` strip pins remain unchanged. 36/36 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:46:54 -07:00
Hongming Wang
6ba4cc6a01 fix(adapter): strip LangChain-style provider prefix before CLI invocation
The molecule-runtime wheel's config.py defaults model to
`anthropic:claude-opus-4-7` so langchain/crewai consumers get a uniform
provider:model string out of the box. The claude CLI's --model arg
expects the bare model id and silently exits 1 (no stderr) on prefixed
strings — root cause of the 2026-05-01 "Agent error (Exception)" mid-A2A
bug. Diagnosed via strace on a live workspace: the CLI received
`--model anthropic:claude-opus-4-7` and exit_group(1)'d before any
non-fatal output.

Add `_strip_provider_prefix` and call it in both setup() (so
_resolve_provider routes anthropic:claude-X correctly to anthropic-api
instead of falling back to oauth) and create_executor() (so the bare
id reaches the CLI). Only known-Claude prefixes are stripped; unknown
ones (openai:, bedrock:) pass through so the CLI fails loudly instead
of being silently mangled.

Coverage: 8 new tests — unit tests for the helper across all branches,
end-to-end `create_executor` strip on dict + dataclass shapes, and a
caplog-based setup() test that pins provider=anthropic-api routing
after the strip (the silent-fallback failure mode this fix eliminates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:06:53 -07:00
Hongming Wang
2b9b4306eb fix(adapter): per-entry isolation in _load_providers + tighten _normalize_provider
Two correctness issues spotted in self-review of c6f4912:

1. String-as-prefix typo split into character tuple. ``model_prefixes:
   mimo-`` (operator forgot brackets) used to iterate over characters
   → ``('m','i','m','o','-')``, silently routing every model id starting
   with 'm', 'i', or '-' through the entry. Now: non-list values coerce
   to empty tuple (entry survives but matches nothing — operator notices
   in boot banner, not via misrouted requests).

2. Single bad provider entry nuked the whole registry. _load_providers
   built the registry via a generator inside tuple(...). One AttributeError
   mid-comprehension (e.g. ``[mimo-, 123]`` — int's missing .lower())
   propagated out, broad except caught it, registry silently fell back
   to _BUILTIN_PROVIDERS (oauth + anthropic-api only). Every third-party
   model would then route to anthropic-oauth — exactly the silent-fallback
   failure mode this PR was meant to eliminate. Now: per-entry try/except
   drops the bad entry with a warning, rest survives.

Also: entries without a string ``name`` field are now dropped with a
warning instead of silently using the placeholder ``<unnamed>`` —
operator typos surface in boot logs.

Tests: 28 passing (3 new regression tests covering both issues plus
the no-name path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:58:24 -07:00
Hongming Wang
c6f4912d09 feat(adapter): data-driven provider registry in config.yaml
Move the model→endpoint→auth-env mapping out of hardcoded constants
in adapter.py + entrypoint.sh into a single `providers:` list at the
top of config.yaml. The adapter loads it at boot via _load_providers;
canvas Config tab will read the same YAML for its Provider dropdown so
UI and adapter never disagree on what's available. Adding a new
provider becomes a one-line YAML edit — no Python or shell changes.

Includes 5 third-party providers ready out of the box (Anthropic-compat
endpoints, Bearer-style ANTHROPIC_AUTH_TOKEN OR ANTHROPIC_API_KEY auth):

  xiaomi-mimo  https://api.xiaomimimo.com/anthropic
  minimax      https://api.minimax.io/anthropic
  zai          https://api.z.ai/api/anthropic           (NEW)
  moonshot     https://api.moonshot.ai/anthropic        (NEW)
  deepseek     https://api.deepseek.com/anthropic       (NEW)

Plus 7 new model entries in runtime_config.models (mimo-v2.5, MiniMax-M2,
MiniMax-M2.7, GLM-4.6, GLM-4.5, kimi-k2.5, kimi-k2, deepseek-v4-pro,
deepseek-v4-flash) so they show up in the Canvas Config dropdown.

Operator override unchanged: ANTHROPIC_BASE_URL set as a workspace
secret still wins over the registry default — the escape hatch for
regional endpoints (Xiaomi token-plan-sgp, MiniMax api.minimaxi.com).

entrypoint.sh: drops the `mimo-*` case mapping (adapter handles routing
now). _BUILTIN_PROVIDERS retained as malformed-YAML fallback so a
bare-bones workspace still boots with oauth + anthropic-api defaults.

Tests: 25 passing. New coverage:
  - YAML parses + normalizes to expected shape
  - Malformed YAML falls back to builtins (warning, not raise)
  - Each new provider routes its model id to the right base_url
  - ANTHROPIC_AUTH_TOKEN alone satisfies third-party auth check
  - Operator-set ANTHROPIC_BASE_URL overrides registry default
  - Case-insensitive prefix match (MiniMax-M2 / minimax-m2.7 / GLM-4.6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:29:40 -07:00
Hongming Wang
c646b8cebe feat(adapter): raise on third-party model without ANTHROPIC_BASE_URL
Aligns setup()'s third-party-model-without-URL handling with
create_executor()'s pre-validate (#19) — both unrecoverable
misconfigurations now raise ValueError at boot instead of one warning
and one raising.

Why: a third-party (mimo-*) model selected without ANTHROPIC_BASE_URL
sends every LLM request to api.anthropic.com with a non-Anthropic key,
401-ing every prompt. Workspace boots, looks "online" via heartbeat,
but is structurally broken on the user-facing path. The previous
warning-only path produced the same end-user symptom as the
2026-04-30 incident (workspace looks alive, every interaction fails)
just via a different misconfig shape.

Symmetry: create_executor raises when ANTHROPIC_BASE_URL is set to a
non-Anthropic host but no model is picked. setup() now raises when a
third-party model is picked but no URL is set. Together they catch
both halves of the misconfig surface at boot, before the workspace
enters "online" status.

Adds 4 setup() tests:
- raises on third-party + no URL
- passes on third-party + URL
- passes on OAuth alias (sonnet) + no URL
- passes on Anthropic API id (claude-*) + no URL

Stubs molecule_runtime.plugins.load_plugins as a no-op so the pass-path
tests run cleanly without the runtime installed. Test count: 11 (7
create_executor + 4 setup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:50:25 -07:00
Hongming Wang
0d95b5098a feat(adapter): pre-validate ANTHROPIC_BASE_URL + missing model combo
The 2026-04-30 staging incident traced back to workspaces booting with
ANTHROPIC_BASE_URL pointing at a non-Anthropic shim (MiniMax / OpenAI
gateway) but no explicit model configured. The adapter silently fell
back to "sonnet" — an Anthropic-native alias the upstream didn't
recognize — and the SDK --print probe hung 30s before timing out.
Platform's phantom-busy sweep then nuked the workspace at 10min,
producing "every workspace dead" with the root cause buried in a
30s subprocess hang.

Pre-validate the combo at adapter boot: when ANTHROPIC_BASE_URL host
is non-Anthropic AND no explicit model is set, raise ValueError with
an actionable message pointing to MODEL_PROVIDER / runtime_config.model.
Also log the resolved model + base_url_host every boot so future
failures explain themselves in the workspace logs without digging
into the SDK subprocess.

Tests live under tests/ with their own pytest.ini that anchors rootdir
there — keeps pytest from importing the package __init__.py (which
does the runtime-discovery relative import that requires
molecule_runtime installed). 7 tests cover: misconfig raises with the
right message, Anthropic-native passes, no-base-url passes, custom-url
+ explicit model passes, dataclass + dict shapes, unparseable URL
no-crash. CI runs them on every push/PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:35:49 -07:00