diff --git a/content/docs/hermes.mdx b/content/docs/hermes.mdx
new file mode 100644
index 0000000..1471ce4
--- /dev/null
+++ b/content/docs/hermes.mdx
@@ -0,0 +1,245 @@
+---
+title: Hermes Runtime & Multi-Provider Dispatch
+description: Hermes is Molecule AI's built-in inference router. Route tasks to Anthropic, Gemini, or any OpenAI-compatible model through native dispatch paths — with correct multi-turn history on all three.
+---
+
+import { Callout } from 'fumadocs-ui/components/callout';
+
+# Hermes Runtime & Multi-Provider Dispatch
+
+Hermes is Molecule AI's built-in inference router powering `runtime: hermes` workspaces. It supports three dispatch paths — a native Anthropic Messages API path, a native Gemini `generateContent` path, and an OpenAI-compatible shim for 13+ other providers — keyed automatically by which API secret is present on the workspace.
+
+Phases 2a, 2b, and 2c are fully merged to `main`:
+
+- **Phase 2a** (PR #240) — native Anthropic dispatch
+- **Phase 2b** (PR #255) — native Gemini dispatch with correct `role: "model"` + `parts` wire format
+- **Phase 2c** (PR #267) — correct multi-turn history preserved as turns (not flattened) on all three paths
+
+<Callout>
+  **Phase 2d (roadmap):** `tool_use` / `tool_result` blocks, vision content, system instructions, and streaming on the native paths are scoped for a future release. See the [capability table](#capability-table) below.
+</Callout>
+
+---
+
+## Dispatch table
+
+Hermes selects an inference path based on which API key is set on the workspace. Keys are resolved in priority order:
+
+> `HERMES_API_KEY` → `OPENROUTER_API_KEY` → `ANTHROPIC_API_KEY` → `GEMINI_API_KEY`
+
+The first key found wins. Don't set `HERMES_API_KEY` or `OPENROUTER_API_KEY` if you want native Anthropic or Gemini dispatch: both rank above the native keys and route through the OpenAI-compat shim.
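The priority rule above can be sketched in a few lines of Python. This is a minimal illustration, not Hermes's actual resolver: the function name `select_dispatch_path`, the path labels, and the choice to treat an empty string as "not set" are all assumptions, and the sketch only covers the four keys named in the priority order.

```python
from typing import Mapping, Tuple

# Priority order as documented; the path labels are illustrative, not Hermes identifiers.
KEY_PRIORITY = (
    ("HERMES_API_KEY", "openai-compat"),
    ("OPENROUTER_API_KEY", "openai-compat"),
    ("ANTHROPIC_API_KEY", "anthropic-native"),
    ("GEMINI_API_KEY", "gemini-native"),
)


def select_dispatch_path(secrets: Mapping[str, str]) -> Tuple[str, str]:
    """Return (key_name, path_label) for the first non-empty key in priority order."""
    for key, path in KEY_PRIORITY:
        # Assumption: a key cleared to "" at the workspace level counts as absent.
        if secrets.get(key):
            return key, path
    raise RuntimeError("no provider API key is set on the workspace")


if __name__ == "__main__":
    # A global Anthropic key cleared at the workspace level lets Gemini win.
    print(select_dispatch_path({"ANTHROPIC_API_KEY": "", "GEMINI_API_KEY": "g-key"}))
```

Under these assumptions, a workspace holding both `OPENROUTER_API_KEY` and `ANTHROPIC_API_KEY` resolves to the compat shim, which is why the native paths require the higher-priority keys to be unset.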
+
+| Key present | Dispatch path | Provider | Wire format |
+|---|---|---|---|
+| `ANTHROPIC_API_KEY` | Native Anthropic | Anthropic | Messages API — `{role, content}` |
+| `GEMINI_API_KEY` | Native Gemini | Google | `generateContent` — `{role: "model", parts: [{text}]}` |
+| `OPENROUTER_API_KEY` / `HERMES_API_KEY` / other | OpenAI-compat shim | 13+ providers | OpenAI Chat Completions |
+| None | Error | — | — |
+
+**Fail-loud semantics:** if `ANTHROPIC_API_KEY` is set but the `anthropic` Python package is not installed in the workspace image, Hermes raises a `RuntimeError` immediately — before any inference attempt. The same applies to `google-genai` when `GEMINI_API_KEY` is set. Hermes does not fall back silently to the compat shim, because that would mask format errors.
+
+---
+
+## Secrets
+
+Set provider keys as global or workspace-level secrets:
+
+```bash
+# Native Anthropic dispatch
+curl -X PUT http://localhost:8080/settings/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'
+
+# Native Gemini dispatch
+curl -X PUT http://localhost:8080/settings/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}'
+
+# OpenAI-compat shim (OpenRouter, Groq, Mistral, etc.)
+curl -X PUT http://localhost:8080/settings/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"OPENROUTER_API_KEY","value":"sk-or-..."}'
+```
+
+To force a specific workspace to use Gemini dispatch when a global `ANTHROPIC_API_KEY` is set, clear the key at the workspace level:
+
+```bash
+curl -X PUT http://localhost:8080/workspaces/$GEMINI_WS/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
+```
+
+---
+
+## Quickstart
+
+### Native Anthropic dispatch
+
+```bash
+export MOLECULE_API=http://localhost:8080
+
+# 1.
Store your Anthropic key
+curl -s -X PUT $MOLECULE_API/settings/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-YOUR-KEY"}' | jq .
+
+# 2. Create a Hermes workspace
+ANTHROPIC_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "hermes-anthropic",
+    "role": "Inference worker — native Anthropic path",
+    "runtime": "hermes",
+    "model": "anthropic:claude-sonnet-4-5"
+  }' | jq -r '.id')
+
+# 3. Wait for ready
+until curl -s $MOLECULE_API/workspaces/$ANTHROPIC_WS \
+  | jq -r '.status' | grep -q ready; do sleep 5; done
+
+# 4. Confirm dispatch path
+curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":"probe-1","method":"message/send",
+    "params":{"message":{"role":"user","parts":[{"kind":"text",
+      "text":"Which provider API are you calling to generate this response?"}]}}
+  }' | jq '.result.parts[0].text'
+# Expected: confirms Anthropic Messages API — no OpenAI-compat translation layer
+```
+
+### Native Gemini dispatch
+
+```bash
+# 1. Store your Gemini key
+curl -s -X PUT $MOLECULE_API/settings/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}' | jq .
+
+# 2. Create a Gemini workspace
+GEMINI_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "hermes-gemini",
+    "role": "Inference worker — native Gemini path",
+    "runtime": "hermes",
+    "model": "gemini:gemini-2.0-flash"
+  }' | jq -r '.id')
+
+# 3. Wait for ready
+until curl -s $MOLECULE_API/workspaces/$GEMINI_WS \
+  | jq -r '.status' | grep -q ready; do sleep 5; done
+
+# 4.
Confirm dispatch path
+curl -s -X POST $MOLECULE_API/workspaces/$GEMINI_WS/a2a \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":"probe-2","method":"message/send",
+    "params":{"message":{"role":"user","parts":[{"kind":"text",
+      "text":"Which provider API are you calling?"}]}}
+  }' | jq '.result.parts[0].text'
+# Expected: confirms Google generateContent — role: "model" + parts[] wrapper used correctly
+```
+
+### Multi-turn history (Phase 2c)
+
+```bash
+# Turn 1
+curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":"turn-1","method":"message/send",
+    "params":{"message":{"role":"user","parts":[{"kind":"text",
+      "text":"My name is Alice. Remember that."}]}}
+  }' | jq '.result.parts[0].text'
+
+# Turn 2 — history is threaded as turns, not flattened into a single blob
+curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":"turn-2","method":"message/send",
+    "params":{"message":{"role":"user","parts":[{"kind":"text",
+      "text":"What is my name?"}]}}
+  }' | jq '.result.parts[0].text'
+# Expected: "Alice" — role attribution is preserved across turns
+```
+
+Before Phase 2c, multi-turn history was flattened into a single user blob. The model could often recover context from the text but lost clean role attribution, which caused failures on structured prompts. Phase 2c passes turns as turns: OpenAI and Anthropic use `{role, content}`; Gemini uses `{role, parts: [{text}]}`, with assistant turns labelled `role: "model"`.
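To make the difference concrete, here is a small Python sketch that builds the three payload shapes from the same history. It is an illustration under stated assumptions rather than the Hermes implementation: the internal turn shape (`role` plus `text`), the function names, and the exact text layout of the pre-2c blob are all hypothetical.

```python
from typing import Dict, List

# Hypothetical internal turn shape: {"role": "user" | "assistant", "text": "..."}
Turn = Dict[str, str]


def flatten_to_blob(history: List[Turn]) -> List[dict]:
    """Pre-2c style (layout assumed): every turn squeezed into one user message."""
    blob = "\n".join(f"{t['role']}: {t['text']}" for t in history)
    return [{"role": "user", "content": blob}]


def to_role_content(history: List[Turn]) -> List[dict]:
    """{role, content} messages, the shape used by OpenAI-style and Anthropic APIs."""
    return [{"role": t["role"], "content": t["text"]} for t in history]


def to_gemini_contents(history: List[Turn]) -> List[dict]:
    """Gemini contents: assistant turns become role "model", text is wrapped in parts."""
    return [
        {
            "role": "model" if t["role"] == "assistant" else "user",
            "parts": [{"text": t["text"]}],
        }
        for t in history
    ]


if __name__ == "__main__":
    history = [
        {"role": "user", "text": "My name is Alice. Remember that."},
        {"role": "assistant", "text": "Got it, Alice."},
        {"role": "user", "text": "What is my name?"},
    ]
    print(len(flatten_to_blob(history)), "message after flattening")
    print([m["role"] for m in to_role_content(history)])
    print([c["role"] for c in to_gemini_contents(history)])
```

The flattened form keeps the words but collapses three attributed turns into one user message, which is the loss of role attribution described above.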
+
+---
+
+## Multi-provider teams
+
+An orchestrator can fan out tasks to Anthropic and Gemini workers simultaneously, each routed through its native path, with no application-level provider switching required:
+
+```bash
+# Fan out — both workers fire via delegate_task_async
+curl -s -X POST $MOLECULE_API/workspaces/$ORCH_ID/a2a \
+  -H "Content-Type: application/json" \
+  -d "{
+    \"jsonrpc\":\"2.0\",\"id\":\"fan-1\",\"method\":\"message/send\",
+    \"params\":{\"message\":{\"role\":\"user\",\"parts\":[{\"kind\":\"text\",
+      \"text\":\"delegate_task_async $ANTHROPIC_WS 'Draft release notes for v2.1' AND delegate_task_async $GEMINI_WS 'Summarise the last 30 days of support tickets'\"}]}}
+  }" | jq .
+```
+
+Both workers receive correctly formatted messages through their native paths. There is no LiteLLM proxy layer and no per-request format translation overhead.
+
+---
+
+## Capability table
+
+### Shipped (Phases 2a + 2b + 2c — all merged to main)
+
+| Capability | OpenAI-compat shim | Anthropic native | Gemini native |
+|---|---|---|---|
+| Plain text, single-turn | ✅ | ✅ | ✅ |
+| Multi-turn history | ⚠️ flattened into one user blob | ✅ role-attributed turns | ✅ `role: "model"` + `parts` wrapper |
+| Correct Gemini wire format | ❌ wrong role, missing parts | — | ✅ |
+| No compat-shim translation overhead | ❌ every request translated | ✅ | ✅ |
+
+### Roadmap — Phase 2d (not yet shipped)
+
+| Capability | Anthropic native | Gemini native |
+|---|---|---|
+| `tool_use` / `tool_result` blocks | 📋 Phase 2d | 📋 Phase 2d |
+| Vision content blocks | 📋 Phase 2d | 📋 Phase 2d |
+| System instructions | 📋 Phase 2d | 📋 Phase 2d |
+| Extended thinking | 📋 Phase 2d | — |
+| Streaming | 📋 Phase 2d | 📋 Phase 2d |
+
+---
+
+## Troubleshooting
+
+### `RuntimeError: anthropic is not installed`
+
+The `anthropic` Python package is missing from the workspace image. Add `anthropic` to `requirements.txt` in your custom image and rebuild, or use the standard `molecule-ai-workspace-template-hermes` image.
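The fail-loud behaviour behind this error can be approximated with a short pre-flight check. This is a sketch of the idea only: `require_package` is a hypothetical helper, and Hermes's real check may differ in both mechanism and message wording.

```python
import importlib.util


def require_package(module_name: str, package_name: str) -> None:
    """Raise RuntimeError immediately if module_name cannot be located.

    Hypothetical helper: meant to run before any inference attempt, so a
    missing SDK surfaces as a clear error rather than a silent fallback.
    """
    try:
        found = importlib.util.find_spec(module_name) is not None
    except ImportError:
        # find_spec raises for dotted names whose parent package is missing.
        found = False
    if not found:
        raise RuntimeError(f"{package_name} is not installed")


if __name__ == "__main__":
    try:
        require_package("anthropic", "anthropic")
        print("anthropic SDK found: native dispatch can proceed")
    except RuntimeError as exc:
        print(f"RuntimeError: {exc}")
```

Checking with `find_spec` avoids importing the SDK just to test for its presence; whether Hermes does the same is not stated in this page.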
+
+### Gemini workspace getting Anthropic dispatch instead
+
+A global `ANTHROPIC_API_KEY` is taking priority. Clear it at the workspace level:
+```bash
+curl -X PUT $MOLECULE_API/workspaces/$GEMINI_WS/secrets \
+  -H "Content-Type: application/json" \
+  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
+```
+
+### Multi-turn context lost between calls
+
+Each workspace maintains its own history buffer. Ensure you are sending all turns of a conversation to the same workspace. A2A `context_id` scopes history within the workspace.
+
+### OpenAI-compat shim returns garbled Gemini output
+
+If you are routing a Gemini model through a key that triggers the compat shim (e.g. `OPENROUTER_API_KEY`), you will see the old role/format translation issues. Switch to `GEMINI_API_KEY` for native dispatch.
+
+---
+
+## See also
+
+- [Concepts — Workspaces](/docs/concepts#workspaces)
+- [API Reference — POST /workspaces](/docs/api-reference#post-workspaces)
+- [Google ADK Runtime](/docs/google-adk) — Gemini-native alternative to Hermes for ADK-first workflows
+- PR #240: [Phase 2a — native Anthropic dispatch](https://github.com/Molecule-AI/molecule-core/pull/240)
+- PR #255: [Phase 2b — native Gemini dispatch](https://github.com/Molecule-AI/molecule-core/pull/255)
+- PR #267: [Phase 2c — multi-turn history on all paths](https://github.com/Molecule-AI/molecule-core/pull/267)
+- Issue [#513](https://github.com/Molecule-AI/molecule-core/issues/513)
diff --git a/content/docs/meta.json b/content/docs/meta.json
index 7157fa9..ebafd13 100644
--- a/content/docs/meta.json
+++ b/content/docs/meta.json
@@ -17,6 +17,7 @@
     "observability",
     "troubleshooting",
     "---Runtimes---",
-    "google-adk"
+    "google-adk",
+    "hermes"
   ]
 }