---
title: Hermes Runtime & Multi-Provider Dispatch
description: Hermes is Molecule AI's built-in inference router. Route tasks to Anthropic, Gemini, or any OpenAI-compatible model through native dispatch paths — with correct multi-turn history on all three.
---

import { Callout } from 'fumadocs-ui/components/callout';

# Hermes Runtime & Multi-Provider Dispatch

Hermes is Molecule AI's built-in inference router powering `runtime: hermes` workspaces. It supports three dispatch paths — a native Anthropic Messages API path, a native Gemini `generateContent` path, and an OpenAI-compatible shim for 13+ other providers — selected automatically based on which API secret is present on the workspace.

Phases 2a through 2e are fully merged to `main`:

- **Phase 2a** (PR #240) — native Anthropic dispatch
- **Phase 2b** (PR #255) — native Gemini dispatch with the correct `role: "model"` + `parts` wire format
- **Phase 2c** (PR #267) — multi-turn history preserved as turns (not flattened) on all three paths
- **Phase 2d** (PR #499) — stacked system messages (`system_blocks` kwarg) on the Anthropic and Gemini paths
- **Phase 2e** (PRs #644, #645) — native `tools=[]` parameter and `response_format=json_schema` structured output on the Anthropic native path

<Callout type="info">
**Remaining roadmap:** vision content blocks and streaming on native paths are scoped for a future release.
</Callout>

---

## Dispatch table
Hermes selects an inference path based on which API key is set on the workspace. Keys are resolved in priority order:

> `HERMES_API_KEY` → `OPENROUTER_API_KEY` → `ANTHROPIC_API_KEY` → `GEMINI_API_KEY`

The first key found wins. Don't set `HERMES_API_KEY` if you want native Anthropic or Gemini dispatch — it takes priority and routes through the OpenAI-compat shim.

| Key present | Dispatch path | Provider | Wire format |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Native Anthropic | Anthropic | Messages API — `{role, content}` |
| `GEMINI_API_KEY` | Native Gemini | Google | `generateContent` — `{role: "model", parts: [{text}]}` |
| `OPENROUTER_API_KEY` / `HERMES_API_KEY` / other | OpenAI-compat shim | 13+ providers | OpenAI Chat Completions |
| None | Error | — | — |

**Fail-loud semantics:** if `ANTHROPIC_API_KEY` is set but the `anthropic` Python package is not installed in the workspace image, Hermes raises a `RuntimeError` immediately — before any inference attempt. The same applies to `google-genai`. Silent fallback to the compat shim would mask format errors; Hermes fails loudly instead.
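
The selection logic amounts to a first-match scan over that priority list. A minimal sketch (function and constant names here are illustrative, not Hermes internals):

```python
# Priority order from the dispatch table above; the first key found wins.
KEY_PRIORITY = ["HERMES_API_KEY", "OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]

# Keys that select a native path when they win the scan.
NATIVE_PATHS = {"ANTHROPIC_API_KEY": "anthropic-native", "GEMINI_API_KEY": "gemini-native"}

def resolve_dispatch(secrets: dict) -> str:
    """Return the dispatch path for a workspace's resolved secrets."""
    for key in KEY_PRIORITY:
        if secrets.get(key):  # an empty value counts as "not set"
            return NATIVE_PATHS.get(key, "openai-compat")
    raise RuntimeError("no inference API key configured for this workspace")
```

Note that a workspace holding both `HERMES_API_KEY` and `GEMINI_API_KEY` resolves to the compat shim, which is why clearing `HERMES_API_KEY` matters when you want native dispatch. Treating an empty value as unset is also what makes the workspace-level `{"value":""}` override described below work.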

---

## Secrets

Set provider keys as global or workspace-level secrets:

```bash
# Native Anthropic dispatch
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'

# Native Gemini dispatch
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}'

# OpenAI-compat shim (OpenRouter, Groq, Mistral, etc.)
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"OPENROUTER_API_KEY","value":"sk-or-..."}'
```

To force a specific workspace to use Gemini dispatch when a global `ANTHROPIC_API_KEY` is set, clear the key at the workspace level:

```bash
curl -X PUT http://localhost:8080/workspaces/$GEMINI_WS/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
```

---
## Quickstart

### Native Anthropic dispatch

```bash
export MOLECULE_API=http://localhost:8080

# 1. Store your Anthropic key
curl -s -X PUT $MOLECULE_API/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-YOUR-KEY"}' | jq .

# 2. Create a Hermes workspace
ANTHROPIC_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hermes-anthropic",
    "role": "Inference worker — native Anthropic path",
    "runtime": "hermes",
    "model": "anthropic:claude-sonnet-4-5"
  }' | jq -r '.id')

# 3. Wait for the workspace to become ready
until curl -s $MOLECULE_API/workspaces/$ANTHROPIC_WS \
  | jq -r '.status' | grep -q ready; do sleep 5; done

# 4. Confirm the dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"probe-1","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"Which provider API are you calling to generate this response?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: confirms the Anthropic Messages API — no OpenAI-compat translation layer
```

### Native Gemini dispatch

```bash
# 1. Store your Gemini key
curl -s -X PUT $MOLECULE_API/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}' | jq .

# 2. Create a Gemini workspace
GEMINI_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hermes-gemini",
    "role": "Inference worker — native Gemini path",
    "runtime": "hermes",
    "model": "gemini:gemini-2.0-flash"
  }' | jq -r '.id')

# 3. Wait for the workspace to become ready
until curl -s $MOLECULE_API/workspaces/$GEMINI_WS \
  | jq -r '.status' | grep -q ready; do sleep 5; done

# 4. Confirm the dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$GEMINI_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"probe-2","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"Which provider API are you calling?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: confirms Google generateContent — role: "model" + parts[] wrapper used correctly
```

### Multi-turn history (Phase 2c)

```bash
# Turn 1
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"turn-1","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"My name is Alice. Remember that."}]}}
  }' | jq '.result.parts[0].text'

# Turn 2 — history is threaded as turns, not flattened into a single blob
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"turn-2","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"What is my name?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: "Alice" — role attribution is preserved across turns
```

Before Phase 2c, multi-turn history was flattened into a single user blob. The model could often recover context from the text but lost clean role attribution, which caused failures on structured prompts. Phase 2c passes turns as turns: OpenAI and Anthropic use `{role, content}`; Gemini uses `{role: "model", parts: [{text}]}`.
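
The two native wire formats can be illustrated with a standalone sketch (not Hermes source) that converts one provider-neutral turn list into each provider's shape:

```python
# A provider-neutral history of (speaker, text) turns. Content is hypothetical.
HISTORY = [
    ("user", "My name is Alice. Remember that."),
    ("assistant", "Got it, Alice."),
    ("user", "What is my name?"),
]

def to_anthropic(history):
    # Anthropic Messages API: plain {role, content} dicts; the role stays "assistant".
    return [{"role": role, "content": text} for role, text in history]

def to_gemini(history):
    # Gemini generateContent: assistant turns become role "model",
    # and text is wrapped in a parts[] array.
    return [
        {"role": "model" if role == "assistant" else "user", "parts": [{"text": text}]}
        for role, text in history
    ]
```

Both functions preserve turn boundaries; only the field names and role labels differ, which is exactly the information the flattened pre-2c path lost.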

---

## Multi-provider teams

An orchestrator can fan tasks out to Anthropic and Gemini workers simultaneously, each routed through its native path — no application-level provider switching required:

```bash
# Fan out — both workers fire via delegate_task_async
curl -s -X POST $MOLECULE_API/workspaces/$ORCH_ID/a2a \
  -H "Content-Type: application/json" \
  -d "{
    \"jsonrpc\":\"2.0\",\"id\":\"fan-1\",\"method\":\"message/send\",
    \"params\":{\"message\":{\"role\":\"user\",\"parts\":[{\"kind\":\"text\",
      \"text\":\"delegate_task_async $ANTHROPIC_WS 'Draft release notes for v2.1' AND delegate_task_async $GEMINI_WS 'Summarise the last 30 days of support tickets'\"}]}}
  }" | jq .
```

Both workers receive correctly formatted messages through their native paths. No LiteLLM proxy layer. No format-translation overhead on every request.

---
## Advanced: stacked system messages

[NousResearch Hermes 4](https://hermes4.nousresearch.com) works best when persona, tool context, and reasoning policy are sent as **separate** `{"role": "system"}` entries rather than one concatenated string. `HermesA2AExecutor` supports this via the `system_blocks` kwarg (PR #499).

### Usage

```python
from workspace_template.executors.hermes_a2a_executor import HermesA2AExecutor

executor = HermesA2AExecutor(
    system_blocks=[
        "You are a senior security auditor. Be terse and precise.",  # persona
        "You have access to bash, file search, and grep tools.",     # tools context
        "Think step-by-step before concluding. Cite evidence.",      # reasoning policy
    ]
)
```

The executor emits each non-empty, non-`None` block as a separate `{"role": "system"}` message in the recommended order: **persona → tools context → reasoning policy**.
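
That emission rule can be sketched as follows (an illustrative sketch, not the executor's actual implementation):

```python
def build_system_messages(system_blocks=None, system_prompt=None):
    """Sketch of system-message emission: blocks take precedence over the legacy prompt."""
    if system_blocks is not None:
        # Each non-empty, non-None block becomes its own system message,
        # in the order given; empty entries are silently skipped.
        return [{"role": "system", "content": block} for block in system_blocks if block]
    # Legacy fallback: a single system_prompt string, or no system message at all.
    return [{"role": "system", "content": system_prompt}] if system_prompt else []
```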

### Behaviour

| Condition | Result |
|-----------|--------|
| `system_blocks` is set | Emits one `{"role": "system"}` message per non-empty block; `system_prompt` is ignored |
| Entry is `None` or `""` | Silently skipped |
| All entries empty | Zero system messages emitted |
| `system_blocks` not set (`None`) | Falls back to the legacy `system_prompt` path — **fully backward-compatible** |

### Backward compatibility

Callers that pass a single `system_prompt` string are **unaffected**:

```python
# Legacy path — still works, no changes required
executor = HermesA2AExecutor(
    system_prompt="You are a security auditor. Think step-by-step."
)
```

Only set `system_blocks` when you want fine-grained control over block ordering or need to inject tool manifests into a dedicated block.

---
## Native tools parameter (Phase 2e — PR #644)

Hermes now passes tool definitions to the model via the native `tools=[]` API parameter instead of injecting them as text in the prompt. This applies to the **Anthropic native dispatch path** and produces structured tool call/result blocks that the Nous/Hermes-3 tool-call format handles correctly.

```python
executor = HermesA2AExecutor(
    tools=[
        {
            "name": "bash",
            "description": "Run a bash command and return stdout/stderr.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The shell command to run"}
                },
                "required": ["command"]
            }
        }
    ]
)
```

The OpenAI-compat shim path also accepts `tools=[]` but continues to inject tool definitions as text in the prompt, for compatibility with OpenRouter-routed models that don't natively support tool calls.
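
Text-in-prompt injection means the tool schema is rendered into prose the model reads, rather than a parameter the API enforces. A rough sketch of that idea (a hypothetical rendering, not the shim's actual format):

```python
import json

def tools_to_prompt_text(tools: list) -> str:
    """Render tool definitions as a text block for models without native tool calls."""
    lines = ["You may call the following tools by emitting a JSON object:"]
    for tool in tools:
        lines.append(
            f"- {tool['name']}: {tool['description']} "
            f"(arguments schema: {json.dumps(tool['input_schema'])})"
        )
    return "\n".join(lines)
```

The trade-off is visible in the capability table below: the model can still be told about tools this way, but nothing guarantees it emits a well-formed call, which is what the native `tools=[]` parameter buys you.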
## Structured output — `response_format` (Phase 2e — PR #645)

`response_format=json_schema` is wired through to the Anthropic native dispatch path. Pass a JSON Schema definition to request strictly typed JSON output from the model:

```python
executor = HermesA2AExecutor(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "audit_finding",
            "schema": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                    "description": {"type": "string"},
                    "remediation": {"type": "string"}
                },
                "required": ["severity", "description", "remediation"]
            }
        }
    }
)
```

The model's completion should be valid JSON matching the schema. The Gemini native and OpenAI-compat shim paths do not yet support `response_format`; it is silently ignored on those paths.
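
Whichever path you are on, it is cheap to verify a completion client-side before trusting it. A standalone check for the `audit_finding` shape above, using only the standard library:

```python
import json

REQUIRED_KEYS = {"severity", "description", "remediation"}
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}

def check_finding(completion: str) -> dict:
    """Parse a model completion and verify it matches the audit_finding shape."""
    finding = json.loads(completion)  # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - finding.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    if finding["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {finding['severity']!r}")
    return finding
```

This matters doubly on the Gemini native and compat-shim paths, where `response_format` is silently ignored and the model may return prose instead of JSON.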

---

## Capability table

### Shipped (Phases 2a–2e — all merged to main)

| Capability | OpenAI-compat shim | Anthropic native | Gemini native |
|---|---|---|---|
| Plain text, single-turn | ✅ | ✅ | ✅ |
| Multi-turn history | ✅ role-attributed turns (since Phase 2c) | ✅ role-attributed turns | ✅ `role: "model"` + `parts` wrapper |
| Correct Gemini wire format | ❌ wrong role, missing parts | — | ✅ |
| No compat-shim translation overhead | ❌ every request translated | ✅ | ✅ |
| Stacked system messages (`system_blocks`) | ❌ | ✅ | ✅ |
| Native `tools=[]` parameter | ⚠️ text-in-prompt injection | ✅ PR #644 | 📋 roadmap |
| Structured output (`response_format=json_schema`) | ❌ | ✅ PR #645 | 📋 roadmap |

### Roadmap (future release)

| Capability | Anthropic native | Gemini native |
|---|---|---|
| Vision content blocks | 📋 | 📋 |
| Streaming | 📋 | 📋 |
| Native tools on Gemini path | — | 📋 |
| Structured output on Gemini path | — | 📋 |

---
## Troubleshooting

### `RuntimeError: anthropic is not installed`

The `anthropic` Python package is missing from the workspace image. Add `anthropic` to `requirements.txt` in your custom image and rebuild, or use the standard `molecule-ai-workspace-template-hermes` image.

### Gemini workspace getting Anthropic dispatch instead

A global `ANTHROPIC_API_KEY` is taking priority. Clear it at the workspace level:

```bash
curl -X PUT $MOLECULE_API/workspaces/$GEMINI_WS/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
```

### Multi-turn context lost between calls

Each workspace maintains its own history buffer. Make sure you send every turn of a conversation to the same workspace; the A2A `context_id` scopes history within the workspace.

### OpenAI-compat shim returns garbled Gemini output

If you route a Gemini model through a key that triggers the compat shim (e.g. `OPENROUTER_API_KEY`), you will see the old role/format translation issues. Switch to `GEMINI_API_KEY` for native dispatch.

---
## See also

- [Concepts — Workspaces](/docs/concepts#workspaces)
- [API Reference — POST /workspaces](/docs/api-reference#post-workspaces)
- [Google ADK Runtime](/docs/google-adk) — a Gemini-native alternative to Hermes for ADK-first workflows
- PR #240: [Phase 2a — native Anthropic dispatch](https://github.com/Molecule-AI/molecule-core/pull/240)
- PR #255: [Phase 2b — native Gemini dispatch](https://github.com/Molecule-AI/molecule-core/pull/255)
- PR #267: [Phase 2c — multi-turn history on all paths](https://github.com/Molecule-AI/molecule-core/pull/267)
- Issue [#513](https://github.com/Molecule-AI/molecule-core/issues/513)