---
title: Hermes Runtime & Multi-Provider Dispatch
description: Hermes is Molecule AI's built-in inference router. Route tasks to Anthropic, Gemini, or any OpenAI-compatible model through native dispatch paths — with correct multi-turn history on all three.
---
import { Callout } from 'fumadocs-ui/components/callout';
# Hermes Runtime & Multi-Provider Dispatch
Hermes is Molecule AI's built-in inference router powering `runtime: hermes` workspaces. It supports three dispatch paths — a native Anthropic Messages API path, a native Gemini `generateContent` path, and an OpenAI-compatible shim for 13+ other providers — keyed automatically by which API secret is present on the workspace.
Phases 2a through 2e are fully merged to `main`:
- **Phase 2a** (PR #240) — native Anthropic dispatch
- **Phase 2b** (PR #255) — native Gemini dispatch with correct `role: "model"` + `parts` wire format
- **Phase 2c** (PR #267) — correct multi-turn history preserved as turns (not flattened) on all three paths
- **Phase 2d** (PR #499) — stacked system messages (`system_blocks` kwarg) on Anthropic and Gemini paths
- **Phase 2e** (PRs #644, #645) — native `tools=[]` parameter + `response_format=json_schema` structured output on Anthropic native path
<Callout type="info">
**Remaining roadmap:** vision content blocks and streaming on native paths are scoped for a future release.
</Callout>
---
## Dispatch table
Hermes selects an inference path based on which API key is set on the workspace. Keys are resolved in priority order:
> `HERMES_API_KEY` → `OPENROUTER_API_KEY` → `ANTHROPIC_API_KEY` → `GEMINI_API_KEY`
The first key found wins. Don't set `HERMES_API_KEY` if you want native Anthropic or Gemini dispatch — it takes priority and routes through the OpenAI-compat shim.
| Key present | Dispatch path | Provider | Wire format |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Native Anthropic | Anthropic | Messages API — `{role, content}` |
| `GEMINI_API_KEY` | Native Gemini | Google | `generateContent` — `{role: "model", parts: [{text}]}` |
| `OPENROUTER_API_KEY` / `HERMES_API_KEY` / other | OpenAI-compat shim | 13+ providers | OpenAI Chat Completions |
| None | Error | — | — |
**Fail-loud semantics:** if `ANTHROPIC_API_KEY` is set but the `anthropic` Python package is not installed in the workspace image, Hermes raises a `RuntimeError` immediately — before any inference attempt. Same for `google-genai`. Silent fallback to the compat shim would mask format errors; Hermes fails loudly instead.
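The resolution order and fail-loud check can be sketched roughly as follows. This is an illustrative model of the behaviour described above, not Hermes's actual internals; the function and constant names are invented for the example:

```python
import importlib.util

# Priority order from the dispatch table above: the first key present wins.
KEY_PRIORITY = [
    ("HERMES_API_KEY", "openai_compat", None),
    ("OPENROUTER_API_KEY", "openai_compat", None),
    ("ANTHROPIC_API_KEY", "anthropic_native", "anthropic"),
    ("GEMINI_API_KEY", "gemini_native", "google.genai"),
]

def _sdk_missing(pkg: str) -> bool:
    try:
        return importlib.util.find_spec(pkg) is None
    except ModuleNotFoundError:  # parent package absent entirely
        return True

def resolve_dispatch(secrets: dict) -> str:
    """Return the dispatch path for a workspace's secrets (illustrative)."""
    for key, path, sdk in KEY_PRIORITY:
        if secrets.get(key):
            # Fail loudly: never fall back silently to the compat shim.
            if sdk and _sdk_missing(sdk):
                raise RuntimeError(f"{sdk} is not installed")
            return path
    raise RuntimeError("no provider API key configured")
```

Note how setting `HERMES_API_KEY` short-circuits the loop before the native paths are ever considered, which is why clearing it is required for native dispatch.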
---
## Secrets
Set provider keys as global or workspace-level secrets:
```bash
# Native Anthropic dispatch
curl -X PUT http://localhost:8080/settings/secrets \
-H "Content-Type: application/json" \
-d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'
# Native Gemini dispatch
curl -X PUT http://localhost:8080/settings/secrets \
-H "Content-Type: application/json" \
-d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}'
# OpenAI-compat shim (OpenRouter, Groq, Mistral, etc.)
curl -X PUT http://localhost:8080/settings/secrets \
-H "Content-Type: application/json" \
-d '{"key":"OPENROUTER_API_KEY","value":"sk-or-..."}'
```
To force a specific workspace to use Gemini dispatch when a global `ANTHROPIC_API_KEY` is set, clear the key at the workspace level:
```bash
curl -X PUT http://localhost:8080/workspaces/$GEMINI_WS/secrets \
-H "Content-Type: application/json" \
-d '{"key":"ANTHROPIC_API_KEY","value":""}'
```
---
## Quickstart
### Native Anthropic dispatch
```bash
export MOLECULE_API=http://localhost:8080
# 1. Store your Anthropic key
curl -s -X PUT $MOLECULE_API/settings/secrets \
-H "Content-Type: application/json" \
-d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-YOUR-KEY"}' | jq .
# 2. Create a Hermes workspace
ANTHROPIC_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
-H "Content-Type: application/json" \
-d '{
"name": "hermes-anthropic",
"role": "Inference worker — native Anthropic path",
"runtime": "hermes",
"model": "anthropic:claude-sonnet-4-5"
}' | jq -r '.id')
# 3. Wait for ready
until curl -s $MOLECULE_API/workspaces/$ANTHROPIC_WS \
| jq -r '.status' | grep -q ready; do sleep 5; done
# 4. Confirm dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc":"2.0","id":"probe-1","method":"message/send",
"params":{"message":{"role":"user","parts":[{"kind":"text",
"text":"Which provider API are you calling to generate this response?"}]}}
}' | jq '.result.parts[0].text'
# Expected: confirms Anthropic Messages API — no OpenAI-compat translation layer
```
### Native Gemini dispatch
```bash
# 1. Store your Gemini key
curl -s -X PUT $MOLECULE_API/settings/secrets \
-H "Content-Type: application/json" \
-d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}' | jq .
# 2. Create a Gemini workspace
GEMINI_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
-H "Content-Type: application/json" \
-d '{
"name": "hermes-gemini",
"role": "Inference worker — native Gemini path",
"runtime": "hermes",
"model": "gemini:gemini-2.0-flash"
}' | jq -r '.id')
# 3. Wait for ready
until curl -s $MOLECULE_API/workspaces/$GEMINI_WS \
| jq -r '.status' | grep -q ready; do sleep 5; done
# 4. Confirm dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$GEMINI_WS/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc":"2.0","id":"probe-2","method":"message/send",
"params":{"message":{"role":"user","parts":[{"kind":"text",
"text":"Which provider API are you calling?"}]}}
}' | jq '.result.parts[0].text'
# Expected: confirms Google generateContent — role: "model" + parts[] wrapper used correctly
```
### Multi-turn history (Phase 2c)
```bash
# Turn 1
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc":"2.0","id":"turn-1","method":"message/send",
"params":{"message":{"role":"user","parts":[{"kind":"text",
"text":"My name is Alice. Remember that."}]}}
}' | jq '.result.parts[0].text'
# Turn 2 — history is threaded as turns, not flattened into a single blob
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc":"2.0","id":"turn-2","method":"message/send",
"params":{"message":{"role":"user","parts":[{"kind":"text",
"text":"What is my name?"}]}}
}' | jq '.result.parts[0].text'
# Expected: "Alice" — role attribution is preserved across turns
```
Before Phase 2c, multi-turn history was flattened into a single user blob. The model could often recover context from the text but lost clean role attribution, which caused failures on structured prompts. Phase 2c passes turns as turns: OpenAI and Anthropic use `{role, content}`; Gemini uses `{role: "model", parts: [{text}]}`.
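As a rough illustration of the two wire shapes (a sketch, not Hermes's actual serialization code), the same turn list renders differently per provider:

```python
def to_messages_api(turns):
    """OpenAI-compat and Anthropic paths share the {role, content} shape."""
    return [{"role": t["role"], "content": t["text"]} for t in turns]

def to_gemini(turns):
    """Gemini renames "assistant" to "model" and wraps text in a parts[] list."""
    return [
        {"role": "model" if t["role"] == "assistant" else t["role"],
         "parts": [{"text": t["text"]}]}
        for t in turns
    ]

history = [
    {"role": "user", "text": "My name is Alice. Remember that."},
    {"role": "assistant", "text": "Got it, Alice."},
    {"role": "user", "text": "What is my name?"},
]
```

Flattening `history` into one user blob destroys exactly the role attribution that both conversions above preserve.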
---
## Multi-provider teams
An orchestrator can fan tasks out to Anthropic and Gemini workers simultaneously, each routed through its native path — no application-level provider switching required:
```bash
# Fan out — both workers fire via delegate_task_async
curl -s -X POST $MOLECULE_API/workspaces/$ORCH_ID/a2a \
-H "Content-Type: application/json" \
-d "{
\"jsonrpc\":\"2.0\",\"id\":\"fan-1\",\"method\":\"message/send\",
\"params\":{\"message\":{\"role\":\"user\",\"parts\":[{\"kind\":\"text\",
\"text\":\"delegate_task_async $ANTHROPIC_WS 'Draft release notes for v2.1' AND delegate_task_async $GEMINI_WS 'Summarise the last 30 days of support tickets'\"}]}}
}" | jq .
```
Both workers receive correctly formatted messages through their native paths. No LiteLLM proxy layer. No format translation overhead on every request.
---
## Advanced: stacked system messages
[NousResearch Hermes 4](https://hermes4.nousresearch.com) works best when persona, tool context, and reasoning policy are sent as **separate** `{"role": "system"}` entries rather than one concatenated string. `HermesA2AExecutor` supports this via the `system_blocks` kwarg (PR #499).
### Usage
```python
from workspace_template.executors.hermes_a2a_executor import HermesA2AExecutor
executor = HermesA2AExecutor(
system_blocks=[
"You are a senior security auditor. Be terse and precise.", # persona
"You have access to bash, file search, and grep tools.", # tools context
"Think step-by-step before concluding. Cite evidence.", # reasoning policy
]
)
```
The executor emits each non-empty, non-`None` block as a separate `{"role": "system"}` message in the recommended order: **persona → tools context → reasoning policy**.
### Behaviour
| Condition | Result |
|-----------|--------|
| `system_blocks` is set | Emits one `{"role": "system"}` per non-empty block; `system_prompt` is ignored |
| Entry is `None` or `""` | Silently skipped |
| All entries empty | Zero system messages emitted |
| `system_blocks` not set (`None`) | Falls back to the legacy `system_prompt` path — **fully backward-compatible** |
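The rules in the table above can be condensed into a short sketch (a hypothetical helper, not the executor's real code):

```python
def emit_system_messages(system_blocks=None, system_prompt=None):
    """Illustrative model of the system_blocks precedence rules."""
    if system_blocks is not None:
        # system_blocks takes precedence; system_prompt is ignored.
        # None and "" entries are silently skipped.
        return [{"role": "system", "content": b} for b in system_blocks if b]
    # Legacy fallback: a single system_prompt string, if any.
    if system_prompt:
        return [{"role": "system", "content": system_prompt}]
    return []
```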
### Backward compatibility
Callers that pass a single `system_prompt` string are **unaffected**:
```python
# Legacy path — still works, no changes required
executor = HermesA2AExecutor(
system_prompt="You are a security auditor. Think step-by-step."
)
```
Only set `system_blocks` when you want fine-grained control over block ordering or need to inject tool manifests into a dedicated block.
---
## Native tools parameter (Phase 2e — PR #644)
Hermes now passes tool definitions to the model via the native `tools=[]` API parameter instead of injecting them as text in the prompt. This applies to the **Anthropic native dispatch path** and produces structured tool call/result blocks that the Nous/Hermes-3 tool call format handles correctly.
```python
executor = HermesA2AExecutor(
tools=[
{
"name": "bash",
"description": "Run a bash command and return stdout/stderr.",
"input_schema": {
"type": "object",
"properties": {
"command": {"type": "string", "description": "The shell command to run"}
},
"required": ["command"]
}
}
]
)
```
The OpenAI-compat shim path also accepts `tools=[]` but continues to inject them as text-in-prompt for compatibility with OpenRouter-routed models that don't natively support tool calls.
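The split between the two paths can be sketched as follows. This is an assumption-laden illustration (the function name and the exact text-manifest format are invented), not the shim's real rendering logic:

```python
import json

def render_tools(tools, path):
    """Sketch: the Anthropic native path passes tool definitions through
    as-is; the compat shim injects them as prompt text for models
    without native tool-call support."""
    if path == "anthropic_native":
        return {"tools": tools}  # native tools=[] parameter
    # Text-in-prompt fallback for OpenRouter-routed models.
    manifest = "\n".join(
        f"- {t['name']}: {t['description']} "
        f"(input schema: {json.dumps(t['input_schema'])})"
        for t in tools
    )
    return {"prompt_suffix": "Available tools:\n" + manifest}
```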
## Structured output — `response_format` (Phase 2e — PR #645)
`response_format=json_schema` is wired through to the Anthropic native dispatch path. Pass a JSON Schema definition to request strictly typed JSON output from the model:
```python
executor = HermesA2AExecutor(
response_format={
"type": "json_schema",
"json_schema": {
"name": "audit_finding",
"schema": {
"type": "object",
"properties": {
"severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
"description": {"type": "string"},
"remediation": {"type": "string"}
},
"required": ["severity", "description", "remediation"]
}
}
}
)
```
The model's completion will always be valid JSON matching the schema. The Gemini native and OpenAI-compat shim paths do not yet support `response_format` — it is silently ignored on those paths.
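Even with schema-constrained output, it is cheap to sanity-check the completion client-side, especially if the same code path may run against Gemini or the compat shim where the schema is ignored. A stdlib-only sketch for the `audit_finding` example (a simplified check, not full JSON Schema validation):

```python
import json

REQUIRED = ("severity", "description", "remediation")
SEVERITIES = ("critical", "high", "medium", "low")

def check_audit_finding(raw: str) -> dict:
    """Parse a completion and verify it matches the audit_finding shape."""
    finding = json.loads(raw)
    for field in REQUIRED:
        if field not in finding:
            raise ValueError(f"missing required field: {field}")
    if finding["severity"] not in SEVERITIES:
        raise ValueError(f"invalid severity: {finding['severity']!r}")
    return finding
```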
---
## Capability table
### Shipped (Phases 2a–2e — all merged to main)
| Capability | OpenAI-compat shim | Anthropic native | Gemini native |
|---|---|---|---|
| Plain text, single-turn | ✅ | ✅ | ✅ |
| Multi-turn history | ⚠️ flattened into one user blob | ✅ role-attributed turns | ✅ `role: "model"` + `parts` wrapper |
| Correct Gemini wire format | ❌ wrong role, missing parts | — | ✅ |
| No compat-shim translation overhead | ❌ every request translated | ✅ | ✅ |
| Stacked system messages (`system_blocks`) | ❌ | ✅ | ✅ |
| Native `tools=[]` parameter | ⚠️ text-in-prompt injection | ✅ PR #644 | 📋 roadmap |
| Structured output (`response_format=json_schema`) | ❌ | ✅ PR #645 | 📋 roadmap |
### Roadmap (future release)
| Capability | Anthropic native | Gemini native |
|---|---|---|
| Vision content blocks | 📋 | 📋 |
| Streaming | 📋 | 📋 |
| Native tools on Gemini path | — | 📋 |
| Structured output on Gemini path | — | 📋 |
---
## Troubleshooting
### `RuntimeError: anthropic is not installed`
The `anthropic` Python package is missing from the workspace image. Add `anthropic` to `requirements.txt` in your custom image and rebuild, or use the standard `molecule-ai-workspace-template-hermes` image.
### Gemini workspace getting Anthropic dispatch instead
A global `ANTHROPIC_API_KEY` is taking priority. Clear it at the workspace level:
```bash
curl -X PUT $MOLECULE_API/workspaces/$GEMINI_WS/secrets \
-H "Content-Type: application/json" \
-d '{"key":"ANTHROPIC_API_KEY","value":""}'
```
### Multi-turn context lost between calls
Each workspace maintains its own history buffer. Ensure you are sending all turns of a conversation to the same workspace. A2A `context_id` scopes history within the workspace.
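The scoping rule can be pictured as a buffer keyed by workspace and context (an illustrative model of the behaviour above, not Hermes internals):

```python
from collections import defaultdict

# Each (workspace_id, context_id) pair owns an independent history buffer;
# turns sent to a different workspace or context never mix.
_history = defaultdict(list)

def record_turn(workspace_id, context_id, role, text):
    _history[(workspace_id, context_id)].append({"role": role, "text": text})

def turns_for(workspace_id, context_id):
    return _history[(workspace_id, context_id)]
```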
### OpenAI-compat shim returns garbled Gemini output
If you are routing a Gemini model through a key that triggers the compat shim (e.g. `OPENROUTER_API_KEY`), you will see the old role/format translation issues. Switch to `GEMINI_API_KEY` for native dispatch.
---
## See also
- [Concepts — Workspaces](/docs/concepts#workspaces)
- [API Reference — POST /workspaces](/docs/api-reference#post-workspaces)
- [Google ADK Runtime](/docs/google-adk) — Gemini-native alternative to Hermes for ADK-first workflows
- PR #240: [Phase 2a — native Anthropic dispatch](https://github.com/Molecule-AI/molecule-core/pull/240)
- PR #255: [Phase 2b — native Gemini dispatch](https://github.com/Molecule-AI/molecule-core/pull/255)
- PR #267: [Phase 2c — multi-turn history on all paths](https://github.com/Molecule-AI/molecule-core/pull/267)
- Issue [#513](https://github.com/Molecule-AI/molecule-core/issues/513)