molecule-ai-workspace-templ.../docs/ARCHITECTURE.md
Hongming Wang 7e7871875c
fix(v2.1.0): startup bugs + full provider matrix — proven end-to-end (#6)
Stacked follow-up on the v2.0.0 rewrite. The merged v2.0.0 template
had three latent issues that only surfaced during local E2E testing:

1) sudo → gosu (python:3.11-slim ships neither; only gosu was in
   the Dockerfile). start.sh was calling sudo which would have
   broken every container boot.

2) PATH pointed at /home/agent/.hermes/bin which doesn't exist —
   install.sh symlinks ~/.local/bin/hermes. Installer is also
   interactive by default; needs --skip-setup to run in docker build.

3) start.sh wrote ~/.hermes/cli-config.yaml but hermes-agent reads
   ~/.hermes/config.yaml. cli-config.yaml.example is just a starter
   file — install.sh copies it to config.yaml on first boot. Without
   our overwrite the template inherited the example default
   (anthropic/claude-opus-4.6 + provider: auto) instead of the
   workspace's chosen model. We now rewrite config.yaml every boot
   from HERMES_DEFAULT_MODEL + HERMES_INFERENCE_PROVIDER env.

Also:
- Added xz-utils + build-essential to the image (hermes installer
  extracts a Node 22 .tar.xz and some Python deps in .[all] build
  from source).
- Forward every provider key hermes-agent knows about, not just
  the 6 from v2.0.0. All ~22 providers documented in the official
  website/docs/integrations/providers.md are now wired:
    HERMES_API_KEY, NOUS_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY,
    ANTHROPIC_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY, DEEPSEEK_API_KEY,
    GLM_API_KEY, KIMI_API_KEY, KIMI_CN_API_KEY, MINIMAX_API_KEY,
    MINIMAX_CN_API_KEY, DASHSCOPE_API_KEY, XIAOMI_API_KEY,
    ARCEEAI_API_KEY, NVIDIA_API_KEY, OLLAMA_API_KEY, HF_TOKEN,
    AI_GATEWAY_API_KEY, KILOCODE_API_KEY, OPENCODE_ZEN_API_KEY,
    OPENCODE_GO_API_KEY, COPILOT_GITHUB_TOKEN, GH_TOKEN
- config.yaml models[] list expanded to 30+ entries covering every
  provider family (Hermes 3/4, Anthropic direct, OpenAI via
  OpenRouter, Gemini direct, DeepSeek, GLM, Kimi, MiniMax global+CN,
  Qwen/DashScope, Xiaomi MiMo, Arcee Trinity, NVIDIA NIM, Ollama
  Cloud, Hugging Face catch-all, Vercel AI Gateway, OpenCode Zen+Go,
  Kilo Code, OpenRouter catch-all, custom/local).
- top-level required_env: [] — hermes supports too many providers
  for a single hardcoded requirement; per-model required_env in
  the canvas Config tab drives the real UX. hermes-agent itself
  errors loud at request time if zero providers are configured.
- HERMES_CUSTOM_BASE_URL / HERMES_CUSTOM_API_KEY env support in
  start.sh — lets operators point hermes at OpenAI direct, LM Studio,
  LiteLLM, any OpenAI-compat endpoint without exec-ing into the
  container.
- HERMES_INFERENCE_PROVIDER env — forces a specific provider,
  overriding hermes' auto-detection (which routes OPENAI_API_KEY
  to openai-codex OAuth path → 401 Missing Authentication header).
- docs/CONFIGURATION.md rewritten with the full provider matrix,
  OAuth flow, forcing a provider, auxiliary model, persistence
  layout, and the common routing gotchas surfaced during testing.
- docs/ARCHITECTURE.md adds "Provider routing (how keys become
  inference)" section.

Proved end-to-end on local Docker:
  [start.sh] hermes gateway ready on :8642 (pid 22)
  Uvicorn running on http://0.0.0.0:8000
  → A2A message/send "Respond with HERMES BRIDGE WORKING END TO END"
  ← HERMES BRIDGE WORKING END TO END — (via OpenAI Responses API)
  → "Run uname -a && whoami && pwd using your terminal tool"
  ← Linux 094f72... aarch64 GNU/Linux / agent / /home/agent
     (real tool call — not chat response)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 19:05:36 +00:00


Architecture — A2A bridge to hermes-agent

Port map

┌─────────────────── workspace container ───────────────────┐
│                                                           │
│   :8000  ← molecule_runtime (A2A server + adapter) ──┐    │
│                                                      │    │
│                       proxy                          │    │
│                                                      ▼    │
│   :8642  → hermes-agent gateway (OpenAI-compat API)       │
│            running as user `agent`, state in ~/.hermes    │
│                                                           │
└───────────────────────────────────────────────────────────┘
            ▲
            │  (only :8000 exposed outside)
            │
      platform + canvas
  • :8000 — A2A server, exposed to the rest of the platform. Contract is stable across all runtimes (langgraph, claude-code, hermes, etc.).
  • :8642 — hermes-agent's OpenAI-compatible HTTP API. Loopback only. Never routed outside the container.

Boot sequence

start.sh (runs as root inside the container):

  1. Generate a random API_SERVER_KEY if the env var isn't already set. This is hermes-agent's bearer token; the executor reads it from the env at request time.
  2. Write /home/agent/.hermes/.env:
    • API_SERVER_ENABLED=true
    • API_SERVER_KEY=<generated>
    • API_SERVER_HOST=127.0.0.1
    • API_SERVER_PORT=8642
    • Every provider key hermes-agent recognizes that is present in the container env (HERMES_API_KEY, OPENROUTER_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, MINIMAX_API_KEY, and the rest of the full ~25-variable list in start.sh) forwarded through.
  3. Launch hermes gateway in the background as user agent via gosu agent bash -lc 'hermes gateway' (the image ships gosu, not sudo). Logs → /var/log/hermes-gateway.log.
  4. Poll http://127.0.0.1:8642/health up to 60×1s. Fail loud on timeout — dumps last 80 log lines to stderr so provisioning logs capture the reason.
  5. exec molecule-runtime — replaces the shell, becoming PID 1. molecule-runtime loads Adapter = HermesAgentAdapter from __init__.py and starts the A2A server on :8000.
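The five steps above, factored into shell functions as a hypothetical condensed reading of start.sh (function names, the key format, and the env-file directory parameter are assumptions; gosu, the env file fields, and the 60×1s health poll come from the steps listed):

```shell
gen_key() {  # step 1: random bearer token for the gateway (64 hex chars)
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

write_env() {  # step 2: seed hermes-agent's env file ($1 = target dir)
  mkdir -p "$1"
  cat > "$1/.env" <<EOF
API_SERVER_ENABLED=true
API_SERVER_KEY=${API_SERVER_KEY}
API_SERVER_HOST=127.0.0.1
API_SERVER_PORT=8642
EOF
}

launch_gateway() {  # step 3: drop root via gosu; log to a known path
  gosu agent bash -lc 'hermes gateway' >/var/log/hermes-gateway.log 2>&1 &
}

wait_healthy() {  # step 4: 60 x 1s poll; fail loud with the log tail
  for _ in $(seq 1 60); do
    curl -fsS http://127.0.0.1:8642/health >/dev/null 2>&1 && return 0
    sleep 1
  done
  tail -n 80 /var/log/hermes-gateway.log >&2
  return 1
}

# Step 5 in the real script: exec molecule-runtime  (replaces the shell as PID 1)
```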

Request flow

canvas ─── POST /a2a/... ───▶ molecule_runtime (:8000)
                                    │
                                    ▼
                        HermesAgentProxyExecutor.execute()
                                    │
                 ┌──────────────────┴──────────────────┐
                 ▼                                     ▼
      extract message text                  build {model, messages[], stream:false}
                                                      │
                                                      ▼
                                  POST 127.0.0.1:8642/v1/chat/completions
                                  Authorization: Bearer ${API_SERVER_KEY}
                                                      │
                                                      ▼
                                  hermes-agent runs the turn with its
                                  native tools (terminal, files, web,
                                  memory, skills), resolves provider
                                  from the `model` string, returns
                                  OpenAI-format response.
                                                      │
                                                      ▼
                           extract choices[0].message.content
                                                      │
                                                      ▼
                              event_queue.enqueue_event(
                                new_agent_text_message(...)
                              )
                                                      │
                                                      ▼
                                            canvas receives the reply
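The executor-to-gateway hop can be approximated with curl. The helper names below are hypothetical; the endpoint, bearer header, stream:false payload, and the choices[0].message.content extraction are exactly what the flow above describes, and the model id in the usage comment is a placeholder:

```shell
hermes_chat_request() {  # usage: hermes_chat_request "<model>" "<user text>"
  curl -fsS http://127.0.0.1:8642/v1/chat/completions \
    -H "Authorization: Bearer ${API_SERVER_KEY}" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$1\",\"messages\":[{\"role\":\"user\",\"content\":\"$2\"}],\"stream\":false}"
}

extract_reply() {  # pull choices[0].message.content, as the executor does
  python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Inside the container: hermes_chat_request Hermes-4-405B "hi" | extract_reply
```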

What the bridge is intentionally not doing

  • Provider selection. The bridge sends the model string verbatim. hermes-agent owns the registry and picks provider + API keys. If you ever feel tempted to add fallback chains here, stop — that's a regression to v1.x.
  • Tool routing. Tools are hermes-agent's job. Our bridge sees only the final assistant text.

Provider routing (how keys become inference)

Provider resolution happens inside hermes-agent, driven by:

  1. ~/.hermes/config.yaml — its model.provider field. start.sh rewrites this file on every boot from HERMES_DEFAULT_MODEL and HERMES_INFERENCE_PROVIDER (provider defaults to auto).
  2. ~/.hermes/.env — every provider key we forward from the container env (see start.sh for the full list; see CONFIGURATION.md#provider-matrix for the mapping).
  3. Auto-detection — when provider: auto, hermes walks its internal resolution order and picks the first provider whose credential is present. When multiple keys are set, prefer explicit HERMES_INFERENCE_PROVIDER to avoid surprises.
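A sketch of the per-boot config rewrite from item 1, per the commit notes above. The YAML field names are an assumption modeled on cli-config.yaml.example, the fallback model id is a placeholder, and the directory parameter stands in for the container's /home/agent/.hermes:

```shell
write_model_config() {  # $1 = config dir (container uses /home/agent/.hermes)
  local dir="$1"
  mkdir -p "$dir"
  # Rewrite config.yaml every boot so the workspace's chosen model wins
  # over the upstream example defaults.
  cat > "$dir/config.yaml" <<EOF
model:
  id: ${HERMES_DEFAULT_MODEL:-Hermes-4-405B}
  provider: ${HERMES_INFERENCE_PROVIDER:-auto}
EOF
}
```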

Common routing gotcha

With only OPENAI_API_KEY set and provider: auto, hermes-agent will route to openai-codex (Codex API, OAuth-only) and return:

401 - Missing Authentication header

The fix is to set HERMES_INFERENCE_PROVIDER=openrouter — hermes's openrouter provider accepts OPENAI_API_KEY as alt-auth and routes OpenAI-format Chat Completions correctly. This is documented in CONFIGURATION.md#forcing-a-provider.
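The precedence at play can be illustrated with a toy resolver. This is not hermes' real resolution order, just a sketch of the rule that an explicit HERMES_INFERENCE_PROVIDER always wins over auto-detection:

```shell
resolve_provider() {  # toy model of the precedence, not hermes' actual resolver
  if [ -n "${HERMES_INFERENCE_PROVIDER:-}" ]; then
    echo "$HERMES_INFERENCE_PROVIDER"          # explicit override wins
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai-codex"                        # auto picks the OAuth-only path → 401
  else
    echo "none"
  fi
}
```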

Auxiliary model

Vision, web summarization, and MoA use a separate auxiliary model — defaults to Gemini Flash via OpenRouter. If OPENROUTER_API_KEY is absent, these capabilities break silently (the primary path still works). Set HERMES_AUXILIARY_PROVIDER to override.

Other intentional omissions

  • Streaming. stream: false in the request payload. A later revision can upgrade to SSE by subscribing to GET /v1/runs/{run_id}/events and pushing partial messages into the A2A event queue — the AgentExecutor contract already supports multiple enqueue_event calls per turn.
  • Caching. hermes-agent has its own session store (X-Hermes-Session-Id); the bridge does not attempt to pin conversations. Molecule's A2A layer already carries session context in its message envelope.

Failure modes

| Symptom | Likely cause | Where to look |
|---|---|---|
| Provisioning fails at "health probe" | hermes gateway crashed during boot | /var/log/hermes-gateway.log (tail in start.sh stderr dump) |
| Every request returns [hermes-agent error 401] | API_SERVER_KEY mismatch between processes | Inspect /home/agent/.hermes/.env + container env |
| Every request returns [hermes-agent error 400] | Model string unrecognized by hermes-agent | docker exec -u agent … hermes model inside the container |
| [hermes-agent unreachable] | Gateway exited post-boot (OOM, crash) | /var/log/hermes-gateway.log; may need container restart |
| Skills disappear between sessions | /home/agent/.hermes not volume-mounted | Platform-side volume config; see CONFIGURATION.md |
| Agent ignores provider key | Key not forwarded from container env | Workspace secrets; see runbooks/saas-secrets.md in monorepo |

Future work

  • Streaming: subscribe to /v1/runs/.../events and pipe partial assistant tokens + tool progress into the A2A queue.
  • Session pinning: thread hermes-agent's X-Hermes-Session-Id through the A2A envelope so long conversations keep server-side context when beneficial.
  • hermes config set sync: when canvas Config tab changes the model field, also invoke hermes model <id> so CLI usage inside the workspace's Terminal tab stays in sync.
  • Gateway platforms passthrough: let customers opt their workspace into hermes-agent's Telegram/Discord/Slack platforms without duplicating the config surface.
  • Install pin: once stabilized, replace curl install.sh | bash in the Dockerfile with a pinned commit SHA so a bad upstream release doesn't break builds.